Measure-Transformed Quasi Maximum Likelihood Estimation

Koby Todros and Alfred O. Hero

Abstract
In this paper, we consider the problem of estimating a deterministic vector parameter when the likelihood function is unknown or not expressible. We develop an estimator, called the measure-transformed quasi maximum likelihood estimator (MT-QMLE), that minimizes the empirical Kullback-Leibler divergence between the transformed probability measure of the data and a hypothesized Gaussian probability distribution. By judicious choice of the transform we show that the proposed estimator can gain sensitivity to higher-order statistical information and resilience to outliers. Under some regularity conditions we show that the MT-QMLE is consistent, asymptotically normal and unbiased. Furthermore, we derive a necessary and sufficient condition for its asymptotic efficiency. The MT-QMLE requires complete knowledge, and in some cases only partial knowledge, of the mean and covariance functions that characterize the hypothesized Gaussian probability measure. When these quantities are completely unknown, a plug-in version of the MT-QMLE is developed that is based on their empirical estimates. The MT-QMLE and its plug-in version are applied to source localization problems in simulation examples that illustrate their sensitivity to higher-order statistical information and resilience to outliers.

Index Terms
Higher-order statistics, parameter estimation, probability measure transform, robust estimation, source localization.

I. INTRODUCTION

Classical multivariate estimation [1], [2] deals with the problem of estimating a deterministic vector parameter using a sequence of multivariate samples from an underlying probability distribution. When the probability distribution is known to lie within a specified parametric family of probability measures, parameter estimation techniques such as the method of maximum likelihood [3] can be implemented that utilize complete statistical information. In many practical scenarios this knowledge is unavailable, and therefore one must resort to other methods that require only partial statistical information.

One of the most popular techniques of this kind is the method of moments [3], [4], which is based on fitting multivariate cumulants of certain orders to their empirical estimates. Usually, first and second-order cumulants, i.e., the mean vector and covariance matrix, are used. Their popularity arises from the fact that they are easy to manipulate, their sample estimates have simple implementations, and the performance analysis of the resulting estimators is often tractable. In some circumstances, first and second-order cumulants may be non-informative, such as for certain types of non-Gaussian data. In order to overcome this limitation, higher-order cumulants may be incorporated. However, unlike first and second-order cumulants, higher-order cumulants involve complicated tensor analysis [5], and their empirical estimates are highly non-robust to outliers and have increased computational and sample complexity.

Based on the observation that the mean vector and the covariance matrix are the gradient vector and Hessian matrix of the cumulant generating function evaluated at the origin, another framework that does not require knowledge of the likelihood function has been proposed in [6]. This method operates by fitting the gradient or Hessian of the cumulant generating function, evaluated at some off-origin points, to their empirical estimates. These points are called processing points and can be judiciously chosen to improve estimation performance. Similarly to the mean vector and the covariance matrix, the resulting cumulant-related quantities have appealing decomposability properties and their sample estimates have simple implementations. Additionally, they have the key advantage that they involve higher-order statistical information. This approach has been successfully applied to signal gain estimation [6] and to autoregression parameter estimation [7], where a data-driven procedure for optimal selection of the processing points has been devised that minimizes an empirical estimate of the asymptotic mean-squared-error (MSE). Other applications can be found in [8]-[13].

Recently, we have shown that the off-origin gradient vector and Hessian matrix of the cumulant generating function may be viewed as the standard mean vector and covariance matrix under some transformed probability measure [14]-[16]. Hence, a wider class of estimators than the one proposed in [6] can be obtained by considering other measure-transformed mean vectors and covariance matrices. These estimators can gain sensitivity to higher-order statistical information and resilience to outliers, and yet retain the computational advantages of first and second-order methods of moments.

In this paper, we use this measure transformation approach to develop a new estimator that operates by jointly fitting a measure-transformed mean vector and covariance matrix to their empirical estimates. The proposed transform is structured by a non-negative function, called the MT-function, and maps the probability distribution into a set of new probability measures on the observation space. By modifying the MT-function, classes of measure transformations can be obtained that have different useful properties.
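As an illustrative aside (not part of the original development), the following minimal Python sketch shows why the off-origin gradient of the cumulant generating function is computationally attractive: for real-valued data its empirical estimate is just an exponentially weighted sample mean. The function name and the processing point are illustrative assumptions.

```python
import numpy as np

def cgf_gradient_estimate(X, t):
    """Empirical gradient of the cumulant generating function K(t) = log E[exp(t'x)]
    of a real random vector, evaluated at an off-origin processing point t.

    X : (N, p) array of i.i.d. samples; t : (p,) processing point.
    The gradient equals E[x exp(t'x)] / E[exp(t'x)], estimated here by an
    exponentially weighted sample mean.
    """
    w = np.exp(X @ t)                        # tilting weights exp(t'x_n)
    return (w[:, None] * X).sum(axis=0) / w.sum()

# toy usage on skewed data; t = 0 recovers the ordinary sample mean
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=(1000, 2))
print(cgf_gradient_estimate(X, np.array([0.1, -0.2])))
```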

Under the proposed transform we define the measure-transformed (MT) mean vector and covariance matrix, derive their strongly consistent estimates, and study their sensitivity to higher-order statistical information and resilience to outliers. Similarly to the quasi maximum likelihood estimator (QMLE) [17], [18], the proposed estimator, called the MT-QMLE, minimizes the empirical Kullback-Leibler divergence [19] between the transformed probability measure and a hypothesized probability distribution. In this paper, the hypothesized distribution is a Gaussian probability measure with MT-mean vector and MT-covariance matrix parameterized by the unknown parameter to be estimated. Under some regularity conditions we show that the MT-QMLE is consistent, asymptotically normal and unbiased. Additionally, a necessary and sufficient condition for asymptotic efficiency is derived. Furthermore, a data-driven procedure for optimal selection of the MT-function within some parametric class of functions is developed that minimizes an empirical estimate of the asymptotic MSE.

The MT-QMLE requires complete knowledge, and in some cases only partial knowledge, of the MT-mean and MT-covariance functions that characterize the hypothesized Gaussian probability measure. When these quantities are completely unknown, a plug-in version of the MT-QMLE is developed that is based on their empirical estimates.

We illustrate the MT-QMLE for the problem of angle of arrival (AOA) based source localization [20] in heavy-tailed compound Gaussian noise [21] that produces outliers. By specifying the MT-function within the family of zero-centered Gaussian functions parameterized by a scale parameter, we show that, unlike the sample covariance matrix (SCM) based estimator [22], the MT-QMLE is resilient to outliers. Moreover, we show that the MT-QMLE performs similarly to the maximum likelihood estimator (MLE) that, unlike the MT-QMLE, requires complete knowledge of the likelihood function. The plug-in MT-QMLE is illustrated for the problem of received signal strength (RSS) based source localization [23] in heavy-tailed compound Gaussian noise. By specifying the MT-function within the family of Gaussian functions parameterized by location and scale parameters, we show that, unlike the sample mean based estimator, which is the MLE under the assumption of isotropic Gaussian noise, the MT-QMLE is resilient to outliers and performs similarly to the standard MLE.

The paper is organized as follows. In Section II, we define the MT-mean and MT-covariance and derive their empirical estimates. In Section III, we use these quantities to construct the MT-QMLE. The plug-in MT-QMLE is developed in Section IV. The proposed estimators are applied to source localization problems in Section V. In Section VI, the main points of this contribution are summarized. The proofs of the propositions and theorems stated throughout the paper are given in the Appendix.

II. MEASURE-TRANSFORMED MEAN AND COVARIANCE

In this section, we develop a transform on the probability measure of a random vector whose probability distribution is parametrized by the vector parameter to be estimated. Under the proposed transform, we define the measure-transformed mean vector and covariance matrix, derive their strongly consistent estimates, and establish their sensitivity to higher-order statistical information and resilience to outliers. These quantities will be used in the following section to construct the measure-transformed quasi maximum likelihood estimator.

A. Probability measure transform

We define the measure space $(\mathcal{X}, \mathcal{S}_{\mathcal{X}}, P_{\mathbf{x};\boldsymbol{\theta}})$, where $\mathcal{X} \subseteq \mathbb{C}^{p}$ is the observation space of a random vector $\mathbf{X}$, $\mathcal{S}_{\mathcal{X}}$ is a $\sigma$-algebra over $\mathcal{X}$, $P_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$ that belongs to some unknown parametric family of probability measures $\{P_{\mathbf{x};\boldsymbol{\vartheta}} : \boldsymbol{\vartheta} \in \Theta\}$, $\boldsymbol{\theta} \in \Theta$ is the unknown vector parameter to be estimated, and $\Theta \subseteq \mathbb{R}^{m}$ denotes the parameter space. Let $g: \mathcal{X} \to \mathbb{C}$ denote an integrable scalar function on $\mathcal{X}$. The expectation of $g(\mathbf{X})$ under $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as
$$\mathrm{E}[g(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] \triangleq \int_{\mathcal{X}} g(\mathbf{x})\, dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x}), \qquad (1)$$
where $\mathbf{x} \in \mathcal{X}$. The empirical probability measure $\hat{P}_{\mathbf{x}}$, given a sequence of samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$, is specified by
$$\hat{P}_{\mathbf{x}}(A) = \frac{1}{N} \sum_{n=1}^{N} \delta_{\mathbf{X}_{n}}(A), \qquad (2)$$
where $A \in \mathcal{S}_{\mathcal{X}}$ and $\delta_{\mathbf{X}_{n}}(\cdot)$ is the Dirac probability measure at $\mathbf{X}_{n}$ [24].

Definition 1. Given a non-negative function $u: \mathbb{C}^{p} \to \mathbb{R}_{+}$ satisfying
$$0 < \mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty, \qquad (3)$$
a transform on $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined via the relation
$$Q_{\mathbf{x};\boldsymbol{\theta}}(A) \triangleq T_{u}[P_{\mathbf{x};\boldsymbol{\theta}}](A) = \int_{A} \varphi_{u}(\mathbf{x}; \boldsymbol{\theta})\, dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x}), \qquad (4)$$
where $A \in \mathcal{S}_{\mathcal{X}}$ and
$$\varphi_{u}(\mathbf{x}; \boldsymbol{\theta}) \triangleq \frac{u(\mathbf{x})}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]}. \qquad (5)$$
The function $u(\cdot)$ is called the MT-function.

Proposition 1 (Properties of the transform). Let $Q_{\mathbf{x};\boldsymbol{\theta}}$ be defined by relation (4). Then:

1) $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$.
2) $Q_{\mathbf{x};\boldsymbol{\theta}}$ is absolutely continuous w.r.t. $P_{\mathbf{x};\boldsymbol{\theta}}$, with Radon-Nikodym derivative [24]
$$\frac{dQ_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x})}{dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x})} = \varphi_{u}(\mathbf{x}; \boldsymbol{\theta}). \qquad (6)$$
[A proof is given in Appendix A]

The probability measure $Q_{\mathbf{x};\boldsymbol{\theta}}$ is said to be generated by the MT-function $u(\cdot)$. By modifying $u(\cdot)$, such that the condition (3) is satisfied, virtually any probability measure on $\mathcal{S}_{\mathcal{X}}$ can be obtained.

B. The MT-mean and MT-covariance

According to (6), the mean vector and covariance matrix of $\mathbf{X}$ under $Q_{\mathbf{x};\boldsymbol{\theta}}$ are given by
$$\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \triangleq \mathrm{E}[\mathbf{X}\, \varphi_{u}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}] \qquad (7)$$
and
$$\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \triangleq \mathrm{E}\big[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))^{H}\, \varphi_{u}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (8)$$
respectively. Equations (7) and (8) imply that $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ are weighted mean and covariance of $\mathbf{X}$ under $P_{\mathbf{x};\boldsymbol{\theta}}$, with the weighting function $\varphi_{u}(\cdot; \boldsymbol{\theta})$ defined in (5). Hence, for any non-constant analytic MT-function $u(\cdot)$, which has a convergent Taylor series expansion, the MT-mean and MT-covariance involve higher-order statistical moments of $P_{\mathbf{x};\boldsymbol{\theta}}$. By modifying the MT-function $u(\cdot)$, such that the condition (3) is satisfied, the MT-mean and MT-covariance under $Q_{\mathbf{x};\boldsymbol{\theta}}$ are modified. In particular, by choosing $u(\cdot)$ to be any non-zero constant valued function we have $Q_{\mathbf{x};\boldsymbol{\theta}} = P_{\mathbf{x};\boldsymbol{\theta}}$, for which the standard mean vector $\boldsymbol{\mu}_{\mathbf{x}}(\boldsymbol{\theta})$ and covariance matrix $\boldsymbol{\Sigma}_{\mathbf{x}}(\boldsymbol{\theta})$ are obtained.

C. The empirical MT-mean and MT-covariance

According to (7) and (8), the MT-mean and MT-covariance can be estimated using only samples from the distribution $P_{\mathbf{x};\boldsymbol{\theta}}$. In the following proposition, strongly consistent estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ are constructed based on $N$ i.i.d. samples of $\mathbf{X}$.

Proposition 2 (Strongly consistent estimates of the MT-mean and MT-covariance). Let $\mathbf{X}_{n}$, $n = 1, \ldots, N$ denote a sequence of i.i.d. samples from $P_{\mathbf{x};\boldsymbol{\theta}}$. Define the empirical mean and covariance estimates
$$\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \triangleq \sum_{n=1}^{N} \mathbf{X}_{n}\, \hat{\varphi}_{u}(\mathbf{X}_{n}) \qquad (9)$$
and
$$\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \triangleq \sum_{n=1}^{N} (\mathbf{X}_{n} - \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)})(\mathbf{X}_{n} - \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)})^{H}\, \hat{\varphi}_{u}(\mathbf{X}_{n}), \qquad (10)$$
respectively, where
$$\hat{\varphi}_{u}(\mathbf{X}_{n}) \triangleq \frac{u(\mathbf{X}_{n})}{\sum_{n=1}^{N} u(\mathbf{X}_{n})}. \qquad (11)$$
If $\mathrm{E}[\|\mathbf{X}\|_{2}^{2}\, u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$, then $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ w.p. 1 as $N \to \infty$, where w.p. 1 denotes convergence with probability 1 [25].
[The proof is very similar to the proof of Proposition 2 in [26] and is therefore omitted]

Note that for $u(\mathbf{x}) \equiv 1$ the estimators $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\frac{N}{N-1}\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ reduce to the standard unbiased sample mean vector and sample covariance matrix (SCM), respectively. Also notice that $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ can be written as statistical functionals $\boldsymbol{\mu}^{(u)}[\cdot]$ and $\boldsymbol{\Sigma}^{(u)}[\cdot]$ of the empirical probability measure $\hat{P}_{\mathbf{x}}$ defined in (2), i.e.,
$$\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} = \frac{\mathrm{E}[\mathbf{X}\, u(\mathbf{X}); \hat{P}_{\mathbf{x}}]}{\mathrm{E}[u(\mathbf{X}); \hat{P}_{\mathbf{x}}]} \triangleq \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}] \qquad (12)$$
and
$$\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} = \frac{\mathrm{E}\big[(\mathbf{X} - \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}])(\mathbf{X} - \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}])^{H}\, u(\mathbf{X}); \hat{P}_{\mathbf{x}}\big]}{\mathrm{E}[u(\mathbf{X}); \hat{P}_{\mathbf{x}}]} \triangleq \boldsymbol{\Sigma}^{(u)}[\hat{P}_{\mathbf{x}}]. \qquad (13)$$
By (5), (7), (8), (12) and (13), when $\hat{P}_{\mathbf{x}}$ is replaced by $P_{\mathbf{x};\boldsymbol{\theta}}$ we have $\boldsymbol{\mu}^{(u)}[P_{\mathbf{x};\boldsymbol{\theta}}] = \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}^{(u)}[P_{\mathbf{x};\boldsymbol{\theta}}] = \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$, which implies that $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ are Fisher consistent [27].

D. Robustness to outliers

Here, we study the robustness of the empirical MT-mean and MT-covariance to outliers using their vector and matrix valued influence functions [28]. Define the probability measure $P_{\epsilon} \triangleq (1 - \epsilon) P_{\mathbf{x};\boldsymbol{\theta}} + \epsilon\, \delta_{\mathbf{y}}$, where $0 \leq \epsilon \leq 1$, $\mathbf{y} \in \mathbb{C}^{p}$, and $\delta_{\mathbf{y}}$ is the Dirac probability measure at $\mathbf{y}$. The influence function of a Fisher consistent estimator with statistical functional $H[\cdot]$ at probability distribution $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as [28]
$$\mathrm{IF}_{H}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) \triangleq \lim_{\epsilon \to 0} \frac{H[P_{\epsilon}] - H[P_{\mathbf{x};\boldsymbol{\theta}}]}{\epsilon} = \frac{\partial H[P_{\epsilon}]}{\partial \epsilon}\bigg|_{\epsilon = 0}. \qquad (14)$$
The influence function describes the effect on the estimator of an infinitesimal contamination at the point $\mathbf{y}$. An estimator is said to be B-robust if its influence function is bounded [28].
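As a concrete illustration of the estimators (9)-(11), the following sketch (illustrative code, not from the paper) computes the empirical MT-mean and MT-covariance for complex-valued samples; the zero-centred Gaussian MT-function and its width parameter are assumptions made for the example, anticipating the family used in Section V.

```python
import numpy as np

def mt_mean_cov(X, u):
    """Empirical MT-mean (9) and MT-covariance (10): a weighted sample mean
    and covariance with normalized weights from (11).

    X : (N, p) array of i.i.d. (possibly complex) samples.
    u : MT-function mapping a sample to a non-negative weight.
    """
    w = np.array([u(x) for x in X], dtype=float)
    w = w / w.sum()                              # normalized weights, eq. (11)
    mu = (w[:, None] * X).sum(axis=0)            # empirical MT-mean, eq. (9)
    Xc = X - mu
    Sigma = (w[:, None] * Xc).T @ Xc.conj()      # empirical MT-covariance, eq. (10)
    return mu, Sigma

# example with an illustrative zero-centred Gaussian MT-function
def u_gauss(x, omega=2.0):
    return float(np.exp(-np.linalg.norm(x) ** 2 / omega ** 2))

rng = np.random.default_rng(0)
X = (rng.standard_normal((500, 3)) + 1j * rng.standard_normal((500, 3))) / np.sqrt(2)
mu_hat, Sigma_hat = mt_mean_cov(X, u_gauss)
```

With u set to a constant, the weights become uniform and the two quantities reduce to the ordinary sample mean and (biased) sample covariance, as noted above.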

Using (12)-(14), one can verify that the influence functions of $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ are given by
$$\mathrm{IF}_{\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) = \frac{u(\mathbf{y})\,(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]} \qquad (15)$$
and
$$\mathrm{IF}_{\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) = \frac{u(\mathbf{y})\,\big[(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))^{H} - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})\big]}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]}, \qquad (16)$$
respectively. The following proposition states a sufficient condition for boundedness of (15) and (16). This condition is satisfied by the Gaussian MT-functions proposed in Section V.

Proposition 3. The influence functions (15) and (16) are bounded if the MT-function $u(\mathbf{y})$ and the product $u(\mathbf{y})\|\mathbf{y}\|_{2}^{2}$ are bounded over $\mathbb{C}^{p}$.
[The proof is similar to the proof of Proposition 3 in [26] and is therefore omitted]

III. THE MEASURE-TRANSFORMED QUASI MAXIMUM LIKELIHOOD ESTIMATOR

In this section we develop an estimator for $\boldsymbol{\theta}$ that minimizes the empirical Kullback-Leibler divergence between the transformed probability measure $Q_{\mathbf{x};\boldsymbol{\theta}}$ and a complex circular Gaussian probability distribution [29] $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$, $\boldsymbol{\vartheta} \in \Theta$, with MT-mean $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and MT-covariance $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$. Regularity conditions for consistency, asymptotic normality and unbiasedness are derived. Additionally, we provide a closed-form expression for the asymptotic MSE and obtain a necessary and sufficient condition for asymptotic efficiency. Optimal selection of the MT-function out of some parametric class of functions is also discussed.

A. The MT-QMLE

The Kullback-Leibler divergence (KLD) between $Q_{\mathbf{x};\boldsymbol{\theta}}$ and $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$ is defined as [19]
$$D_{\mathrm{KL}}\big[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \mathrm{E}\left[\log\frac{q(\mathbf{X}; \boldsymbol{\theta})}{\phi(\mathbf{X}; \boldsymbol{\vartheta})}; Q_{\mathbf{x};\boldsymbol{\theta}}\right], \qquad (17)$$
where $q(\mathbf{x}; \boldsymbol{\theta})$ and
$$\phi(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \det^{-1}\big(\pi\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\, \exp\Big(-\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{H}\big(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{-1}\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\Big) \qquad (18)$$
are the density functions of $Q_{\mathbf{x};\boldsymbol{\theta}}$ and $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$, respectively, w.r.t. a dominating $\sigma$-finite measure on $\mathcal{S}_{\mathcal{X}}$. According to (6), $D_{\mathrm{KL}}[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}]$ can be estimated using only samples from $P_{\mathbf{x};\boldsymbol{\theta}}$. Therefore, similarly to (9) and (10), an empirical estimate of (17) given a sequence of samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as
$$\hat{D}_{\mathrm{KL}}\big[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \sum_{n=1}^{N} \hat{\varphi}_{u}(\mathbf{X}_{n})\, \log\frac{q(\mathbf{X}_{n}; \boldsymbol{\theta})}{\phi(\mathbf{X}_{n}; \boldsymbol{\vartheta})}, \qquad (19)$$
where $\hat{\varphi}_{u}(\cdot)$ is defined in (11).

The proposed estimator of $\boldsymbol{\theta}$ is obtained by minimization of (19) w.r.t. $\boldsymbol{\vartheta}$, which by (9), (10) and (18) amounts to maximization of the following objective function:
$$J_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}, \qquad (20)$$
where $D_{\mathrm{LD}}[\mathbf{A}\,\|\,\mathbf{B}] \triangleq \mathrm{tr}(\mathbf{A}\mathbf{B}^{-1}) - \log\det(\mathbf{A}\mathbf{B}^{-1}) - p$ is the log-determinant divergence [30] between positive definite matrices $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{p \times p}$, and $\|\mathbf{a}\|_{\mathbf{C}} \triangleq \sqrt{\mathbf{a}^{H}\mathbf{C}\mathbf{a}}$ denotes the weighted $\ell_{2}$-norm of a vector $\mathbf{a} \in \mathbb{C}^{p}$ with positive-definite weighting matrix $\mathbf{C} \in \mathbb{C}^{p \times p}$. Hence, the proposed MT-QMLE is given by
$$\hat{\boldsymbol{\theta}}_{u} = \arg\max_{\boldsymbol{\vartheta} \in \Theta} J_{u}(\boldsymbol{\vartheta}). \qquad (21)$$
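The following sketch (illustrative only) evaluates the objective (20) and maximizes it over a one-dimensional grid of candidate parameter values. The callables mu_model and Sigma_model stand for the hypothesized MT-mean and MT-covariance functions, which Section III assumes are known up to the parameter; they are not part of the paper's notation.

```python
import numpy as np

def log_det_div(A, B):
    """Log-determinant divergence D_LD[A || B] = tr(A B^{-1}) - log det(A B^{-1}) - p."""
    M = A @ np.linalg.inv(B)
    return float(np.real(np.trace(M)) - np.linalg.slogdet(M)[1] - A.shape[0])

def mt_qmle_objective(theta, mu_hat, Sigma_hat, mu_model, Sigma_model):
    """Objective (20): minus the LD divergence minus the weighted-norm fitting term."""
    S = Sigma_model(theta)
    d = mu_hat - mu_model(theta)
    quad = float(np.real(d.conj() @ np.linalg.solve(S, d)))
    return -log_det_div(Sigma_hat, S) - quad

def mt_qmle(grid, mu_hat, Sigma_hat, mu_model, Sigma_model):
    """Grid-search maximization of (20), i.e., the MT-QMLE (21) for a scalar parameter."""
    vals = [mt_qmle_objective(t, mu_hat, Sigma_hat, mu_model, Sigma_model)
            for t in grid]
    return grid[int(np.argmax(vals))]
```

A grid search is used here only for transparency; any numerical maximizer could replace it.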

B. Asymptotic performance analysis

Here, we study the asymptotic performance of the estimator (21). For simplicity, we assume a sequence of i.i.d. samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$.

Theorem 1 (Strong consistency of $\hat{\boldsymbol{\theta}}_{u}$). Assume that the following conditions are satisfied:
(A-1) The parameter space $\Theta$ is compact.
(A-2) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \neq \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ or $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \neq \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ for all $\boldsymbol{\vartheta} \neq \boldsymbol{\theta}$.
(A-3) $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ is non-singular for all $\boldsymbol{\vartheta} \in \Theta$.
(A-4) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are continuous over $\Theta$.
(A-5) $\mathrm{E}[\|\mathbf{X}\|_{2}^{2}\, u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\boldsymbol{\theta}}_{u} \to \boldsymbol{\theta} \;\; \text{w.p. 1 as } N \to \infty. \qquad (22)$$
[A proof is given in Appendix B]

Theorem 2 (Asymptotic normality and unbiasedness of $\hat{\boldsymbol{\theta}}_{u}$). Assume that the following conditions are satisfied:
(B-1) $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$, where $\xrightarrow{P}$ denotes convergence in probability [25].
(B-2) $\boldsymbol{\theta}$ lies in the interior of $\Theta$, which is assumed to be compact.
(B-3) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are twice continuously differentiable in $\Theta$.
(B-4) $\mathrm{E}[u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$ and $\mathrm{E}[\|\mathbf{X}\|_{2}^{4}\, u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta} \xrightarrow{D} \mathcal{N}\big(\mathbf{0}, \mathbf{C}_{u}(\boldsymbol{\theta})\big) \;\; \text{as } N \to \infty, \qquad (23)$$
where $\xrightarrow{D}$ denotes convergence in distribution [25]. The asymptotic MSE takes the form
$$\mathbf{C}_{u}(\boldsymbol{\theta}) = N^{-1}\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta})\, \mathbf{G}_{u}(\boldsymbol{\theta})\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta}), \qquad (24)$$
where
$$\mathbf{G}_{u}(\boldsymbol{\vartheta}) \triangleq \mathrm{E}\big[u^{2}(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (25)$$
$$\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \nabla_{\boldsymbol{\vartheta}} \log \phi(\mathbf{x}; \boldsymbol{\vartheta}), \qquad (26)$$
$$\mathbf{F}_{u}(\boldsymbol{\vartheta}) \triangleq \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (27)$$
$$\boldsymbol{\Upsilon}_{u}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq -\nabla^{2}_{\boldsymbol{\vartheta}} \log \phi(\mathbf{x}; \boldsymbol{\vartheta}), \qquad (28)$$
and it is assumed that $\mathbf{F}_{u}(\boldsymbol{\theta})$ is non-singular.
[A proof is given in Appendix C]

The following proposition relates the asymptotic MSE (24) of $\hat{\boldsymbol{\theta}}_{u}$ to the Cramér-Rao lower bound (CRLB) [31], [32].

Proposition 4 (Relation to the CRLB). Let
$$f(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq dP_{\mathbf{x};\boldsymbol{\vartheta}}(\mathbf{x}) / d\lambda(\mathbf{x}) \qquad (29)$$
denote the density function of $P_{\mathbf{x};\boldsymbol{\vartheta}}$ w.r.t. a $\sigma$-finite dominating measure $\lambda$ on $\mathcal{S}_{\mathcal{X}}$, and let
$$\boldsymbol{\psi}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \nabla_{\boldsymbol{\vartheta}} \log f(\mathbf{x}; \boldsymbol{\vartheta}). \qquad (30)$$
Assume that the following conditions are satisfied:
(C-1) For any $\boldsymbol{\vartheta} \in \Theta$, the partial derivatives $\partial f(\mathbf{x}; \boldsymbol{\vartheta}) / \partial \vartheta_{k}$, $k = 1, \ldots, m$, exist a.e., where $\vartheta_{k}$, $k = 1, \ldots, m$, denote the entries of the vector $\boldsymbol{\vartheta}$.
(C-2) For any $\boldsymbol{\vartheta} \in \Theta$, $v_{k}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \frac{\partial \log \phi(\mathbf{x}; \boldsymbol{\vartheta})}{\partial \vartheta_{k}}\, f(\mathbf{x}; \boldsymbol{\vartheta})\, u(\mathbf{x}) \in \mathcal{L}^{1}(\mathcal{X}, \lambda)$, $k = 1, \ldots, m$, where $\mathcal{L}^{1}(\mathcal{X}, \lambda)$ denotes the space of absolutely integrable functions, defined on $\mathcal{X}$, w.r.t. the measure $\lambda$.
(C-3) There exist dominating functions $r_{k,j}(\mathbf{x}) \in \mathcal{L}^{1}(\mathcal{X}, \lambda)$, $k, j = 1, \ldots, m$, such that $|\partial v_{k}(\mathbf{x}; \boldsymbol{\vartheta}) / \partial \vartheta_{j}| \leq r_{k,j}(\mathbf{x})$ a.e.
(C-4) The matrix $\mathbf{G}_{u}(\boldsymbol{\theta})$ in (25) and the Fisher information matrix
$$\mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta}) \triangleq \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]$$
are non-singular.
Then,
$$\mathbf{C}_{u}(\boldsymbol{\theta}) \succeq N^{-1}\, \mathbf{I}_{\mathrm{FIM}}^{-1}(\boldsymbol{\theta}),$$
where equality holds if and only if
$$\boldsymbol{\psi}(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta})\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta})\, \boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\theta})\, u(\mathbf{x}) \;\; \text{w.p. 1.} \qquad (31)$$
[A proof is given in Appendix D]

One can verify that when $P_{\mathbf{x};\boldsymbol{\theta}}$ is a Gaussian measure, the condition (31) is satisfied only for non-zero constant valued MT-functions, for which $Q_{\mathbf{x};\boldsymbol{\theta}} = P_{\mathbf{x};\boldsymbol{\theta}}$ and the resulting MT-mean and MT-covariance (7), (8) only involve first and second-order moments. This implies that in the Gaussian case, non-constant MT-functions will always lead to asymptotic performance degradation. In the non-Gaussian case, however, as will be shown in the sequel, there are cases where the MT-function should deviate from a constant value in order to decrease the asymptotic MSE. This deviation results in weighted mean and covariance that involve higher-order moments. By solving the differential equation (31) one can obtain the MT-function for which the resulting MT-QMLE is asymptotically efficient. Unfortunately, in the non-Gaussian case the solution is highly cumbersome and requires knowledge of the likelihood function $f(\cdot; \boldsymbol{\theta})$, which is practically unavailable under the considered circumstances. Therefore, as described in Subsection III-C, we resort to another technique for optimal choice of the MT-function that is based on the following empirical estimate of the asymptotic MSE.

Theorem 3 (Empirical asymptotic MSE). Define the empirical asymptotic MSE
$$\hat{\mathbf{C}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \triangleq N^{-1}\, \hat{\mathbf{F}}_{u}^{-1}(\hat{\boldsymbol{\theta}}_{u})\, \hat{\mathbf{G}}_{u}(\hat{\boldsymbol{\theta}}_{u})\, \hat{\mathbf{F}}_{u}^{-1}(\hat{\boldsymbol{\theta}}_{u}), \qquad (32)$$
where
$$\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) \triangleq N^{-1} \sum_{n=1}^{N} u^{2}(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}_{n}; \boldsymbol{\vartheta}) \qquad (33)$$
and
$$\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta}) \triangleq N^{-1} \sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}_{n}; \boldsymbol{\vartheta}). \qquad (34)$$
Furthermore, assume that the following conditions are satisfied:
(D-1) $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$.
(D-2) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are twice continuously differentiable in $\Theta$, which is assumed to be compact.
(D-3) $\mathrm{E}[u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$ and $\mathrm{E}[\|\mathbf{X}\|_{2}^{4}\, u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\mathbf{C}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{C}_{u}(\boldsymbol{\theta}) \;\; \text{as } N \to \infty. \qquad (35)$$
[A proof is given in Appendix E]
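The sandwich estimate (32)-(34) is simple to compute once the score and curvature of the hypothesized Gaussian log-density are available. The sketch below (an illustrative implementation for a scalar parameter, not code from the paper) approximates them by central finite differences; mu_model and Sigma_model are placeholder callables, and the sign convention of the curvature does not matter since it enters the sandwich squared.

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    """Log-density of the hypothesized circular complex Gaussian (18), up to a constant."""
    d = x - mu
    return float(np.real(-np.linalg.slogdet(Sigma)[1]
                         - d.conj() @ np.linalg.solve(Sigma, d)))

def empirical_asymptotic_mse(X, u, theta_hat, mu_model, Sigma_model, h=1e-4):
    """Empirical asymptotic MSE (32)-(34) for a scalar parameter, with the score (26)
    and second derivative of log phi approximated by central finite differences
    (an assumption of this sketch)."""
    def lphi(x, t):
        return log_gauss(x, mu_model(t), Sigma_model(t))
    N = len(X)
    G = F = 0.0
    for x in X:
        w = u(x)
        psi = (lphi(x, theta_hat + h) - lphi(x, theta_hat - h)) / (2.0 * h)
        curv = (lphi(x, theta_hat + h) - 2.0 * lphi(x, theta_hat)
                + lphi(x, theta_hat - h)) / h ** 2
        G += (w ** 2) * psi ** 2
        F += w * curv
    G, F = G / N, F / N
    return G / (N * F ** 2)          # N^{-1} F^{-1} G F^{-1} for a scalar parameter
```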

C. Optimal choice of the MT-function

We propose to specify the MT-function within some parametric family $\{u(\cdot; \boldsymbol{\omega}),\, \boldsymbol{\omega} \in \mathbb{C}^{r}\}$ that satisfies the conditions stated in Definition 1 and Theorem 3. For example, in order to gain robustness against outliers, the Gaussian family of functions, which satisfies the conditions in Proposition 3, is a logical choice. An optimal choice of the MT-function parameter $\boldsymbol{\omega}$ is the one that minimizes the empirical asymptotic MSE (32), constructed from the same sequence of samples used for obtaining the MT-QMLE (21).

IV. THE PLUG-IN MEASURE-TRANSFORMED QUASI MAXIMUM LIKELIHOOD ESTIMATOR

The MT-QMLE (21) requires knowledge of the MT-mean and MT-covariance functions $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$. When these quantities are unknown and only their empirical estimates $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$, based on $M$ samples from $P_{\mathbf{x};\boldsymbol{\vartheta}}$, are available for any $\boldsymbol{\vartheta} \in \Theta$, we propose the following plug-in version of the MT-QMLE:
$$\tilde{\boldsymbol{\theta}}_{u} = \arg\max_{\boldsymbol{\vartheta} \in \Theta} \tilde{J}_{u}(\boldsymbol{\vartheta}), \qquad (36)$$
where
$$\tilde{J}_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \,\|\, \tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (37)$$
The following theorem states an asymptotic relation between the MT-QMLE and its plug-in version.

Theorem 4 (Relation to the MT-QMLE). Assume that the estimators $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are uniformly consistent, i.e.,
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \xrightarrow{P} 0 \;\; \text{and} \;\; \sup_{\boldsymbol{\vartheta} \in \Theta}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \xrightarrow{P} 0 \;\; \text{as } M \to \infty. \qquad (38)$$
Furthermore, assume that $J_{u}(\boldsymbol{\vartheta})$ (20) and $\tilde{J}_{u}(\boldsymbol{\vartheta})$ (37) have unique global maxima w.p. 1. Then,
$$\tilde{\boldsymbol{\theta}}_{u} \xrightarrow{P} \hat{\boldsymbol{\theta}}_{u} \;\; \text{as } M \to \infty. \qquad (39)$$
[A proof is given in Appendix F]

Therefore, when the conditions stated in Theorem 4 are satisfied and $M$ is sufficiently large, the plug-in MT-QMLE (36) attains the MSE performance of the MT-QMLE (21). Hence, when both $M$ and the number of samples $N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$ are sufficiently large, an empirical estimate of the asymptotic MSE of $\tilde{\boldsymbol{\theta}}_{u}$ can be obtained using (32) after replacing $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ in (18) with their sample estimates $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and recalculating the terms comprising (32). Similarly to the MT-QMLE, this empirical estimate can be used for optimal choice of the MT-function, as sketched below.
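A minimal sketch of the data-driven selection loop of Subsection III-C is given next. All callables (make_u, estimate, emp_amse) are hypothetical placeholders standing in for the chosen MT-function family, the (plug-in) MT-QMLE and the empirical asymptotic MSE (32); they are not APIs defined in the paper.

```python
def select_mt_parameter(omegas, X, make_u, estimate, emp_amse):
    """For each candidate omega, compute the estimate and score it by the
    empirical asymptotic MSE; keep the minimizer (Subsection III-C)."""
    best = None
    for omega in omegas:
        u = make_u(omega)
        theta = estimate(X, u)            # MT-QMLE (21) or plug-in MT-QMLE (36)
        score = emp_amse(X, u, theta)     # empirical asymptotic MSE (32)
        if best is None or score < best[0]:
            best = (score, omega, theta)
    return best[1], best[2]               # selected omega and its estimate
```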

V. APPLICATIONS

A. Robust angle of arrival based source localization

We illustrate the use of the proposed MT-QMLE (21) for robust AOA based source localization in heavy-tailed compound Gaussian noise that produces outliers. We consider a uniform linear array (ULA) of $p$ sensors with half wavelength spacing that receives a signal generated by a narrowband far-field point source with azimuthal AOA $\theta$. Under this model the array output satisfies [20]:
$$\mathbf{X}_{n} = S_{n}\, \mathbf{a}(\theta) + \mathbf{W}_{n}, \quad n = 1, \ldots, N, \qquad (40)$$
where $n$ is a discrete time index, $\mathbf{X}_{n} \in \mathbb{C}^{p}$ is the vector of received signals, $S_{n} \in \mathbb{C}$ is the emitted signal, $\mathbf{a}(\theta) \triangleq [1, \exp(-i\pi\sin\theta), \ldots, \exp(-i\pi(p-1)\sin\theta)]^{T}$ is the steering vector of the array toward direction $\theta$, and $\mathbf{W}_{n} \in \mathbb{C}^{p}$ is an additive noise. We assume that the following conditions are satisfied: 1) $\theta \in \Theta = [-\pi/2 + \Delta,\, \pi/2 - \Delta]$, where $\Delta$ is a small positive constant, 2) the emitted signal is symmetrically distributed about the origin, 3) $S_{n}$ and $\mathbf{W}_{n}$ are statistically independent and first-order stationary, and 4) the noise component is compound Gaussian with stochastic representation [21]:
$$\mathbf{W}_{n} = \nu_{n} \mathbf{Z}_{n}, \qquad (41)$$
where $\nu_{n} \in \mathbb{R}_{++}$ is a first-order stationary process, called the texture component, and $\mathbf{Z}_{n} \in \mathbb{C}^{p}$ is a proper-complex wide-sense stationary Gaussian process with zero mean and scaled unit covariance $\sigma_{Z}^{2}\mathbf{I}$. The processes $\nu_{n}$ and $\mathbf{Z}_{n}$ are also assumed to be statistically independent.

In order to gain robustness against outliers, as well as sensitivity to higher-order moments, we specify the MT-function in the zero-centred Gaussian family of functions parametrized by a width parameter $\omega$, i.e.,
$$u_{\mathrm{G}}(\mathbf{x}; \omega) = (\pi\omega^{2})^{-p}\, \exp\big(-\|\mathbf{x}\|^{2}/\omega^{2}\big), \quad \omega \in \mathbb{R}_{++}. \qquad (42)$$
One can verify that the MT-function (42) satisfies the B-robustness conditions stated in Proposition 3. Using (7), (8) and (40)-(42), it can be shown that the MT-mean and MT-covariance under $Q^{(u_{\mathrm{G}})}_{\mathbf{x};\vartheta}$ are given by
$$\boldsymbol{\mu}^{(u_{\mathrm{G}})}_{\mathbf{x}}(\vartheta; \omega) = \mathbf{0} \qquad (43)$$
and
$$\boldsymbol{\Sigma}^{(u_{\mathrm{G}})}_{\mathbf{x}}(\vartheta; \omega) = r_{S}(\omega)\, \mathbf{a}(\vartheta)\mathbf{a}^{H}(\vartheta) + r_{W}(\omega)\, \mathbf{I}, \qquad (44)$$
respectively, where $r_{S}(\omega)$ and $r_{W}(\omega)$ are some strictly positive functions of $\omega$.
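For reference, a minimal simulator of the model (40)-(41) is sketched below (illustrative code, not from the paper); the half-wavelength steering phase and the way the signal and texture sequences are supplied are assumptions of this sketch.

```python
import numpy as np

def steering(theta, p):
    """ULA steering vector with half-wavelength spacing, theta in radians."""
    return np.exp(-1j * np.pi * np.arange(p) * np.sin(theta))

def simulate_snapshots(theta, s, nu, p, sigma_z=1.0, rng=None):
    """Snapshots from (40)-(41): X_n = s_n a(theta) + nu_n Z_n, with Z_n a proper
    complex Gaussian vector with covariance sigma_z^2 I.
    s : (N,) emitted signal samples; nu : (N,) texture samples."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(s)
    z = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
    z *= sigma_z / np.sqrt(2.0)                      # proper complex Gaussian noise
    return np.asarray(s)[:, None] * steering(theta, p)[None, :] \
        + np.asarray(nu)[:, None] * z
```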

problem:
$$\hat{\theta}_{u_{\mathrm{G}}}(\omega) = \arg\max_{\vartheta \in \Theta}\; \mathbf{a}^{H}(\vartheta)\, \hat{\mathbf{C}}^{(u_{\mathrm{G}})}(\omega)\, \mathbf{a}(\vartheta),$$
where $\hat{\mathbf{C}}^{(u_{\mathrm{G}})}(\omega) \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\omega) + \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\omega)\, \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u_{\mathrm{G}})H}(\omega)$. Under the considered settings, it can be shown that the conditions stated in Theorems 1-3 are satisfied. The resulting asymptotic MSE, given in closed form in (45), depends on the signal and texture distributions and on the width parameter $\omega$ through the function $h(S, \nu, \omega)$ defined in (46). Its consistent empirical estimate (47) is obtained by specializing (32) to the present scalar-parameter setting, and involves only the snapshots, the Gaussian MT-function (42), and the first and second derivatives of the steering vector, $\dot{\mathbf{a}}(\vartheta) \triangleq d\mathbf{a}(\vartheta)/d\vartheta$ and $\ddot{\mathbf{a}}(\vartheta) \triangleq d^{2}\mathbf{a}(\vartheta)/d\vartheta^{2}$.

In the following simulation examples, we consider a BPSK signal impinging on a 4-element ULA with half wavelength spacing from DOA $\theta = 30°$. We consider an $\epsilon$-contaminated Gaussian noise model [21] under which the texture component $\nu$ of the compound Gaussian noise (41) is a binary random variable with $\nu = 1$ w.p. $1 - \epsilon$ and $\nu = c$ w.p. $\epsilon$. The parameters $\epsilon$ and $c$ that control the heaviness of the noise tails were set to 0.2 and 0, respectively. We define the signal-to-noise ratio (SNR) as $\mathrm{SNR} \triangleq 10\log_{10}\big(\sigma_{S}^{2} / (\sigma_{Z}^{2}((1-\epsilon) + \epsilon c^{2}))\big)$.

In the first simulation example, we compared the asymptotic MSE (45) to its empirical estimate (47) as a function of $\omega$, obtained from a single realization of N = 00 i.i.d. snapshots with SNR = 25 [dB]. Observing Fig. 1, one sees that due to the consistency of (47) the compared quantities are close. In the second simulation example, we compared the empirical, asymptotic (45) and empirical asymptotic (47) MSEs of the MT-QMLE to the empirical MSEs obtained by the SCM-based estimator [22] and the MLE. The optimal Gaussian MT-function parameter $\omega_{\mathrm{opt}}$ was obtained by minimizing (47) over the interval [1, 30]. Here, N = 00 snapshots were used with averaging over 00 Monte-Carlo simulations. The SNR is used to index the performances as depicted in Fig. 2. One sees that the MT-QMLE outperforms the non-robust SCM-based estimator and performs similarly to the MLE that, unlike the MT-QMLE, requires complete knowledge of the likelihood function. Additionally, one can notice that the empirical, asymptotic and empirical asymptotic MSEs of the MT-QMLE are nearly identical.
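A compact sketch of the resulting AOA estimator is given below (illustrative code, not from the paper): with the zero-centred Gaussian MT-function (42), the MT-QMLE reduces to maximizing the quadratic form stated above over a grid of candidate angles.

```python
import numpy as np

def aoa_mt_qmle(X, omega, grid, p):
    """AOA MT-QMLE sketch: maximize a^H(theta) C(omega) a(theta) over the grid,
    where C(omega) is the empirical MT-covariance plus the MT-mean outer product."""
    w = np.exp(-np.linalg.norm(X, axis=1) ** 2 / omega ** 2)
    w = w / w.sum()                                    # normalized MT weights, eq. (11)
    mu = (w[:, None] * X).sum(axis=0)                  # empirical MT-mean
    Xc = X - mu
    C = (w[:, None] * Xc).T @ Xc.conj() + np.outer(mu, mu.conj())

    def steer(t):
        return np.exp(-1j * np.pi * np.arange(p) * np.sin(t))

    vals = [float(np.real(steer(t).conj() @ C @ steer(t))) for t in grid]
    return grid[int(np.argmax(vals))]
```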

In order to evaluate the performance of the proposed estimator under the nominal Gaussian noise case, we set the contamination parameter $\epsilon$ to zero and repeated the simulation examples described above under the same settings, with the exception that different SNR levels, ranging from 0 to 20 [dB], were considered. Observing Fig. 3, one sees that, again, the asymptotic MSE and its consistent empirical estimate behave similarly as the width parameter $\omega$ of the Gaussian MT-function varies. Furthermore, observing Fig. 4, one can notice that the proposed MT-QMLE performs similarly to the standard SCM based estimator when the noise is normally distributed.

Fig. 1. AOA based source localization in $\epsilon$-contaminated Gaussian noise: Asymptotic RMSE (45) and its empirical estimate (47) versus the width parameter $\omega$ of the Gaussian MT-function (42). The true angle of arrival, number of snapshots and signal-to-noise ratio were set to $\theta = 30°$, N = 00 and SNR = 25 [dB], respectively. Notice that due to the consistency of (47) the compared quantities are close.

B. Robust received signal strength based source localization

Here, we illustrate the use of the plug-in MT-QMLE (36) for robust received signal strength (RSS) based source localization in heavy-tailed non-Gaussian noise. For simplicity, we consider an array of $p$ anchor sensors located on a circle with radius $r_{a}$ [meters] that receive the power of a source located on a circle with known radius $r_{s}$ [meters]. Under the log-distance path loss model [23], the array output satisfies
$$\mathbf{X}_{n} = \mathbf{a}(\theta) + \mathbf{W}_{n}, \quad n = 1, \ldots, N, \qquad (48)$$

Fig. 2. AOA based source localization in $\epsilon$-contaminated Gaussian noise: The empirical, asymptotic (45) and empirical asymptotic (47) RMSEs of the MT-QMLE versus SNR as compared to the empirical RMSEs of the SCM-based estimator and the MLE. The true angle of arrival and number of snapshots were set to $\theta = 30°$ and N = 00, respectively. Notice that the MT-QMLE outperforms the non-robust SCM-based estimator and performs similarly to the MLE that, unlike the MT-QMLE, requires complete statistical information.

Fig. 3. AOA based source localization in Gaussian noise: Asymptotic RMSE (45) and its empirical estimate (47) versus the width parameter $\omega$ of the Gaussian MT-function (42). The true angle of arrival, number of snapshots and signal-to-noise ratio were set to $\theta = 30°$, N = 00 and SNR = 0 [dB], respectively. Notice that due to the consistency of (47) the compared quantities are close.

Fig. 4. AOA based source localization in Gaussian noise: The empirical, asymptotic (45) and empirical asymptotic (47) RMSEs of the MT-QMLE versus SNR as compared to the empirical RMSEs of the SCM-based estimator and the MLE. The true angle of arrival and number of snapshots were set to $\theta = 30°$ and N = 00, respectively. Notice that the MT-QMLE performs similarly to the SCM-based estimator when the noise is normally distributed.

where $n$ is a discrete time index, $\mathbf{X}_{n} \in \mathbb{R}^{p}$ is a vector of received signals,
$$[\mathbf{a}(\theta)]_{k} \triangleq P_{0} - 10\gamma\,\log_{10}\big(d_{k}(\theta)/d_{0}\big), \quad k = 1, \ldots, p, \qquad (49)$$
$P_{0}$ is the reference power in [dB] at reference distance $d_{0}$ [meters], $\gamma$ is the path loss exponent,
$$d_{k}(\theta) \triangleq \|\mathbf{z}_{s}(\theta) - \mathbf{z}_{a}(\phi_{k})\|_{2} \qquad (50)$$
is the distance in [meters] between the source and the $k$-th anchor,
$$\mathbf{z}_{s}(\theta) \triangleq r_{s}\,[\cos\theta,\, \sin\theta]^{T} \qquad (51)$$
is the source coordinates vector, $\theta \in \Theta = [0,\, 2\pi - \Delta]$ is the unknown angle to be estimated, $\Delta$ is a small positive constant,
$$\mathbf{z}_{a}(\phi_{k}) \triangleq r_{a}\,[\cos\phi_{k},\, \sin\phi_{k}]^{T} \qquad (52)$$
is the vector of coordinates of the $k$-th anchor sensor, and $\mathbf{W}_{n} \in \mathbb{R}^{p}$ is an unobserved additive i.i.d. noise process with unknown distribution. In this case, the MT-mean $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and MT-covariance $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are unknown, and therefore the MT-QMLE (21) cannot be implemented.
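A small sketch of the noise-free RSS model (49)-(52) is given below (illustrative code); the standard log-distance form with a minus 10-gamma-log10 term and the way the anchor angles are supplied are assumptions of this sketch rather than details stated explicitly in the text.

```python
import numpy as np

def rss_model(theta, anchor_angles, P0, gamma, d0, r_s, r_a):
    """Noise-free RSS vector a(theta) under the log-distance path loss model.
    anchor_angles holds the known angular positions phi_k of the anchors."""
    zs = r_s * np.array([np.cos(theta), np.sin(theta)])           # source, eq. (51)
    za = r_a * np.stack([np.cos(anchor_angles),
                         np.sin(anchor_angles)], axis=1)          # anchors, eq. (52)
    d = np.linalg.norm(zs[None, :] - za, axis=1)                  # distances, eq. (50)
    return P0 - 10.0 * gamma * np.log10(d / d0)                   # eq. (49)
```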

We assume a training sequence $\mathbf{W}'_{m}$, $m = 1, \ldots, M$ of i.i.d. samples from the unknown noise distribution $P_{\mathbf{w}}$ that is independent of the observation sequence $\mathbf{X}_{n}$, $n = 1, \ldots, N$. Using the training sequence we obtain empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ for any value of $\boldsymbol{\vartheta} \in \Theta$ and implement the plug-in version of the MT-QMLE (36). By (7), (8) and (48) it follows that the MT-mean and MT-covariance of $\mathbf{X}$ under $Q_{\mathbf{x};\boldsymbol{\vartheta}}$ are given by
$$\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) = \mathbf{a}(\boldsymbol{\vartheta}) + \boldsymbol{\mu}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}) \qquad (53)$$
and
$$\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) = \boldsymbol{\Sigma}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}), \qquad (54)$$
respectively, where the parametric MT-function $v(\cdot; \cdot)$ is defined as
$$v(\mathbf{W}; \boldsymbol{\vartheta}) \triangleq u(\mathbf{a}(\boldsymbol{\vartheta}) + \mathbf{W}). \qquad (55)$$
Based on relations (53) and (54) we define the following empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$:
$$\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) \triangleq \mathbf{a}(\boldsymbol{\vartheta}) + \hat{\boldsymbol{\mu}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}) \qquad (56)$$
and
$$\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}), \qquad (57)$$
where $\hat{\boldsymbol{\mu}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta})$ are obtained using (9) and (10) with the training noise sequence $\mathbf{W}'_{m}$, $m = 1, \ldots, M$ and the parametric MT-function (55). The plug-in MT-QMLE is obtained by substituting (56) and (57) into (37), followed by the maximization step (36).

In order to test the RSS-based localization performance under heavy-tailed non-Gaussian noise that produces outliers, we assume a real compound Gaussian noise with stochastic representation [21]
$$\mathbf{W}_{n} = \nu_{n} \mathbf{Z}_{n}, \qquad (58)$$
where $\nu_{n} \in \mathbb{R}_{++}$ is a first-order stationary process, called the texture component, and $\mathbf{Z}_{n} \in \mathbb{R}^{p}$ is a real wide-sense stationary Gaussian process, independent of $\nu_{n}$, with zero mean and scaled unit covariance $\sigma_{Z}^{2}\mathbf{I}$. Furthermore, in order to gain robustness against outliers and sensitivity to higher-order statistical information, we consider the following Gaussian MT-function, which satisfies the B-robustness conditions stated in Proposition 3:
$$u_{\mathrm{G}}(\mathbf{x}; \boldsymbol{\omega}) = (2\pi\tau^{2})^{-p/2}\, \exp\big(-\|\mathbf{x} - \mathbf{a}(t)\|^{2}/(2\tau^{2})\big), \qquad (59)$$
where $\boldsymbol{\omega} \triangleq [t, \tau]^{T}$, $t$ is a location parameter and $\tau$ is a scale parameter.
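A minimal sketch of the plug-in estimates (56)-(57) is given below (illustrative code, not from the paper); the callables a and u stand for the RSS model and the chosen MT-function, and the real-valued noise of this application is assumed.

```python
import numpy as np

def plugin_mt_moments(theta, W_train, a, u):
    """Plug-in estimates (56)-(57): with the parametric MT-function
    v(w; theta) = u(a(theta) + w) from (55), the hypothesized MT-mean and
    MT-covariance are estimated from a noise-only training sequence.

    W_train : (M, p) real array of noise training samples."""
    shifted = a(theta) + W_train                     # a(theta) + W'_m, as in (55)
    w = np.array([u(x) for x in shifted], dtype=float)
    w = w / w.sum()                                  # normalized weights, eq. (11)
    mu_w = (w[:, None] * W_train).sum(axis=0)        # empirical MT-mean of the noise
    Wc = W_train - mu_w
    Sigma_w = (w[:, None] * Wc).T @ Wc               # real-valued noise, no conjugate
    return a(theta) + mu_w, Sigma_w                  # (56) and (57)
```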

Under the choice of the Gaussian MT-function (59), it can be shown that the empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$, defined in (56) and (57), are uniformly consistent. Therefore, by Theorem 4, the plug-in MT-QMLE attains the asymptotic MSE (24) of the MT-QMLE when the noise training sequence length $M$ is sufficiently large. A closed-form expression for the resulting asymptotic MSE is given in Appendix G. Moreover, an empirical estimate of the asymptotic MSE is obtained using (32) after replacing $\boldsymbol{\mu}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ with their sample estimates (56), (57). A closed-form expression for the empirical asymptotic MSE of the plug-in MT-QMLE is given in Appendix H.

In the following simulation examples, we consider p = 4 sensors located on a circle with radius $r_{a}$ = 5 [meters] that receive the power of a source located on a wider circle with radius $r_{s}$ = 50 [meters]. The angle $\theta$ was set to 180°, the reference power and distance were set to $P_{0}$ = [dB] and $d_{0}$ = 1 [meter], and the path loss exponent was set to $\gamma$ = 5. We consider an $\epsilon$-contaminated Gaussian noise model [21] under which the texture component of the compound Gaussian noise (58) is a binary random variable with $\nu = 1$ w.p. $1 - \epsilon$ and $\nu = c$ w.p. $\epsilon$. The parameters $\epsilon$ and $c$ that control the heaviness of the noise tails were set to 0.2 and 50, respectively. The length of the noise training sequence was set to M = 000.

In the first simulation example, we compared the asymptotic MSE (86) to its empirical estimate (8) as a function of the location and width parameters $t$ and $\tau$ of the Gaussian MT-function (59). The empirical estimate was obtained from a single realization of N = 00 i.i.d. snapshots with noise variance $\sigma_{W}^{2}$ = [dB]. Observing Fig. 5, one sees that the compared quantities are close. In the second simulation example, we compared the empirical, asymptotic (86) and empirical asymptotic (8) MSEs of the plug-in MT-QMLE to the empirical MSEs obtained by the MLE and the standard sample mean based estimator, which is the MLE under the assumption of spatially white Gaussian noise. The optimal Gaussian MT-function parameters $t_{\mathrm{opt}}$ and $\tau_{\mathrm{opt}}$ were obtained by minimizing the empirical asymptotic MSE (8) over $[0, 2\pi] \times [2, 15]$. Here, N = 00 snapshots were used with averaging over 00 Monte-Carlo simulations. The noise variance $\sigma_{W}^{2}$ is used to index the performances as depicted in Fig. 6. One sees that the plug-in MT-QMLE outperforms the non-robust sample-mean based estimator and performs similarly to the MLE that, unlike the plug-in MT-QMLE, requires complete knowledge of the likelihood function. Additionally, one can notice that the empirical, asymptotic and empirical asymptotic MSEs of the plug-in MT-QMLE are nearly identical.

In order to evaluate the performance of the proposed estimator under the nominal Gaussian noise case, we set the contamination parameter $\epsilon$ to zero and repeated the simulation examples described above under the same settings, with the exception that different SNR levels, ranging from 0 to 20 [dB], were considered. Observing Fig. 7, one sees that, again, the asymptotic MSE and its consistent empirical estimate behave similarly as the location and width parameters of the Gaussian MT-function vary. Furthermore, observing Fig. 8, one can notice that the proposed MT-QMLE performs similarly to the standard sample mean based estimator when the noise is normally distributed.

Fig. 5. RSS based source localization in $\epsilon$-contaminated Gaussian noise: (a) The asymptotic RMSE (86) of the plug-in MT-QMLE and (b) its empirical estimate (8) versus the location and width parameters $(t, \tau)$ of the Gaussian MT-function (59). The true angle, the number of observations, the noise training sequence length and the noise variance were set to $\theta$ = 180°, N = 00, M = 000, and $\sigma_{W}^{2}$ = [dB], respectively. Notice that due to the consistency of (8) the compared quantities are close.

VI. CONCLUSION

In this paper a new multivariate estimator, called the MT-QMLE, was derived by applying a transform to the probability distribution of the data. A plug-in version of the MT-QMLE was also developed that is based on empirical estimates of the MT-mean and MT-covariance functions that characterize the hypothesized Gaussian probability measure. By specifying the MT-function in the Gaussian family, the proposed MT-QMLE and its plug-in version were applied to robust AOA and RSS based source localization problems. Exploration of other MT-functions may result in additional estimators in this class that have different useful properties.

APPENDIX

A. Proof of Proposition 1:

Property 1: Since $\varphi_{u}(\mathbf{x}; \boldsymbol{\theta})$ is non-negative, then by Corollary in [25] $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a measure on $\mathcal{S}_{\mathcal{X}}$. Furthermore, $Q_{\mathbf{x};\boldsymbol{\theta}}(\mathcal{X}) = 1$, so that $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$.

Fig. 6. RSS based source localization in $\epsilon$-contaminated Gaussian noise: The empirical, asymptotic (86) and empirical asymptotic (8) RMSEs of the plug-in MT-QMLE versus the noise variance $\sigma_{W}^{2}$ as compared to the empirical RMSEs of the sample-mean based estimator and the MLE. The true angle, the number of observations and the noise training sequence length were set to $\theta$ = 180°, N = 00 and M = 000, respectively. Notice that the plug-in MT-QMLE outperforms the non-robust sample-mean based estimator and performs similarly to the MLE that, unlike the plug-in MT-QMLE, requires complete statistical information.

Fig. 7. RSS based source localization in Gaussian noise: (a) The asymptotic RMSE (86) of the plug-in MT-QMLE and (b) its empirical estimate (8) versus the location and width parameters $(t, \tau)$ of the Gaussian MT-function (59). The true angle, the number of observations, the noise training sequence length and the noise variance were set to $\theta$ = 180°, N = 00, M = 000, and $\sigma_{W}^{2}$ = 5 [dB], respectively. Notice that due to the consistency of (8) the compared quantities are close.

Fig. 8. RSS based source localization in Gaussian noise: The empirical, asymptotic (86) and empirical asymptotic (8) RMSEs of the plug-in MT-QMLE versus the noise variance $\sigma_{W}^{2}$ as compared to the empirical RMSEs of the sample-mean based estimator and the MLE. The true angle, the number of observations and the noise training sequence length were set to $\theta$ = 180°, N = 00 and M = 000, respectively. Notice that the plug-in MT-QMLE performs similarly to the sample mean based estimator when the noise is normally distributed.

Property 2: Follows from the definitions in [25].

B. Proof of Theorem 1:

Define the deterministic quantity
$$\bar{J}_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (60)$$
According to Lemmas 1 and 2, stated below, $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$ and the random objective function $J_{u}(\boldsymbol{\vartheta})$ defined in (20) converges uniformly w.p. 1 to $\bar{J}_{u}(\boldsymbol{\vartheta})$ as $N \to \infty$. Furthermore, by assumptions A-3 and A-4, $J_{u}(\boldsymbol{\vartheta})$ is continuous over the compact parameter space $\Theta$ w.p. 1. Therefore, by Theorem 3.4 in [33] we conclude that (22) holds.

Lemma 1. Let $\bar{J}_{u}(\boldsymbol{\vartheta})$ be defined by relation (60). If conditions A-1-A-4 are satisfied, then $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$.

Proof. By (60) and assumptions A-3 and A-4, $\bar{J}_{u}(\boldsymbol{\vartheta})$ is continuous over the compact set $\Theta$. Therefore, by the extreme-value theorem [36] it must have a global maximum in $\Theta$. We now show that the global maximum is uniquely attained at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$. According to (60),
$$\bar{J}_{u}(\boldsymbol{\theta}) - \bar{J}_{u}(\boldsymbol{\vartheta}) = D_{\mathrm{LD}}\big[\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] + \big\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (61)$$
Notice that the log-determinant divergence between positive-definite matrices $\mathbf{A}$, $\mathbf{B}$ satisfies $D_{\mathrm{LD}}[\mathbf{A}\,\|\,\mathbf{B}] \geq 0$ with equality if and only if $\mathbf{A} = \mathbf{B}$ [30]. Also notice that the weighted $\ell_{2}$-norm of a vector $\mathbf{a}$ with positive-definite weighting matrix $\mathbf{C}$ satisfies $\|\mathbf{a}\|_{\mathbf{C}} \geq 0$ with equality if and only if $\mathbf{a} = \mathbf{0}$. Therefore, by assumptions A-2 and A-3 we conclude that the difference in (61) is strictly positive for all $\boldsymbol{\vartheta} \neq \boldsymbol{\theta}$, which implies that $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$.

Lemma 2. Let $J_{u}(\boldsymbol{\vartheta})$ and $\bar{J}_{u}(\boldsymbol{\vartheta})$ be defined by relations (20) and (60), respectively. If conditions A-1 and A-3-A-5 are satisfied, then
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|J_{u}(\boldsymbol{\vartheta}) - \bar{J}_{u}(\boldsymbol{\vartheta})\big| \to 0 \;\; \text{w.p. 1 as } N \to \infty. \qquad (62)$$
Proof. Using (20), (60), the Cauchy-Schwarz inequality and its matrix version [34], one can verify that $\sup_{\boldsymbol{\vartheta} \in \Theta}|J_{u}(\boldsymbol{\vartheta}) - \bar{J}_{u}(\boldsymbol{\vartheta})|$ is upper bounded w.p. 1 by a sum of terms, each of which is a continuous function of the differences $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ scaled by $\sup_{\boldsymbol{\vartheta}}\|(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}\|_{\mathrm{Fro}}$ and $\sup_{\boldsymbol{\vartheta}}\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\|$, and which vanishes when the differences are zero. Notice that by assumptions A-3 and A-4 and the compactness of $\Theta$, the terms $\sup_{\boldsymbol{\vartheta}}\|(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}\|_{\mathrm{Fro}}$ and $\sup_{\boldsymbol{\vartheta}}\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\|$ are finite. Also notice that by assumption A-5 and Proposition 2, $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ w.p. 1 as $N \to \infty$, and that the operators $\|\cdot\|_{\mathrm{Fro}}$, $\|\cdot\|$ and $\log\det[\cdot]$ define real continuous mappings of $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$. Hence, by the Mann-Wald theorem [35] this upper bound converges to zero w.p. 1 as $N \to \infty$, and therefore the relation (62) must hold.

C. Proof of Theorem 2:

By assumptions B-1 and B-2, the estimator $\hat{\boldsymbol{\theta}}_{u}$ is weakly consistent and the true parameter vector $\boldsymbol{\theta}$ lies in the interior of $\Theta$, which is assumed to be compact. Therefore, we conclude that $\hat{\boldsymbol{\theta}}_{u}$ lies in an open neighbourhood $\mathcal{U}$ of $\boldsymbol{\theta}$ with sufficiently high probability as $N$ gets large.

Hence, $\hat{\boldsymbol{\theta}}_{u}$ is a local maximum point of the objective function $J_{u}(\boldsymbol{\vartheta})$ (20), whose gradient satisfies
$$\nabla_{\boldsymbol{\vartheta}} J_{u}(\hat{\boldsymbol{\theta}}_{u}) \sum_{n=1}^{N} u(\mathbf{X}_{n}) = \sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}; \hat{\boldsymbol{\theta}}_{u}) = \mathbf{0}. \qquad (63)$$
By (18) and assumption B-3, the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ defined in (26) is continuous over $\Theta$ w.p. 1. Using (63) and the mean-value theorem [37] applied to each entry of $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$, we conclude that
$$\mathbf{0} = \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \hat{\boldsymbol{\theta}}_{u}) = \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) - \hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}})\, \sqrt{N}\,(\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta}), \qquad (64)$$
where $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$, defined in (34), is an unbiased estimator of the symmetric matrix function $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ (27), $\boldsymbol{\Upsilon}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ is defined in (28) and $\bar{\boldsymbol{\theta}}$ lies on the line segment connecting $\hat{\boldsymbol{\theta}}_{u}$ and $\boldsymbol{\theta}$. Since $\hat{\boldsymbol{\theta}}_{u}$ is weakly consistent, $\bar{\boldsymbol{\theta}}$ must be weakly consistent as well. Therefore, by Lemma 3, stated below, $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$, where $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ is non-singular at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$ by assumption. Hence, by the Mann-Wald theorem [35],
$$\hat{\mathbf{F}}_{u}^{-1}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}^{-1}(\boldsymbol{\theta}) \;\; \text{as } N \to \infty, \qquad (65)$$
which implies that $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}})$ is invertible with sufficiently high probability as $N$ gets large. Therefore, by (64), the equality
$$\sqrt{N}\,(\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta}) = \hat{\mathbf{F}}_{u}^{-1}(\bar{\boldsymbol{\theta}})\, \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) \qquad (66)$$
holds with sufficiently high probability as $N$ gets large. Furthermore, by Lemma 5, stated below,
$$\frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) \xrightarrow{D} \mathcal{N}\big(\mathbf{0}, \mathbf{G}_{u}(\boldsymbol{\theta})\big) \;\; \text{as } N \to \infty, \qquad (67)$$
where $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is defined in (25). Thus, by (65)-(67) and Slutsky's theorem [25], the relation (23) holds.

Lemma 3. Let $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (27) and (34), respectively. Furthermore, let $\bar{\boldsymbol{\theta}}$ denote an estimator of $\boldsymbol{\theta}$. If $\bar{\boldsymbol{\theta}} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$ and conditions B-3 and B-4 are satisfied, then $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$.

Proof. By the triangle inequality,
$$\big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\| \leq \big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\bar{\boldsymbol{\theta}})\big\| + \big\|\mathbf{F}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\|. \qquad (68)$$
Therefore, it suffices to prove that
$$\big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\bar{\boldsymbol{\theta}})\big\| \xrightarrow{P} 0 \;\; \text{as } N \to \infty \qquad (69)$$
and
$$\big\|\mathbf{F}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\| \xrightarrow{P} 0 \;\; \text{as } N \to \infty. \qquad (70)$$

24 24 and kf u ( ) F u ( )k P! 0 as N!1. (70) By Lemma 4, stated below, F u (#) is continuous in. Therefore, since P! as N!1, by the Mann-Wald Theorem [35] we conclude that (70) holds. We further conclude by Lemma 4 that kˆf u ( ) F u ( ) ksup ˆF u (#) F u (#) P! 0, as N!1. Lemma 4. Let F u (#) and ˆF u (#) be the matrix functions defined in (27) and (34), respectively. If conditions B-3 and B-4 are satisfied, then F u (#) is continuous in and sup as N!1. ˆF u (#) F u (#) Proof. Notice that the samples n, n =1,...,N are i.i.d., the parameter space is compact and by (18) and assumption B-3 the matrix function u (; #) defined in (28) is continuous in w.p. 1. Furthermore, using (18), (28), the triangle inequality, the Cauchy-Schwartz inequality, assumption B-3, and the compactness of it can be shown that there exists a positive scalar C such that [ u (; #)] k,j 1+kk 2 C w.p. 1. 8k, j =1,...,p. Therefore, by (27), (34), assumption B-4 and the uniform weak law of large numbers [39] we conclude that F u (#) is continuous in and sup as N!1. ˆF u (#) F u (#) P! 0 P! 0 Lemma 5. Given a sequence n, n =1,...,N of i.i.d. samples from P ; 1 p N N n=1 u ( n ) u ( n, ) D! N (0, G u ( )) as N!1, where G u ( ) and u (, ) are defined in (25) and (26), respectively. Proof. Since n, n =1,...,N are i.i.d. random vectors and the functions u ( ) and u (, ) are real the products u ( n ) u ( n, ) n =1,...,N are real and i.i.d. Furthermore, by Lemmas 6 and 7, stated below, the definition of G u ( ) in (25) and the compactness of we have that E[u () u (, );P ; ]= 0 and cov [u () u (, );P ; ] = G u ( ) is finite. Therefore by the central limit theorem [38] 1 NP we conclude that p N u ( n ) u ( n, ) is asymptotically normal with zero mean and covariance G u ( ). n=1 Lemma 6. The random vector function u (, #) defined in (26) satisfies E[u () u (, #);P ;# ]=0. (71)

Proof. Notice that
$$\mathrm{E}[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}] = \mathrm{E}[\boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta})\, \varphi_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}]\, \mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\vartheta}}], \qquad (72)$$
where $\varphi_{u}(\cdot; \cdot)$ is defined in (5) and the expectation $\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\vartheta}}]$ is finite by assumption (3). Also notice that by (18) and (26), the $k$-th entry of the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}, \boldsymbol{\vartheta})$ is given by
$$[\boldsymbol{\psi}_{u}(\mathbf{x}, \boldsymbol{\vartheta})]_{k} = \frac{\partial \log\phi(\mathbf{x}; \boldsymbol{\vartheta})}{\partial \vartheta_{k}} = -\mathrm{tr}\Big(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\, \frac{\partial\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\Big) + 2\,\mathrm{Re}\Big\{\frac{\partial\boldsymbol{\mu}_{\mathbf{x}}^{(u)H}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\,\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\Big\} + \big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{H} \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\, \frac{\partial\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\,\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big). \qquad (73)$$
Therefore, by (7), (8), (72) and (73), the relation (71) is easily verified.

Lemma 7. Let $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (25) and (33), respectively. If conditions B-3 and B-4 are satisfied, then $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is continuous in $\Theta$ and $\sup_{\boldsymbol{\vartheta} \in \Theta}\|\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) - \mathbf{G}_{u}(\boldsymbol{\vartheta})\| \xrightarrow{P} 0$ as $N \to \infty$.

Proof. Notice that the samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ are i.i.d., the parameter space $\Theta$ is compact, and by (18) and assumption B-3 the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ defined in (26) is continuous in $\Theta$ w.p. 1. Using (18), (26), the triangle inequality, the Cauchy-Schwarz inequality, assumption B-3 and the compactness of $\Theta$, it can be shown that there exists a positive scalar $C$ such that
$$\big|[\boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}; \boldsymbol{\vartheta})]_{k,j}\big| \leq \big(1 + \|\mathbf{X}\|_{2} + \|\mathbf{X}\|_{2}^{2} + \|\mathbf{X}\|_{2}^{3} + \|\mathbf{X}\|_{2}^{4}\big)\, C \;\; \text{for all } k, j = 1, \ldots, m. \qquad (74)$$
Therefore, by (25), (33), assumption B-4 and the uniform weak law of large numbers [39], we conclude that $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is continuous in $\Theta$ and $\sup_{\boldsymbol{\vartheta} \in \Theta}\|\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) - \mathbf{G}_{u}(\boldsymbol{\vartheta})\| \xrightarrow{P} 0$ as $N \to \infty$.

D. Proof of Proposition 4:

Define $\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta}) \triangleq u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\theta})$. Using the definitions of $\mathbf{C}_{u}(\boldsymbol{\theta})$, $\mathbf{G}_{u}(\boldsymbol{\theta})$ and $\mathbf{F}_{u}(\boldsymbol{\theta})$ in (24), (25) and (27), respectively, Identity 1 stated below, the symmetry of $\mathbf{F}_{u}(\boldsymbol{\theta})$, the non-singularity of $\mathbf{G}_{u}(\boldsymbol{\theta})$ and the covariance inequality [3], one can verify that
$$\mathbf{C}_{u}^{-1}(\boldsymbol{\theta}) = N\, \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\eta}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]\, \mathrm{E}^{-1}\big[\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\eta}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]\, \mathrm{E}\big[\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big] \preceq N\, \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big] \triangleq N\, \mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta}), \qquad (75)$$
where equality holds if and only if the relation (31) holds. The proof is complete by taking the inverse of (75).

Identity 1. $\mathbf{F}_{u}(\boldsymbol{\theta}) = \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]$.

Proof. According to Lemma 6, stated in Appendix C, $\mathrm{E}[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}] = \mathbf{0}$. Therefore, by (1) and (29),
$$\mathbf{0} = \nabla_{\boldsymbol{\vartheta}} \int_{\mathcal{X}} u(\mathbf{x})\, \nabla_{\boldsymbol{\vartheta}}^{T}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}) = \int_{\mathcal{X}} u(\mathbf{x})\, \nabla^{2}_{\boldsymbol{\vartheta}}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}) + \int_{\mathcal{X}} u(\mathbf{x})\, \nabla_{\boldsymbol{\vartheta}}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, \nabla_{\boldsymbol{\vartheta}}^{T}\log f(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}). \qquad (76)$$
The second equality in (76) stems from assumptions C-1-C-3 and Theorem 2.40 in [40], according to which integration and differentiation can be interchanged. Therefore, by (1), (27), (29), (30) and (76),
$$\mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}\big] = \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \mathbf{F}_{u}(\boldsymbol{\theta}). \qquad (77)$$

E. Proof of Theorem 3:

Under assumption D-1, $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$. Therefore, by Lemma 3, stated in Appendix C, and Lemma 8, stated below, which require conditions D-2 and D-3 to be satisfied, we have that $\hat{\mathbf{F}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$ and $\hat{\mathbf{G}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{G}_{u}(\boldsymbol{\theta})$ as $N \to \infty$. Hence, by the Mann-Wald theorem [35], we conclude that (35) is satisfied.

Lemma 8. Let $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (25) and (33), respectively. Furthermore, let $\bar{\boldsymbol{\theta}}$ denote an estimator of $\boldsymbol{\theta}$. If $\bar{\boldsymbol{\theta}} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$ and conditions D-2 and D-3 are satisfied, then $\hat{\mathbf{G}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{G}_{u}(\boldsymbol{\theta})$ as $N \to \infty$.

Proof. The proof is similar to that of Lemma 3 stated in Appendix C. Simply replace $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$ with $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$, and use Lemma 7 instead of Lemma 4.

F. Proof of Theorem 4:

First, we show that the objective function $\tilde{J}_{u}(\boldsymbol{\vartheta})$ (37) converges uniformly in probability to the objective function $J_{u}(\boldsymbol{\vartheta})$ (20), i.e., we show that $\sup_{\boldsymbol{\vartheta} \in \Theta}|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})| \xrightarrow{P} 0$ as $M \to \infty$.

Using (10), (20), (37), the triangle inequality, the Cauchy-Schwarz inequality and its matrix extension [34], one can verify that
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \big\|\hat{\mathbf{C}}\big\|_{\mathrm{Fro}}\, \sup_{\boldsymbol{\vartheta}} B_{1}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{2}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{3}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{4}(\boldsymbol{\vartheta}), \qquad (78)$$
where $\hat{\mathbf{C}} \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} + \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)H}$ and the non-negative terms $B_{1}(\boldsymbol{\vartheta}), \ldots, B_{4}(\boldsymbol{\vartheta})$, defined in (79)-(82), involve the Frobenius norm of $(\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1} - (\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}$, the difference $\log\det\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \log\det\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$, and weighted norms of $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$.

Notice that for any $k = 1, \ldots, 4$ and any $\epsilon > 0$,
$$\Pr\Big[\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) > \epsilon\Big] \leq \Pr\Big[\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \leq \delta_{1},\; \sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \leq \delta_{2}\Big] + \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| > \delta_{1}\Big] + \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| > \delta_{2}\Big], \qquad (83)$$
where $\epsilon$, $\delta_{1}$ and $\delta_{2}$ are positive constants. By (38), the last two probabilities tend to zero as $M \to \infty$. Moreover, by (79)-(82), the first probability tends to zero as $\delta_{1}, \delta_{2} \to 0$. Hence, by (83) we conclude that $\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) \xrightarrow{P} 0$ as $M \to \infty$ for all $k = 1, \ldots, 4$, and therefore, by (78),
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \xrightarrow{P} 0 \;\; \text{as } M \to \infty. \qquad (84)$$

We now use this result to prove the statement of the theorem. Notice that for any $\epsilon > 0$ and $\delta > 0$,
$$\Pr\big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon\big] = \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| > \delta\Big] + \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big] \leq \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| > \delta\Big] + \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big]. \qquad (85)$$
By (84), $\Pr[\sup_{\boldsymbol{\vartheta}}|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})| > \delta] \to 0$ as $M \to \infty$. Furthermore, by (21) and (36), when $J_{u}(\boldsymbol{\vartheta})$ and $\tilde{J}_{u}(\boldsymbol{\vartheta})$ have unique global maxima w.p. 1,
$$\Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big] \to 0 \;\; \text{as } \delta \to 0.$$
Therefore, by (85) we conclude that $\tilde{\boldsymbol{\theta}}_{u} \xrightarrow{P} \hat{\boldsymbol{\theta}}_{u}$ as $M \to \infty$.

G. Asymptotic MSE of the plug-in MT-QMLE in the problem of RSS based source localization

According to (24), (53)-(55), (58) and (59), the asymptotic MSE of the plug-in MT-QMLE is given by
$$C_{u_{\mathrm{G}}}(\theta; \boldsymbol{\omega}) = \frac{1}{N}\, \frac{G(\theta; \boldsymbol{\omega})}{F^{2}(\theta; \boldsymbol{\omega})}, \qquad (86)$$
where $G(\theta; \boldsymbol{\omega}) = B(\theta; \boldsymbol{\omega})\, \mathrm{E}[h(\nu; \theta, \boldsymbol{\omega}); P_{\nu}]$ and $F(\theta; \boldsymbol{\omega}) = A(\theta; \boldsymbol{\omega})\, \mathrm{E}[g(\nu; \theta, \boldsymbol{\omega}); P_{\nu}]$. The scalar functions $A(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $B(\boldsymbol{\vartheta}; \boldsymbol{\omega})$, defined in (87) and (88), are built from the hypothesized MT-mean (53), the MT-covariance (54), their derivatives w.r.t. $\boldsymbol{\vartheta}$, and auxiliary terms $B_{1}(\boldsymbol{\vartheta}, \boldsymbol{\omega}), \ldots, B_{5}(\boldsymbol{\vartheta}, \boldsymbol{\omega})$, with $B(\boldsymbol{\vartheta}; \boldsymbol{\omega}) \triangleq 0.25\,B_{1}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) - 0.5\,B_{2}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) + 0.25\,B_{3}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) + B_{4}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) - B_{5}(\boldsymbol{\vartheta}, \boldsymbol{\omega})$ (88).


More information

DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY. Miriam Häge, Marc Oispuu

DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY. Miriam Häge, Marc Oispuu 19th European Signal Processing Conference (EUSIPCO 211) Barcelona, Spain, August 29 - September 2, 211 DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY Miriam Häge, Marc

More information

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Review Quiz 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Cramér Rao lower bound (CRLB). That is, if where { } and are scalars, then

More information

ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE. 2πd λ. E[ϱ(θ, t)ϱ (θ,τ)] = γ(θ; µ)δ(θ θ )δ t,τ, (2)

ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE. 2πd λ. E[ϱ(θ, t)ϱ (θ,τ)] = γ(θ; µ)δ(θ θ )δ t,τ, (2) ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE Jooshik Lee and Doo Whan Sang LG Electronics, Inc. Seoul, Korea Jingon Joung School of EECS, KAIST Daejeon,

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

The Canonical Gaussian Measure on R

The Canonical Gaussian Measure on R The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where

More information

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM Kutluyıl Doğançay Reza Arablouei School of Engineering, University of South Australia, Mawson Lakes, SA 595, Australia ABSTRACT

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

10-704: Information Processing and Learning Fall Lecture 24: Dec 7 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE 5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Hyperplane-Based Vector Quantization for Distributed Estimation in Wireless Sensor Networks Jun Fang, Member, IEEE, and Hongbin

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev

MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev MEASURE TRASFORME QUASI SCORE TEST WITH ALICATIO TO LOCATIO MISMATCH ETECTIO Koby Todros Ben-Gurion University of the egev ABSTRACT In this paper, we develop a generalization of the Gaussian quasi score

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Complexity of two and multi-stage stochastic programming problems

Complexity of two and multi-stage stochastic programming problems Complexity of two and multi-stage stochastic programming problems A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA The concept

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

THERE is considerable literature about second-order statistics-based

THERE is considerable literature about second-order statistics-based IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 5, MAY 2004 1235 Asymptotically Minimum Variance Second-Order Estimation for Noncircular Signals Application to DOA Estimation Jean-Pierre Delmas, Member,

More information

Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array

Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array Huiping Duan, Tiantian Tuo, Jun Fang and Bing Zeng arxiv:1511.06828v1 [cs.it] 21 Nov 2015 Abstract In underdetermined direction-of-arrival

More information

Section 9: Generalized method of moments

Section 9: Generalized method of moments 1 Section 9: Generalized method of moments In this section, we revisit unbiased estimating functions to study a more general framework for estimating parameters. Let X n =(X 1,...,X n ), where the X i

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Alan Edelman Department of Mathematics, Computer Science and AI Laboratories. E-mail: edelman@math.mit.edu N. Raj Rao Deparment

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL On the Virtual Array Concept for Higher Order Array Processing

1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL On the Virtual Array Concept for Higher Order Array Processing 1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL 2005 On the Virtual Array Concept for Higher Order Array Processing Pascal Chevalier, Laurent Albera, Anne Ferréol, and Pierre Comon,

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches

Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches Ref: Huber,

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

hundred samples per signal. To counter these problems, Mathur et al. [11] propose to initialize each stage of the algorithm by aweight vector found by

hundred samples per signal. To counter these problems, Mathur et al. [11] propose to initialize each stage of the algorithm by aweight vector found by Direction-of-Arrival Estimation for Constant Modulus Signals Amir Leshem Λ and Alle-Jan van der Veen Λ Abstract In many cases where direction finding is of interest, the signals impinging on an antenna

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

Unit Roots in White Noise?!

Unit Roots in White Noise?! Unit Roots in White Noise?! A.Onatski and H. Uhlig September 26, 2008 Abstract We show that the empirical distribution of the roots of the vector auto-regression of order n fitted to T observations of

More information

Robust Subspace DOA Estimation for Wireless Communications

Robust Subspace DOA Estimation for Wireless Communications Robust Subspace DOA Estimation for Wireless Communications Samuli Visuri Hannu Oja ¾ Visa Koivunen Laboratory of Signal Processing Computer Technology Helsinki Univ. of Technology P.O. Box 3, FIN-25 HUT

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

Estimation Theory Fredrik Rusek. Chapters

Estimation Theory Fredrik Rusek. Chapters Estimation Theory Fredrik Rusek Chapters 3.5-3.10 Recap We deal with unbiased estimators of deterministic parameters Performance of an estimator is measured by the variance of the estimate (due to the

More information

FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE

FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE Progress In Electromagnetics Research C, Vol. 6, 13 20, 2009 FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE Y. Wu School of Computer Science and Engineering Wuhan Institute of Technology

More information

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal Structural VAR Modeling for I(1) Data that is Not Cointegrated Assume y t =(y 1t,y 2t ) 0 be I(1) and not cointegrated. That is, y 1t and y 2t are both I(1) and there is no linear combination of y 1t and

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Applications and fundamental results on random Vandermon

Applications and fundamental results on random Vandermon Applications and fundamental results on random Vandermonde matrices May 2008 Some important concepts from classical probability Random variables are functions (i.e. they commute w.r.t. multiplication)

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources

DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources Progress In Electromagnetics Research M, Vol. 63, 185 193, 218 DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources Kai-Chieh Hsu and

More information

Basic concepts in estimation

Basic concepts in estimation Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates

More information

Lecture 1: Introduction

Lecture 1: Introduction Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Motivation CRLB and Regularity Conditions CRLLB A New Bound Examples Conclusions

Motivation CRLB and Regularity Conditions CRLLB A New Bound Examples Conclusions A Survey of Some Recent Results on the CRLB for Parameter Estimation and its Extension Yaakov 1 1 Electrical and Computer Engineering Department University of Connecticut, Storrs, CT A Survey of Some Recent

More information

Gaussian distributions and processes have long been accepted as useful tools for stochastic

Gaussian distributions and processes have long been accepted as useful tools for stochastic Chapter 3 Alpha-Stable Random Variables and Processes Gaussian distributions and processes have long been accepted as useful tools for stochastic modeling. In this section, we introduce a statistical model

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

On the smallest eigenvalues of covariance matrices of multivariate spatial processes On the smallest eigenvalues of covariance matrices of multivariate spatial processes François Bachoc, Reinhard Furrer Toulouse Mathematics Institute, University Paul Sabatier, France Institute of Mathematics

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation

More information

LAN property for sde s with additive fractional noise and continuous time observation

LAN property for sde s with additive fractional noise and continuous time observation LAN property for sde s with additive fractional noise and continuous time observation Eulalia Nualart (Universitat Pompeu Fabra, Barcelona) joint work with Samy Tindel (Purdue University) Vlad s 6th birthday,

More information

The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals

The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals Timo Hämäläinen Seppo Pohjolainen Tampere University of Technology Department of Mathematics

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information