MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev

Size: px

Start display at page:

Download "MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev"

Phillip Bruce
5 years ago
Views:

1 MEASURE TRASFORME QUASI SCORE TEST WITH ALICATIO TO LOCATIO MISMATCH ETECTIO Koby Todros Ben-Gurion University of the egev ABSTRACT In this paper, we develop a generalization of the Gaussian quasi score test (GQST) for composite binary hypothesis testing. The proposed test, called measure transformed GQST (MT-GQST), is based on the score-function of the measure transformed Gaussian quasi maximum likelihood estimator (MT-GQMLE) that operates by empirically fitting a Gaussian model to a transformed probability measure of the data. By judicious choice of the transform we show that, unlike the GQST, the proposed MT-GQST involves higher-order statistical moments can gain resilience to outliers, leading to significant mitigation of the model mismatch effect on the decision performance. A data-driven procedure for optimal selection of the measure transformation parameters is developed that minimizes the spectral norm of the empirical asymptotic error-covariance of the MT-GQMLE. This amounts to maximization of an empirical worst-case asymptotic local power at a fixed asymptotic size. The MT-GQST is applied to location mismatch detection of a near-field point source in a simulation example that illustrates its robustness to outliers. Index Terms Composite hypothesis testing, higher-order statistics, probability measure transform, robust statistics.. ITROUCTIO The score test -3, also known as Rao s score test, or the Lagrangian multiplier test, is a well established technique for composite binary hypothesis testing, 4, 5 that is based on the scorefunction of the maximum likelihood estimator (MLE). Unlike the generalized likelihood ratio test (GLRT), 5 Wald s test, 5, 6, it does not necessitate the maximum likelihood estimate under the alternative hypothesis, therefore, may be significantly easier to compute. However, similarly to Wald s test the GLRT, it assumes knowledge of the likelihood function. In many practical scenarios the likelihood function is unknown, therefore, alternatives to the score test become attractive. A popular alternative of this kind is the Gaussian quasi score test (GQST) 7- that is based on the score-function of the Gaussian quasi MLE (GQMLE) 7, -6, which assumes that the samples obey Gaussian distribution. The GQST has gained popularity due to its implementation simplicity ease of performance analysis that arise from the convenient Gaussian distribution. espite the model mismatch, introduced by the normality assumption, the GQST has the appealing property of consistency under some mild regularity conditions 8. However, in some circumstances, such as for certain types of non-gaussian data, deviation from normality can inflict poor decision performance. This can occur when the first second-order statistical moments are weakly identifiable over Score-function of an estimator is referred here to as the gradient of the estimator s objective function w.r.t. the vector parameter. the parameter space, or in the case of heavy-tailed data when the non-robust sample mean covariance provide poor estimates in the presence of outliers. In this paper, a generalization of the GQST is developed. The proposed generalization, called measure transformed GQST (MT- GQST), is based on the score-function of the measure transformed GQMLE (MT-GQMLE) 7, 8 that operates by empirically fitting a Gaussian model to a transformed probability measure of the data. The considered measure-transformation, also applied in 9-24, is structured by a non-negative function, called the MTfunction, that weights the probability distribution of the data. By judicious choice of the MT-function we show that, unlike the GQST, the proposed MT-GQST involves higher-order statistical moments, can gain resilience to outliers, yet have the computational implementation advantages of the GQST. Under some mild regularity conditions, we show that the MT- GQST is consistent. We also show that the asymptotic distribution of the test-statistic is central chi-squared under the null hypothesis, non-central chi-squared under a sequence of local alternatives, with non-centrality parameter that is increasing with the inverse asymptotic error-covariance of the MT-GQMLE. A data driven procedure for optimal selection of the MT-function within some parametric class is developed that minimizes the spectral norm of the empirical asymptotic error-covariance of the MT-GQMLE. We show that this minimization amounts to maximization of an empirical worst-case asymptotic local power at a fixed asymptotic size. The MT-GQST is illustrated for detecting a mismatch in the location of a near-field point source in the presence of spherically contoured noise. By specifying the MT-function within the family of zero-centred Gaussian functions parameterized by a scale parameter, we show that the MT-GQST outperforms the non-robust GQST another robust alternative, attains detection performance that are significantly closer to those of the omniscient score test that, unlike the MT-GQST, requires the knowledge of the likelihood function. The paper is organized as follows. In Sections 2, the MT- GQMLE is reviewed. In Section 3, the score-function of the MT- GQMLE is used to construct the proposed MT-GQST. The proposed test is applied to location mismatch detection in Section 4. In Section 5, the main points of this contribution are summarized. 2. MEASURE TRASFORME GQMLE: REVIEW We begin by reviewing the principles of the parametric probability measure transform 7, 8. We then define a parametric measuretransformed mean vector covariance matrix show their relation to higher-order statistical moments. Furthermore, we formulate their strongly consistent estimates state conditions for outlier resilience. Finally, these quantities are used to construct the MT- GQMLE 7, 8, whose objective function will be used in the following section to obtain the proposed MT-GQST.

2 2.. robability measure transform We define the measure space (X, S X, X;θ), where X C p is the observation space of a rom vector X, S X is a σ-algebra over X X;θ is an unknown probability measure on S X parameterized by a vector parameter θ that belongs to a parameter space Θ R m. efinition. Given a non-negative function u : C p R + satisfying 0 < E u (X) ; X;θ <, () where E u (X) ; X;θ u (x) dx;θ (x) x X, a transform on X;θ is defined via the relation: X Q (u) X;θ (A) T u X;θ (A) = ϕ u (x; θ) d X;θ (x), (2) where A S X ϕ u (x; θ) u (x)/e u (X) ; X;θ. The function u ( ) is called the MT-function. roof: see Appendix A in 2 roposition (roperties of the transform). Let Q (u) X;θ be defined by relation (2). Then ) Q (u) X;θ is a probability measure on S X. 2) Q (u) X;θ is absolutely continuous w.r.t. X;θ, with Radon-ikodym derivative 25: A dq (u) X;θ (x)/d X;θ (x) = ϕ u (x; θ). (3) The probability measure Q (u) X;θ is said to be generated by the MTfunction u ( ) The MT-mean MT-covariance According to (3) the mean vector covariance matrix of X under Q X;θ, (u) that are assumed to be known parameterized functions of θ, are given by: µ (u) X;θ EXϕu (X; θ) ; X;θ (4) Σ (u) X;θ EXX H ϕ u (X; θ) ; X;θ µ (u) X;θ µ(u)h X;θ, (5) respectively. Equations (4) (5) imply that µ (u) X;θ Σ (u) X;θ are weighted mean covariance of X under X;θ, with the weighting function ϕ u ( ; ) defined below (2). By modifying the MT-function u ( ), such that the condition () is satisfied, the MT-mean MTcovariance under Q (u) X;θ are modified. In particular, by choosing u ( ) to be any non-zero constant valued function we have Q (u) X;θ = X;θ, for which the stard mean vector µ X;θ covariance matrix Σ X;θ are obtained. Alternatively, when u ( ) is non-constant analytic function, which has a convergent Taylor series expansion, the resulting MT-mean MT-covariance involve higher-order statistical moments of X;θ The empirical MT-mean MT-covariance Given a sequence of i.i.d. samples from X;θ the empirical estimators of µ (u) X;θ Σ (u) X;θ are defined as: ˆΣ (u) X ˆµ (u) X XnXH n ˆϕ u (X n) ˆµ (u) x Xn ˆϕu (Xn) (6) ˆµ(u)H x, (7) respectively, where ˆϕ u (X n) u (X n)/ u (Xn). According to roposition 2 in 2, if E X 2 u (X) ; X;θ < then ˆµ (u) w.p. (u) X µ(u) w.p. X;θ ˆΣ X Σ(u) X;θ, where w.p. denotes convergence with probability (w.p.) 26. Robustness of the empirical MT-covariance (7) to outliers was studied in 2 using its influence function 27 which describes the effect on the estimator of an infinitesimal contamination at some point y C p. An estimator is said to be B-robust if its influence function is bounded 27. Similarly to the proof of roposition 3 in 2 it can be shown that if the MT-function u(y) the product u(y) y 2 are bounded over C p then the influence functions of both (6) (7) are bounded The MT-GQMLE Given a sequence of samples from X;θ, the MT-GQMLE 8 of θ minimizes the empirical Kulback-Leibler divergence 28 between the transformed probability measure Q (u) X;θ a complex circular Gaussian probability distribution Φ (u) X;ϑ 29, characterized by the MT-mean µ (u) X;ϑ MT-covariance Σ X;ϑ. (u) We have shown that this minimization amounts to maximization of the objective function: J u (ϑ) (u) ˆΣ X Σ X;ϑ (u) ˆµ (u) X µ(u) X;ϑ 2 Ω (u) x;ϑ, (8) where A B tr AB log det AB p is the log-determinant divergence 30 between positive definite matrices A, B, a C a H Ca denotes the weighted Euclidian norm of a vector a with positive-definite weighting matrix C Ω (u) X;ϑ (Σ X;ϑ) (u). The MT-GQMLE is given by: ˆθ u = arg max J u (ϑ). (9) ϑ Θ Under some mild regularity conditions, we have shown that the MT- GQMLE is asymptotically normal unbiased with convergence rate of /, i.e., (ˆθ u θ) (0, R u (θ)) as, where denotes convergence in distribution 26. The asymptotic error-covariance is given by where R u (θ) = F u (θ) G u (θ) F u (θ), (0) G u (θ) Eu 2 (X) ψ u (X; θ) ψ T u (X; θ) ; X;θ, () ψ u (X; θ) θ log φ (u) (X; θ), (2) F u (θ) E u (X) Γ u (X; θ) ; X;θ, (3) Γ u (X; θ) 2 θ log φ (u) (X; θ), (4) φ (u) (X; θ) is the density of the Gaussian measure Φ (u) X;θ it is assumed that F u (θ) is non-singular. 3. THE MEASURE TRASFORME GAUSSIA QUASI SCORE TEST Given a sequence of samples from X;θ, we use the score-function of the MT-GQMLE (9) to construct the proposed MT-GQST for the composite hypothesis testing problem: H 0 : θ = θ 0 (5) H : θ θ 0. Under some mild regularity conditions we show that the MT-GQST is consistent. Furthermore, we derive the asymptotic distribution of its test-statistic under the null hypothesis under a sequence of local alternatives. Finally, a data driven procedure for optimal selection of the MT-function is developed.

3 3.. The MT-GQST otice that by (6) (7) the objective function (8) is an empirical estimate of J u (θ, ϑ) Σ X;θ Σ (u) (u) X;ϑ µ (u) X;θ µ X;ϑ (u) 2. Ω (u) One can verify that when J u (θ, ϑ) is ϑ-differentiable it has a stationary point at ϑ = θ. Assuming that this stationary point is unique ϑ Ju (θ, θ 0) = 0 when θ = θ 0 ϑ Ju (θ, θ 0) 0 when θ θ 0. This motivates the use of J u (θ), i.e., the score-function of the MT-GQMLE, for testing between H 0 H. Hence, we define the normalized score-function: η u (θ) J u (θ) (/ ) u (Xn). By (6)-(8) x;ϑ η u (θ) = (/ ) u (Xn) ψ u (Xn; θ), (6) where ψ u (X; θ) is defined in (2). Furthermore, we define the empirical estimate of (): Ĝ u(θ) u2 (X n) ψ u (X n; θ) ψ T u (X n; θ). (7) The MT-GQST for the hypothesis testing problem (5) is defined as: T u η T H u (θ0) Ĝ u (θ 0)η u (θ 0) t, (8) H 0 where t R + denotes a threshold. By modifying the MT-function u ( ) such that condition () is satisfied the MT-GQST is modified, resulting in a family of tests. In particular, when u ( ) is any non-zero constant function Q (u) X;θ = X;θ the stard GQST is obtained that only involves first second-order statistical moments Asymptotic performance analysis Here, we study the asymptotic performance of the proposed test (8). We assume that a sequence of i.i.d. samples X n, n =,..., from X;θ is available that the parameter space Θ is compact. We begin by stating some regularity conditions that will be used in the sequel: (A-) E u (X) ψ u (X; θ 0) ; X;θ 0 for θ θ 0. (A-2) Eu 2 (X) ψ u (X; θ 0) ψ T u (X; θ 0) ; X;θ is non-singular. (A-3) µ (u) X;θ Σ (u) X;θ are twice continuously differentiable in Θ. (A-4) E u 4 (X) ; X;θ E X 8 u 4 (X) ; X;θ are bounded. (A-5) G u (θ) is bounded non-singular. (A-6) The density of X;θ is continuous in Θ a.e. over X. (A-7) The Fisher information matrix I FIM (θ) 29 is bounded. The following proposition states consistency conditions. roposition 2 (Consistency). Assume that conditions A- A-4 are satisfied. Then, for any t R r T u > t A proof is given in Appendix A under H. (9) ext, we derive the asymptotic distribution of the test-statistic under the null hypothesis under a sequence of local alternatives. roposition 3 (Asymptotic distribution under the null hypothesis). Assume that conditions A-3 A-5 are satisfied. Then, T u χ2 m under H 0, (20) where χ 2 m denotes a central chi-squared distribution with m-degrees of freedom. A proof appears in Appendix B Theorem (Asymptotic distribution under local alternatives). Assume that conditions A-3 A-7 are satisfied. Furthermore, consider a sequence of local alternatives that converges to θ 0 at a rate of /. Specifically, consider H : θ = θ 0 + h/, (2) where h R m is a non-zero locality parameter. Then, T u χ2 m (λ u(h)) under H, (22) where χ 2 m (λ u(h)) is a non-central chi-squared distribution with m-degrees of freedom non-centrality parameter λ u(h) h T R u (θ 0)h. The matrix R u( ) is the asymptotic error-covariance (0) of the MT-GQMLE (9). A proof appears in Appendix C The following Corollary is a direct consequence of (22), the Rayleigh-Ritz Theorem 4 the property that the right-tail probability of the non-central chi-squared distribution is monotonically increasing in the non-centrality parameter 32. Corollary (Asymptotic local power). Assume that conditions A- 3 A-7 hold. Under the local alternatives (2), the asymptotic power at a fixed asymptotic size α satisfies β u (α) (h) = Q χ 2 m (λ u(h)) ( Q χ 2 m (α) ), (23) where Q χ 2 m ( ) Q χ 2 m ( ) ( ) denote the right-tail probabilities of the central non-central chi-squared distributions, respectively. Furthermore, for any c > 0 the worst-case asymptotic power β (α) u (c) min h: h c β(α) u (h) = Q χ 2 m (γ u(c)) ( Q χ 2 m (α) ), (24) where γ u(c) c 2 R u(θ 0) S S denotes the spectral norm Selection of the MT-function While according to ropositions 2 3, the asymptotic global power size are invariant to the choice of the MT-function u( ), by Corollary one sees that it controls the asymptotic local power through the error-covariance R u(θ 0) (0). Since the tail probability of the non-central chi-squared distribution is monotonically increasing in the non-centrality parameter 32, minimization of the spectral norm R u(θ 0) S amounts to maximization of the (α) worst case asymptotic local power β u (c) (24) for any fixed c asymptotic size α. Hence, we propose to choose u( ) that minimizes ˆR u(θ 0) S, where ˆR u(θ) is an empirical estimate of error-covariance (0) defined as: ˆR u(θ) ˆF u (θ)ĝu(θ)ˆf u (θ), (25) where ˆF u (θ) u (Xn) Γu (Xn; θ) is an estimate of (3) Ĝu (θ) is defined in (7). It can be shown that if conditions A-3, A-4, A-5, A-7, are satisfied then ˆR u(θ 0) Ru(θ0) under the local alternatives (2).

4 Here, we restrict the class of MT-functions to some parametric family {u (X; ω), ω Ω C r } that satisfies the conditions stated in efinition Theorem. For example, the Gaussian family of functions that satisfy the robustness conditions stated at the ending paragraph of Subsection 2.3 is a natural choice for inducing outlier resilience. Hence, an optimal choice of the MT-function parameter ω would minimize ˆR u(θ 0) S that is constructed from (25) by the same sequence of samples used for obtaining the MT-GQST (8). 4. UMERICAL EXAMLE We consider the problem of detecting a mismatch in the location of an emitting narrowb near-field point source that is formulated as the following composite binary hypothesis testing problem: H 0 : X n = S na (θ 0) + W n, n =,...,, (26) H : X n = S na (θ) + W n, θ θ 0, n =,...,, where {X n C p } is an observation process, {S n C} is an i.i.d. symmetrically distributed rom signal process, {W n C p } is an i.i.d noise process that is statistically independent of {S n}. We assume that the noise component is spherically contoured 33 with stochastic representation W n = ν nz n, where {ν n R ++} is an i.i.d. process {Z n C p } is a propercomplex wide-sense stationary Gaussian process with zero-mean scaled unit covariance σzi. 2 The processes {ν n} {Z n} are assumed to be statistically independent. The vector a (θ), θ r, ϑ T, is the steering vector of a uniform linear array of p sensors with inter-element spacing d that receive a signal with wavelength λ generated by a narrowb near-field point source with range r bearing ϑ. By Fresnel s approximation 34, 35 when 0.62(d 3 (p ) 3 /λ) /2 < r < 2d 2 (p ) 2 /λ we have a (θ) k = exp ( j(ω ek + φ ek 2 + O(d 2 /r 2 )) ), k = 0,..., p, where ω e 2πd sin (ϑ)/λ φ e πd 2 cos 2 (ϑ)/(λr) are called electrical angles. In order to gain robustness against outliers, as well as sensitivity to higher-order moments, we specify the MT-function in the zerocentred Gaussian family of functions parametrized by a width parameter ω, i.e., u (x; ω) = exp ( x 2 /ω 2), ω R ++. (27) otice that the MT-function (27) satisfies the robustness conditions stated at the ending paragraph of Subsection 2.3. To obtain the corresponding test-statistic (8) the empirical error-covariance (25), one has to compute the vector matrix functions ψ u ( ; θ) Γ u ( ; θ), defined in (2) (4), respectively. By (4), (5), (2), (4), (26) (27) these quantities take the following simple forms: ψ u (X; θ) = ξ (ω) θ X H a (θ) 2, 2 r X H a (θ) 2 2 rϑ X H a (θ) 2 Γ u (X; θ) = ξ (ω) X H a (θ) 2 X H a (θ) 2, 2 rϑ where ξ (ω) is a strictly positive functions of ω. It is important to note that the resulting test-statistic (8) the empirical errorcovariance (25) are independent of ξ (ω). In the following simulation we evaluate the detection performance of the proposed MT-GQST as compared to the score test, the GQST, another robust GQST extension, called here ZML- GQST. The ZML-GQST operates by applying GQST after passing the data through a zero-memory non-linear (ZML) function that suppresses outliers by clipping the amplitude of the observations. 2 ϑ We use the same ZML preprocessing approach that has been applied in to robustify the MUSIC algorithm 6. We considered a BSK signal with variance σs 2 impinging on p = 8 sensors with inter-element spacing d = λ/4 = 0.25 m. Two types of noise distributions were examined: ) Gaussian 2) heavy-tailed K-distributed noise 33 with shape parameter κ = The sample size was set to = 000. The signal-to-noiseratio (SR), used to index the detection performance, is defined as SR 0 log 0 σs/σ 2 Z. 2 The location vector parameter at the null was set to θ 0 = r 0, ϑ 0 T, where r 0 =.5 m ϑ 0 = 0. We considered a specific local alternative θ = r, ϑ T, corresponding to h = (θ θ 0) in (2), where r = r m ϑ = ϑ The optimal width parameter ω opt of Gaussian MT-function (27) was obtained by minimizing the spectral norm ˆR u(θ 0) S of the empirical error-covariance (25) over Ω =, 30. All empirical power curves were obtained using 0 4 Monte-Carlo simulations. Fig. depicts the empirical asymptotic (23) power curves of the MT-GQST as compared to the empirical power curves of the GQST, ZML-GQST the score test for a fixed test size equal to 0 2. otice that when the noise is Gaussian, the MT-GQST, GQST ZML-GQST attain similar performance. For the K-distributed noise, the MT-GQST outperforms the GQST ZML-GQST, significantly reduces the gap towards the score test, which unlike the MT-GQST, assumes knowledge of the likelihood function. ower ower Asymptotic power: MT GQST Empirical power: MT GQST 0.2 Empirical power: GQST Empirical power: ZML GQST Empirical power: omnicient score test SR db Asymptotic power: MT GQST Empirical power: MT GQST Empirical power: GQST Empirical power: ZML GQST Empirical power: omnicient score test SR db Fig.. Location mismatch detection in Gaussian noise (top) K-distributed noise (bottom). 5. COCLUSIO In this paper a new score-type test, called MT-GQST, was derived based on the score-function of the measure transformed GQMLE. By specifying the MT-function in the Gaussian family, the proposed test was applied to location mismatch detection in non-gaussian noise. Exploration of other MT-functions may result in additional tests in this class that have different useful properties.

5 Appendices A. roof of roposition 2: We first show that η u (θ 0) Ĝu(θ0) satisfy Ĝ u(θ 0) w.p. η u (θ 0) E u (X) ψ u (X; θ 0) ; X;θ (28) w.p. Eu2 (X) ψ u (X; θ 0) ψ T u (X; θ 0) ; X;θ. (29) Since {X n} are i.i.d. rom variables the functions u ( ) ψ u (, ) are real, the products {u (X n) ψ u (X n, θ 0)} {u (X n) ψ u (X n, θ 0) ψ T u (X n, θ 0)}, comprising (6) (7), respectively, are real i.i.d. Furthermore, by (2), (60) assumption A-3 it can be shown that there exists a positive constant B such that ψu (X, θ) k B 2 l=0 X l, k =,..., p ψ u (X; θ) ψ T u (X; θ) k,j B 4 l=0 X l, k, j =,..., p. Hence, according to (), Assumption A-4 Hölder s inequality 26 the expectations {Eu (X) ψu (X, θ 0) k ; X;θ} p k= {Eu 2 (X) ψ u (X; θ 0) ψ T u (X; θ0) k,j ; X;θ} p k,j= are finite. Therefore, by (6), (7) Khinchine s strong law of large numbers 25 relations (28) (29) must hold. Thus, by (8), (28), (29), Assumptions A-, A-2 the Mann- Wald theorem 39 we conclude that Tu w.p. C under H, where C denotes some positive constant. The relation (9) directly follows. B. roof of roposition 3: Under assumptions A-3 A-4 it is shown in Lemma 5 stated in Appendix B in 8 that the normalized score function (6) satisfies η u (θ 0) (0, Gu (θ0)). (30) Furthermore, by assumptions A-3 A-4 it also follows from relation (29) in Appendix A that under H 0 ( X;θ = X;θ 0) Ĝ u (θ 0) w.p. Gu (θ0), (3) where according to assumption A-5 G u (θ 0) is non-singular. Hence, relation (20) follows from (30), (3), Slutskey s Theorem 26, Mann-Wald s Theorem 39 the properties of quadratic forms of Gaussian rom variables 5. C. roof of Theorem : In ropositions 4 5 stated below we show that η u (θ 0) (F u (θ 0) h, G u (θ 0)) Ĝu (θ0) Gu (θ0). Hence, by (8) relation (22) follows from Slutskey s Theorem 26, Mann- Wald s Theorem 39 the properties of quadratic forms of Gaussian rom variables 5. The following lemmas are based on the fact that since by (2) the parameter θ changes with the sample size, the observations form a triangular array 40 (rather than a sequence) of rom vectors: X,k, k =,...,,, (32) where X,k, k =,..., are i.i.d. with probability distribution X;θ 0+ h. roposition 4. Assume that conditions A-3-A-7 are satisfied. Under the local alternatives (2), the normalized score function η u (θ 0) (6) satisfies: η u (θ 0) (Fu (θ0) h, Gu (θ0)). (33) roof. By assumption A-3, the vector function ψ u (x; θ) defined in (2) is continuous in Θ for any x X. Therefore, by Identity, the normalized score function (6) satisfies η u (θ 0) = η u (θ) + ˆF u (θ ) h, where ˆF u ( ) is defined below (25) θ lies in the line segment connecting θ θ 0. Furthermore, by Lemmas 2 stated below in Appendix, η u (θ) (0, Gu (θ0)) ˆF u (θ ) Fu (θ0). Therefore, the relation (33) follows directly from Slutskey s Theorem 26. roposition 5. Assume that conditions A-3, A-4, A-6, A-7 are satisfied. Then, under (2) Ĝ u (θ 0) Gu (θ0). (34) roof. efine a triangular array of real rom variables obtained from the array (32): where Y,k g (X,k ), k =,...,,, (35) g (X) u 2 (X) ψ u (X; θ 0) l ψ u (X; θ 0) m. (36) Since X,k, k =,..., are i.i.d. the functions u ( ) ψ u ( ; ) are real, then Y,k, k =,..., are real i.i.d. Furthermore, define the rom variable Y g (X), (37) where X has probability distribution X;θ 0. Let F Y, ( ) F Y ( ) denote the c.d.fs of Y, Y, respectively. We show that F Y, (y) FY (y) y C, (38) where C R denotes the set of continuity points of F Y (y). Let ζ Y, (t) ζ Y (t) denote the characteristic functions of Y, Y, respectively. By (35) (37) their difference satisfies ζy, (t) ζ Y (t) = X eitg(x) (f(x; θ) f(x; θ 0))dρ (x), where f(x; θ) d X;θ(x)/dρ(x) is the density function of X;θ w.r.t a dominating σ-finite measure ρ on S X. Hence, by (2), (65) assumption A-6 ζy, (t) ζ Y (t) (a) X h T η(x; θ ) f (x; θ ) dρ (x) = E h T η(x; θ ) ; X;θ (b) E h T η(x; θ ) 2 ; X;θ = h I FIM (θ ), (39)

6 where η(x; θ) θ log f(x; θ), I FIM Eη(x; θ)η T (x; θ) is the Fisher-information matrix 42, θ lies in the line segment connecting θ = θ 0 + h θ 0, (a) follows from the triangle inequality, (b) follows from Hölder s inequality 26. Therefore, by (39) (62) ζ Y (t) ζy (t) for any t R. Hence, by Theorem.2.2 in (38) must hold. ext, we show that E Y, ; Y, By (35), (37) (65) it is implied that E Y, ; Y, = E Y ; Y + where, by Hölder s inequality E E Y ; Y <. (40) g (X) h T η (X; θ ) ; X;θ E 2 g (X) h T η (X; θ) ; X;θ E g (X) 2 ; X;θ h 2 I FIM (θ). According to (2), (36), (60), assumption A-4, the compactness of Θ Hölder s inequality 26, there exists a constant B > 0 such that h (x) Bu 4 (x) 8 r=0 x r g (x) 2 E h (X) ; X;θ is bounded. Hence, by (62) we conclude that the expectation E g (X) h T η (X; θ) ; X;θ must be bounded, therefore, the relation (40) must hold. Finally, by (), (7), (35)-(38), (40) Lemma 5.4. in we concluded that Ĝu (θ0) Gu (θ0). Auxiliary Lemmas for roposition 4: Lemma. Assume that conditions A-3-A-5 are satisfied. Under the local alternatives (2), the normalized score function (6) satisfies: η u (θ), (0, Gu (θ0)). (4) roof. efine a triangular array of real rom variables obtained from the array in (32): where Y,k g (X,k ), k =,...,,, (42) g (x) u (x) t T ψ u (x; θ) (43) t is an arbitrary non-zero vector in R m. Since X,k, k =,..., are i.i.d. the functions u ( ) ψ u ( ; ) are real, then Y,k, k =,..., are real i.i.d. In the following we show some properties of the statistical moments of Y,k. By Identity 2, E Y,k ; Y, = 0. (44) Moreover, by (), (42), (43), assumption A-5 the compactness of Θ there exists a constant M > 0 such that E Y,k; 2 Y, = t 2 G u(θ) M θ Θ, (45) where under (2) we have that t t Gu(θ) G u(θ0) > 0. (46) Using (42), (43), (60), assumptions A-3, A-4, the compactness of Θ Hölder s inequality 26 it can be shown that the exists a constant M > 0, such that E Y,k 4 ; Y, = E g 4 (X) ; X;θ M θ Θ. (47) Clearly, by (45) s 2 EY,k; 2 Y, = t 2 G. (48) u(θ) k= Therefore, by (46)-(48) s 4 k= E Y,k 4 E Y, 4 ; Y, ; Y, = t 4 G u(θ) 0. Hence, by Lyapounov s central limit theorem, (46) Slutskey s Theorem 26 we conclude that Y,k k= ) (0, t 2 G u(θ0). (49) The relation (4) follows directly from (42), (43), (49) the Cramér-Wold evice. Lemma 2. Assume that conditions A-3, A-4, A-6, A-7 are satisfied. Under the local alternatives (2) ˆF u (θ ) Fu (θ0). (50) roof. efine a triangular array of real rom variables obtained from the array (32): where Y,k g (X,k ; θ ), k =,...,,, (5) g (X; θ) u (X) Γ u (x; θ) l,m (52) Γ u ( ; ) is defined in (4). Since X,k, k =,..., are i.i.d. the functions u ( ) Γ u ( ; ) are real, then Y,k, k =,..., are real i.i.d. Furthermore, define the rom variable Y g (X; θ 0), (53) where X has probability distribution X;θ 0. Let F Y, ( ) F Y ( ) denote the c.d.fs of Y, Y, respectively. We show that F Y, (y) FY (y) y C, (54) where C R denotes the set of continuity points of F Y (y). Let ζ Y, (t) ζ Y (t) denote the characteristic functions of Y, Y, respectively. By (5), (53), assumption A-6 Identity 3 their difference satisfies: ζ Y, (t) ζ Y (t) = + (a) + (b) E E e ig(x;θ )t e ig(x;θ0)t ; X;θ 0 h T η (X; θ ) ; X;θ e ig(x;θ E )t e 0)t ig(x;θ ; X;θ 0 h E T η (X; θ ) ; X;θ E s (X; θ, θ 0, t) ; X;θ 0 (55) + h IFIM (θ ), where η(x; θ) θ log f(x; θ), I FIM Eη(x; θ)η T (x; θ) is the Fisher-information matrix 42, the term s (x; θ, θ 0, t)

7 2 sin ((g (x; θ) g (x; θ 0)) t). We note that (a) follows from the triangle inequality, while (b) follows from Hölder s inequality 26 the definition of the Fisher-information matrix. otice that s (x; θ, θ 0, t) is bounded by (4), (52) A-3 it is also continuous at θ = θ 0 for any x X. Therefore, since θ θ0, by the bounded convergence theorem 26 we conclude that E s (X; θ, θ 0, t) ; X;θ 0 0 for any t R. Furthermore, by assumption A-7 (62) the weighted Euclidean norm h 2 I FIM (θ) is bounded over Θ, therefore, by (55) ζ Y (t).2.2 in (54) must hold. ext, we show that ζy (t) for any t R. Hence, by Theorem E Y, ; Y, By (5), (53) (65) E Y, ; Y, E Y ; Y <. (56) = E g (X; θ ) ; X;θ 0 (57) + E g (X; θ ) h T η (X; θ ) ; X;θ According to (4), (52) assumption A-3 g (x, θ) is continuous in Θ for any x X. Moreover, by (4), (52), (6), assumptions A-3, A-4, the compactness of Θ Hölder s inequality 26 there exist positive constants B, C such that: h (x) B 2 x r g (x, θ) (58) r=0 Relation 2. Assume that the parameter space Θ is compact. Furthermore, assume that the Fisher-Information matrix I FIM (θ) 42 is continuous in Θ. For any h R m there exists a positive constant M such that M θ Θ. (62) h 2 I FIM (θ) Identity. Assume that the vector function ψ u (x; θ) (2) is continuous in Θ for any x X. Furthermore, let θ = θ 0 + h. By the mean-value Theorem 44, applied to each entry of ψ u (x; θ) (4), the normalized score function η u (θ) (6) satisfies: η u (θ 0) = η u (θ) + ˆF u (θ ) h (63) where ˆF u ( ) is defined below (25) θ lies in the line segment connecting θ θ 0. Identity 2. The vector function ψ u (X; θ) (2) satisfies: E u (X) ψ u (X; θ) ; X;θ = 0. (64) The proof appears in Lemma 6 in 8 Identity 3. Let f (x; θ) d X;θ(x)/dρ(x) denote the density function of X;θ w.r.t. a σ-finite measure ρ on S X. Assume that f (x; θ) is continuous in Θ ρ-a.e. Furthermore, let θ = θ 0 + h. By the mean-value Theorem 44 f (x; θ) = f(x; θ 0) + ht η(x; θ )f(x; θ ) ρ-a.e., (65) where η(x; θ) θ log f(x; θ) θ lies in the line segment connecting θ θ 0. E h (X) ; X;θ E h 2 (X) ; X;θ C. (59) Therefore, since θ θ0, by (53), (58), (59) the dominated convergence theorem 26 we conclude that the expectation E g (X; θ ) ; X;θ 0 E g (X; θ0) ; X;θ 0 E Y ; Y <. We now show that the second summ in (57) converges to zero as θ approaches θ 0. By Hölder s inequality 26, (58) (59) E g(x;θ ) h T η(x;θ ); x;θ 2 h 2 I FIM (θ ) E g (X; θ ) 2 ; X;θ E h 2 (X) ; X;θ C. Therefore, by (62) we conclude that the nominator in (57) is bounded, hence, the right summ in the l.h.s. of (56) approaches zero as. Finally, by (4), Fhat, (5)-(54), (56) Lemma 5.4. in we conclude that (50) holds. E. Useful relations identities: Relation. Assume that the parameter space Θ is compact. Furthermore, assume that µ (u) X (θ) Σ (u) X (θ) are twice continuously differentiable in Θ. efine d (x) 2 r=0 x r. According to (2), (4) the triangle inequality, the Cauchy-Schwartz inequality its the matrix extension 43, there exist positive constants B B 2, such that for any (x, θ) X Θ: ψ u (x; θ) k B d (x), k =,..., m (60) Γ u (x; θ) k,l B 2d (x) k, l =,..., m. (6)

8 A. REFERECES E. L. Lehmann J.. Romano, Testing Statistical Hypotheses. Springer Texts in Statistics, C. R. Rao, Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation, Mathematical roceedings of the Cambridge hilosophical Society, vol. 44, no., pp , S. M. Silvey, The Lagrangian multiplier test, The Annals of Mathematical Statistics, vol. 30, no. 2, pp , L. L. Scharf, Statistical Signal rocessing, Addison-Wesley, S. M. Kay, Fundamentals of statistical signal processing: detection theory, rentice-hall, A. Wald, Tests of statistical hypotheses concerning several parameters when the number of observations is large, Transactions of the American Mathematical society, vol. 54, no. 3, pp , H. White, Maximum likelihood estimation of misspecified models, Econometrica: Journal of the Econometric Society, pp. -25, H. White, Estimation, inference specification analysis, Cambridge university press, R. F. Engle, Wald, likelihood ratio, Lagrange multiplier tests in econometrics, Hbook of econometrics, vol. 2, pp , G. Fiorentini E. Sentana, Tests for serial dependence in static, non-gaussian factor models, Unobserved components time series econometrics, pp. 8-63, Oxford University ress, 205. T. Bollerslev J. M. Wooldridge, Quasi-maximum likelihood estimation inference in dynamic models with time-varying covariances, Econometric reviews, vol., no. 2, pp , O. Rousseaux, G. Leus,. Stoica, Gaussian maximum-likelihood channel estimation with short training sequences, IEEE Transactions on Wireless Communications, vol. 4, no. 6, pp , E. Weinstein, ecentralization of the Gaussian maximum likelihood estimator its applications to passive array processing, IEEE Transactions on Acoustics, Speech Signal rocessing, vol. 29, no. 5, pp , C. Gourieroux, A. Monfort, A. Trognon, seudo maximum likelihood methods: Theory, Econometrica, vol. 52, no. 3, pp , B. Ottersten, M. Viberg,. Stoica, A. ehorai, Exact large sample maximum likelihood techniques for parameter estimation detection, in Radar Array rocessing, ch. 4, pp. 99-5, Springer-Verlag, H. Krim M. Viberg, Two decades of array signal processing research: the parametric approach, IEEE Signal rocessing Magazine, vol. 3, no. 4, pp , K. Todros A. O. Hero, Measure-transformed quasi maximum likelihood estimation with application to source localization, roceedings of ICASS 205, pp K. Todros A. O. Hero, Measure-transformed quasi maximum likelihood estimation, Accepted for publication in the IEEE Trans. Signal rocessing, 206. Arxiv version: 9 K. Todros A. O. Hero, On measure-transformed canonical correlation analysis, IEEE Transactions on Signal rocessing, vol. 60, no. 9, pp , K. Todros A. O. Hero, Measure transformed canonical correlation analysis with application to financial data, roceedings of SAM 202, pp , K. Todros A. O. Hero, Robust multiple signal classification via probability measure transformation, IEEE Transactions on Signal rocessing, vol. 63, no. 5, K. Todros A. O. Hero, Robust measure transformed MUSIC for OA estimation, roceeding of ICASS 204, pp , K. Todros A. O. Hero, Measure-transformed quasi likelihood ratio test, roceedings of ICASS 206, pp Halay, K. Todros A. O. Hero, Measure-transformed quasi likelihood ratio test for Bayesian binary hypothesis testing, roceedings of SS G. B. Foll, Real Analysis, John Wiley Sons, K. B. Athreya S.. Lahiri, Measure theory probability theory, Springer- Verlag, F. R. Hampel, E. M. Ronchetti,. J. Rousseeuw W. A. Stahel, Robust statistics: the approach based on influence functions. John Wiley & Sons, S. Kullback R. A. Leibler, On information sufficiency, The Annals of Mathematical Statistics, vol. 22, pp , Schreier L. L. Scharf, Statistical signal processing of complex-valued data: the theory of improper noncircular signals. p. 39, Cambridge University ress, I. S. hillon J. A. Tropp, Matrix nearness problems with Bregman divergences, SIAM Journal on Matrix Analysis Applications, vol. 29, no. 4, pp , R. Horn C. R. Johnson, Matrix analysis. Cambridge University ress, A. Aubel W. Gawronski, Analytic properties of non-central distributions, Applied Mathematics Computation, vol. 4, pp. 3?2, E. Ollila,. E. Tyler, V. Koivunen H. V. oor, Complex elliptically symmetric distributions: survey, new results applications, IEEE Transactions on Signal rocessing, vol. 60, no., pp , E. Grosicki, A. M. Karim, H. Yingbo A weighted linear prediction method for near-field source localization, IEEE Transactions on Signal rocessing, vol. 53, no. 0, pp , M.. El Korso, R. Boyer, A. Renaux, S. Marcos, Conditional unconditional Cramér Rao bounds for near-field source localization, IEEE Transactions on Signal rocessing, vol. 58, no. 5, pp , A. Swami, On some parameter estimation problems in alpha-stable processes, roceedings of ICASS-97, vol. 5, pp , A. Swami B. M. Sadler, TE, OA related parameter estimation problems in impulsive noise, roceedings of the IEEE Signal rocessing Workshop on Higher-Order Statistics, 997, pp , A. Swami B. M. Sadler, On some detection estimation problems in heavy-tailed noise, Signal rocessing, vol. 82, no. 2, pp , H. B. Mann A. Wald, On stochastic limit order relationships, Ann. Math. Stat., vol. 4, pp , R. J. Serfling, Approximation theorems of mathematical statistics. John Wiley & Sons, R. Horn C. R. Johnson, Matrix analysis. Cambridge University ress, S. M. Kay, Fundamentals of statistical signal processing: estimation theory, rentice-hall, X. Yang, Some trace inequalities for operators, Journal of the Australian Mathematical Society (Series A), vol. 58, no. 3, pp , C. H. Edwards, Advanced calculus of several variables, Courier Corporation, 202.

MEASURE TRANSFORMED QUASI LIKELIHOOD RATIO TEST

MEASURE TRANSFORMED QUASI LIKELIHOOD RATIO TEST Koby Todros Ben-Gurion University of the Negev Alfred O. Hero University of Michigan ABSTRACT In this paper, a generalization of the Gaussian quasi likelihood