Optimal Joint Detection and Estimation in Linear Models

Jianshu Chen, Yue Zhao, Andrea Goldsmith, and H. Vincent Poor

(This research was supported in part by DTRA under Grant HDTRA1-08-1-0010, in part by the Air Force Office of Scientific Research under MURI Grant FA9550-09-1-0643, and in part by the Office of Naval Research under Grant N00014-12-1-0767. J. Chen is with the Dept. of Electrical Engineering, University of California, Los Angeles, CA 90095 USA, e-mail: jshchen@ee.ucla.edu. Y. Zhao is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA, and with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA, e-mail: yuez@princeton.edu. A. Goldsmith is with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA, e-mail: andrea@stanford.edu. H. V. Poor is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA, e-mail: poor@princeton.edu.)

Abstract: The problem of optimal joint detection and estimation in linear models with Gaussian noise is studied. A simple closed-form expression for the joint posterior distribution of the multiple hypotheses and the states is derived. The expression crystallizes the dependence of the optimal detector on the state estimates. The joint posterior distribution characterizes the beliefs (soft information) about the hypotheses and the values of the states. Furthermore, it is a sufficient statistic for jointly detecting the multiple hypotheses and estimating the states. The developed expressions give us a unified framework for joint detection and estimation under all performance criteria.

I. INTRODUCTION

Detection and estimation problems appear simultaneously and are naturally coupled in many engineering systems. Several prominent examples are as follows. To achieve situational awareness in power grids, it is essential to have timely detection of outages as well as estimation of system states [1], [2]. Radar systems detect the existence of targets and also estimate their positions and velocities [3]. Wireless communication systems often need to decode messages and estimate channel states at the same time [4].

In different engineering systems, the problem settings of joint detection and estimation can vary greatly, and many application-specific solutions have been developed in practice. A classic approach that addresses the detection problem in the presence of unknown states/parameters is composite hypothesis testing [3]. Accordingly, a straightforward approach for joint hypothesis testing and state/parameter estimation is to perform composite hypothesis testing first, followed by state/parameter estimation based on the hard decision made from the hypothesis test. However, such an approach cannot provide optimality guarantees under general performance criteria that depend jointly on the detection and estimation results.

In the literature, several studies have addressed such joint performance criteria. The structure of the jointly optimal Bayes detector and estimator with discrete-time data was developed in [5] and [6], and was extended to the continuous-time data case in [7]. There, the detector structure was expressed in terms of some generalized forms of likelihood ratios. The structure of the optimal Bayesian estimator under any given constraints on the false alarm probability and the probability of missed detection has also been developed for the binary hypothesis case [8].
In this paper, we study the problem of optimal joint detection and estimation for a general class of observation models, namely, linear models with Gaussian noise. Linear models appear in a wide range of engineering applications, including power systems [1], channel estimation [9], [10], adaptive array processing [11]-[13], and spectrum estimation [14]. In these applications, not only is state estimation of primary interest, but also the observation matrix can often change over time, and it is essential to detect which observation matrix among many possibilities is currently in effect. We formulate these problems as joint multiple hypothesis testing and state estimation problems. Instead of focusing on a particular form of performance criterion and developing the corresponding optimal joint detector and estimator, we develop a unified Bayesian approach that can be applied to any given criterion. Specifically, employing a conjugate prior, we provide closed-form expressions for the joint posterior of the hypotheses and the system states given all measurement samples. The developed expressions reveal the exact dependence of the optimal detectors on the state estimates. Because the joint posterior is a sufficient statistic for joint hypothesis testing and state estimation, the derived explicit forms of such soft information (as opposed to hard decisions) can be applied to all performance criteria with optimality guarantees.

The remainder of the paper is organized as follows. In Section II, we describe the system model and formulate the joint detection and estimation problem. In Section III, we provide a factorization of the likelihood function and derive a simple closed-form expression for the joint posterior distribution. Finally, we conclude the paper and remark on future directions in Section IV.

Notation: We use boldface letters to denote random quantities and regular letters to denote realizations or deterministic quantities.

II. PROBLEM FORMULATION

We consider the following observation model, which entails a joint detection and estimation problem. Given each of the $K+1$ hypotheses $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_K$, the $M \times 1$ sensor measurement vector $x_t$ at time $t$ is obtained according to the linear model

$$ \mathcal{H}_k : \; x_t = H_k \theta + v_t, \qquad k = 0, 1, \ldots, K, \qquad (1) $$

where $H_k$ is the $M \times N$ observation matrix under hypothesis $\mathcal{H}_k$, $\theta$ is the $N \times 1$ unknown state vector to be estimated, and $v_t \sim \mathcal{N}(0, R_v)$ is the $M \times 1$ measurement noise, which is independent and identically distributed (i.i.d.) over time.
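
As a concrete illustration of the observation model (1) (not part of the original development), the following NumPy sketch generates synthetic measurements under one of the hypotheses. The dimensions, the matrices $H_k$, the noise covariance $R_v$, and the true state used here are arbitrary placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (placeholders, not taken from the paper).
M, N, K = 4, 2, 2        # sensors, states, and K+1 hypotheses H_0, ..., H_K
i = 50                   # number of measurement vectors x_1, ..., x_i

# One observation matrix per hypothesis and a common noise covariance R_v.
H = [rng.standard_normal((M, N)) for _ in range(K + 1)]
R_v = 0.1 * np.eye(M)

# Measurements under a "true" hypothesis k_true with a fixed state theta, per (1).
k_true, theta_true = 1, rng.standard_normal(N)
X = rng.multivariate_normal(H[k_true] @ theta_true, R_v, size=i)  # row t holds x_t^T
```
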

From the measurement data $x_{1:i} \triangleq \{x_1, \ldots, x_i\}$ collected up to time $i$, we want to jointly infer (a) the true underlying linear model $\mathcal{H}_k$, and (b) the true underlying states $\theta$. Note that neither of them is known beforehand, and we need to solve a problem of jointly detecting $\mathcal{H}_k$ and estimating $\theta$. Such problems arise in many applications. We provide in the following an example that arises commonly in power grid monitoring. An outage in a power grid will change the grid topology, and the system operator wants to detect which outage among a candidate set $\mathcal{H}_1, \ldots, \mathcal{H}_K$ occurs, or whether no outage occurs ($\mathcal{H}_0$). With a given set of sensors in the grid, the $k$-th outage scenario gives rise to a unique observation matrix $H_k$, and the sensors measure the states of the grid $\theta$ via (1). Consequently, state estimation depends on knowledge of the true outage, and outage detection depends on knowledge of the true states [2]. Clearly, solving a joint detection and estimation problem is essential for monitoring the health of the power grid in real time.

For these purposes, this paper provides the joint posterior distribution $p(\theta, \mathcal{H}_k \mid x_{1:i})$ (see (12)-(13) below), which gives us the beliefs about both $\theta$ and $\mathcal{H}_k$. It is also a sufficient statistic for $\theta$ and $\mathcal{H}_k$ given the data $x_{1:i}$, i.e., it provides full information from the measured data about the hypothesis $\mathcal{H}_k$ and the state vector $\theta$. Therefore, instead of being the optimal functions of $x_{1:i}$, the optimal decision rule and the estimator need only be the optimal functionals of $p(\theta, \mathcal{H}_k \mid x_{1:i})$. Deriving the expressions for the joint posterior distribution will give us a unified framework for joint detection and estimation under all performance criteria (e.g., minimum-risk, minimum-probability-of-error, or maximum a posteriori probability (MAP) detection, and MAP or minimum mean-square-error (MMSE) estimation).

III. JOINT POSTERIOR OF HYPOTHESES AND STATES

We now derive the joint posterior distribution of the hypothesis $\mathcal{H}_k$ and the unknown states $\theta$. Specifically, we will use $p(\theta, \mathcal{H}_k \mid x_{1:i})$ as a hybrid probability measure to denote the joint posterior distribution of $\theta$ and $\mathcal{H}_k$:

$$ p(\theta, \mathcal{H}_k \mid x_{1:i}) = p(\mathcal{H}_k \mid x_{1:i})\, p(\theta \mid \mathcal{H}_k, x_{1:i}), \qquad (2) $$

where $p(\mathcal{H}_k \mid x_{1:i})$ denotes the posterior probability mass function (PMF) of $\mathcal{H}_k$ and $p(\theta \mid \mathcal{H}_k, x_{1:i})$ denotes the posterior probability density function (PDF) of $\theta$ given $\mathcal{H}_k$.

A. The Likelihood Function

We begin with a factorization of the likelihood function $p(x_{1:i} \mid \theta, \mathcal{H}_k)$, which will be useful in finding sufficient statistics for jointly detecting $\mathcal{H}_k$ and estimating $\theta$, and in computing the joint posterior distribution. In addition to states, $\theta$ can also include parameters in some applications [12], [13]; for the sake of brevity, we refer to $\theta$ as states from now on.

Lemma 1 (Factorization): According to the linear model in (1), the conditional distribution (the likelihood function) $p(x_{1:i} \mid \theta, \mathcal{H}_k)$ can be expressed in the following form:

$$ p(x_{1:i} \mid \theta, \mathcal{H}_k) = p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big), \qquad (3) $$

where the notation $\|x\|^2_\Sigma$ denotes $x^T \Sigma x$ for some positive definite weighting matrix $\Sigma$, $\hat\theta_{k,\mathrm{ML}}$ is the maximum likelihood estimate of $\theta$ given that hypothesis $\mathcal{H}_k$ is true, and $I(\hat\theta_{k,\mathrm{ML}})$ is the corresponding Fisher information matrix:

$$ \hat\theta_{k,\mathrm{ML}} = \big( H_k^T R_v^{-1} H_k \big)^{-1} H_k^T R_v^{-1} \bar x_i, \qquad (4) $$
$$ I(\hat\theta_{k,\mathrm{ML}}) = i\, H_k^T R_v^{-1} H_k, \qquad (5) $$
$$ \bar x_i = \frac{1}{i} \sum_{t=1}^{i} x_t. \qquad (6) $$

Proof: See Appendix I.
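
For illustration, a minimal NumPy sketch of (4)-(6) is given below; the matrix $H_k$, the covariance $R_v$, the sizes, and the synthetic data are again placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, i = 4, 2, 50                                  # illustrative sizes
H_k = rng.standard_normal((M, N))                   # observation matrix under H_k
R_v = 0.1 * np.eye(M)
R_inv = np.linalg.inv(R_v)
X = rng.multivariate_normal(H_k @ np.array([1.0, -0.5]), R_v, size=i)

x_bar = X.mean(axis=0)                              # sample mean, eq. (6)
fisher = i * H_k.T @ R_inv @ H_k                    # Fisher information, eq. (5)
theta_ml = np.linalg.solve(H_k.T @ R_inv @ H_k,
                           H_k.T @ R_inv @ x_bar)   # ML estimate, eq. (4)
```

As data arrive, only $\bar x_i$ needs to be stored; it can be refreshed by the recursive update (9) given in Section III-B.
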
In [15], an asymptotic expression similar to (3) was derived for general likelihood functions satisfying certain regularity conditions, valid for large $i$. In comparison, our expression (3) holds for all $i$, due to the properties of the linear model with Gaussian noise that we have assumed. Furthermore, the linear model also allows us to evaluate the expression for $p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)$, as given by the following lemma.

Lemma 2 (Expression for $p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)$): The conditional probability $p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)$ can be expressed as

$$ p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k) = \frac{1}{\big( (2\pi)^M \det R_v \big)^{i/2}} \exp\!\Big( -\tfrac12 \sum_{t=1}^{i} x_t^T R_v^{-1} x_t + \tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big), \qquad (7) $$

where $\hat\theta_{k,\mathrm{ML}}$ and $I(\hat\theta_{k,\mathrm{ML}})$ are given by (4)-(6).

Proof: See Appendix II.

Substituting (7) into (3), we obtain the following factorization of the likelihood function:

$$ p(x_{1:i} \mid \theta, \mathcal{H}_k) = \frac{1}{\big( (2\pi)^M \det R_v \big)^{i/2}} \exp\!\Big( -\tfrac12 \sum_{t=1}^{i} x_t^T R_v^{-1} x_t \Big) \exp\!\Big( \tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big) \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big). \qquad (8) $$

Note that the first two factors in (8) are independent of the hypothesis index $k$ and the state vector $\theta$, while the other two factors depend on $\hat\theta_{k,\mathrm{ML}}$, which, by (4), is in turn determined by $\bar x_i$. Therefore, by the Neyman-Fisher factorization theorem [3], [15], [16], $\bar x_i$ is a sufficient statistic for jointly detecting $\mathcal{H}_k$ and estimating $\theta$.

This fact will also be reflected further ahead in the joint posterior expressions (12)-(16), where $\bar x_i$ is the only statistic we need to track over time, via, e.g., the recursion

$$ \bar x_i = \bar x_{i-1} + \frac{1}{i}\,(x_i - \bar x_{i-1}). \qquad (9) $$

B. Conjugate Prior

For a given likelihood function, if a prior distribution produces a posterior distribution of the same family, then such a prior distribution is called a conjugate prior. With a conjugate prior, we only need to maintain recursions for the parameters that describe the distribution family of the prior and the posterior. We will use this kind of prior in our joint detection and estimation problem. At the beginning, before any measurement data are available, we assume that the prior distribution of $\theta$ and $\mathcal{H}_k$ is given by

$$ p(\theta, \mathcal{H}_k) = p(\mathcal{H}_k)\, p(\theta \mid \mathcal{H}_k), \qquad (10) $$

where $p(\mathcal{H}_k)$ is the prior PMF of the hypothesis $\mathcal{H}_k$ and $p(\theta \mid \mathcal{H}_k)$ is the prior PDF of the state vector $\theta$ given hypothesis $\mathcal{H}_k$. Throughout the paper, we assume that, given $\mathcal{H}_k$, $\theta$ has a Gaussian prior:

$$ p(\theta \mid \mathcal{H}_k) = \frac{1}{\sqrt{(2\pi)^N \det C_{k,0}}} \exp\!\Big( -\tfrac12 \|\theta - \bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} \Big), \qquad (11) $$

where $\bar\theta_{k,0}$ and $C_{k,0}$ are the corresponding prior mean and covariance matrix given hypothesis $\mathcal{H}_k$, respectively. We will show in the next section that this prior is indeed a conjugate prior. Furthermore, we will also show that even with an uninformative prior about $\theta$ (i.e., $C_{k,0}^{-1} \to 0$ in some way), the resulting posterior takes the same form as the joint prior distribution in (10)-(11). Therefore, alternatively, we can think of the conjugate prior in (10)-(11) as the intermediate knowledge that we have learned from earlier data.

C. Main Results

Theorem 1 (Optimal joint inference): Suppose the prior distribution is given by (10)-(11). Then, the posterior distribution $p(\theta, \mathcal{H}_k \mid x_{1:i}) = p(\mathcal{H}_k \mid x_{1:i})\, p(\theta \mid x_{1:i}, \mathcal{H}_k)$ is given by

$$ p(\mathcal{H}_k \mid x_{1:i}) = \frac{1}{f(x_{1:i})}\, p(\mathcal{H}_k) \sqrt{\frac{\det C_{k,\mathrm{MMSE}}}{\det C_{k,0}}}\, \exp\!\Big( \tfrac12 \|\hat\theta_{k,\mathrm{MMSE}}\|^2_{C_{k,\mathrm{MMSE}}^{-1}} - \tfrac12 \|\bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} \Big), \qquad (12) $$

$$ p(\theta \mid \mathcal{H}_k, x_{1:i}) = \frac{1}{\sqrt{(2\pi)^N \det C_{k,\mathrm{MMSE}}}} \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{MMSE}}\|^2_{C_{k,\mathrm{MMSE}}^{-1}} \Big), \qquad (13) $$

where

$$ f(x_{1:i}) = \sum_{q=0}^{K} p(\mathcal{H}_q) \sqrt{\frac{\det C_{q,\mathrm{MMSE}}}{\det C_{q,0}}}\, \exp\!\Big( \tfrac12 \|\hat\theta_{q,\mathrm{MMSE}}\|^2_{C_{q,\mathrm{MMSE}}^{-1}} - \tfrac12 \|\bar\theta_{q,0}\|^2_{C_{q,0}^{-1}} \Big), \qquad (14) $$

$$ \hat\theta_{k,\mathrm{MMSE}} = \big( C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) \big)^{-1} \big( I(\hat\theta_{k,\mathrm{ML}})\, \hat\theta_{k,\mathrm{ML}} + C_{k,0}^{-1} \bar\theta_{k,0} \big), \qquad (15) $$

and

$$ C_{k,\mathrm{MMSE}} = \big( C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) \big)^{-1}. \qquad (16) $$

Proof: See Appendix III.

Note that $\hat\theta_{k,\mathrm{MMSE}}$ is the classical MMSE estimate of $\theta$ given that $\mathcal{H}_k$ is true, and $C_{k,\mathrm{MMSE}}$ is the corresponding error covariance matrix. In the posterior expression (12), $f(x_{1:i})$ is a normalization factor, and $p(\mathcal{H}_k)$ captures the prior PMF. The intuition behind the factor $\sqrt{\det C_{k,\mathrm{MMSE}} / \det C_{k,0}}$ is to penalize the model complexity of $\mathcal{H}_k$; this term reduces to $\det\!\big(I(\hat\theta_{k,\mathrm{ML}})\big)^{-1/2}$ in the case of an uninformative prior (see (18) below), for which a discussion of its meaning can be found in [15]. The last (exponential) term in (12) characterizes the similarity between the data, adjusted by the prior PDF, and the hypothesis $\mathcal{H}_k$. Moreover, the exact dependence of the optimal detector on the state estimator can be seen from expression (12). Accordingly, the posterior marginal distribution $p(\theta \mid x_{1:i})$ of the state vector $\theta$ can be expressed as

$$ p(\theta \mid x_{1:i}) = \sum_{k=0}^{K} p(\mathcal{H}_k \mid x_{1:i})\, \frac{1}{\sqrt{(2\pi)^N \det C_{k,\mathrm{MMSE}}}} \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{MMSE}}\|^2_{C_{k,\mathrm{MMSE}}^{-1}} \Big), \qquad (17) $$

which is a Gaussian mixture density with $K + 1$ components.

We observe from (12)-(13) that the posterior distribution is in the same family as the prior distribution (10)-(11). Therefore, the prior we have chosen is indeed a conjugate prior. As a result, we only need to maintain recursions for the parameters of the posterior distribution. Specifically, we need only maintain recursions for $C_{k,\mathrm{MMSE}}$ and $\hat\theta_{k,\mathrm{MMSE}}$, where $\hat\theta_{k,\mathrm{MMSE}}$ is the only term that depends on the data, via $\hat\theta_{k,\mathrm{ML}}$. By (4), $\hat\theta_{k,\mathrm{ML}}$ is a linear function of $\bar x_i$, which, as we pointed out in Section III-A, is a sufficient statistic and can be updated recursively by (9). Therefore, as new data stream in, the optimal inference over the joint posterior $p(\theta, \mathcal{H}_k \mid x_{1:i})$ can be implemented recursively using finite memory. This fact is also reflected in the marginal distribution (17), where the number of Gaussian mixture components remains $K + 1$ over time.
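
To make the use of Theorem 1 concrete, the following NumPy sketch evaluates (12) and (15)-(16) for a synthetic example. The matrices $H_k$, the prior parameters $\bar\theta_{k,0}$, $C_{k,0}$, and $p(\mathcal{H}_k)$ below are placeholders, and the log-domain accumulation of the weights in (12) is an implementation choice (to avoid overflow of the exponentials as $i$ grows), not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, i = 4, 2, 2, 50                          # illustrative sizes
H = [rng.standard_normal((M, N)) for _ in range(K + 1)]
R_v = 0.1 * np.eye(M)
R_inv = np.linalg.inv(R_v)
X = rng.multivariate_normal(H[1] @ np.array([1.0, -0.5]), R_v, size=i)
x_bar = X.mean(axis=0)

prior_pmf = np.full(K + 1, 1.0 / (K + 1))         # p(H_k), uniform here
theta_0 = np.zeros(N)                             # prior mean (placeholder)
C_0 = 10.0 * np.eye(N)                            # prior covariance C_{k,0} (placeholder)
C0_inv = np.linalg.inv(C_0)

log_w = np.empty(K + 1)                           # log of the unnormalized weights in (12)
theta_mmse = np.empty((K + 1, N))
for k in range(K + 1):
    fisher = i * H[k].T @ R_inv @ H[k]                                 # eq. (5)
    theta_ml = np.linalg.solve(H[k].T @ R_inv @ H[k],
                               H[k].T @ R_inv @ x_bar)                 # eq. (4)
    C_mmse = np.linalg.inv(C0_inv + fisher)                            # eq. (16)
    theta_mmse[k] = C_mmse @ (fisher @ theta_ml + C0_inv @ theta_0)    # eq. (15)
    log_w[k] = (np.log(prior_pmf[k])
                + 0.5 * (np.linalg.slogdet(C_mmse)[1] - np.linalg.slogdet(C_0)[1])
                + 0.5 * theta_mmse[k] @ np.linalg.solve(C_mmse, theta_mmse[k])
                - 0.5 * theta_0 @ C0_inv @ theta_0)

post_pmf = np.exp(log_w - log_w.max())
post_pmf /= post_pmf.sum()                        # p(H_k | x_{1:i}), eq. (12)
k_map = int(np.argmax(post_pmf))                  # MAP detection of the hypothesis
theta_hat = post_pmf @ theta_mmse                 # mean of the mixture (17): overall MMSE estimate
```

From the resulting posterior, the MAP detector is the maximizer of post_pmf, while the mean of the mixture (17) gives the overall MMSE estimate of $\theta$; other criteria can be handled from the same posterior in the same way.
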
D. The Case without Prior Knowledge of States

When the inference algorithm has just started, we may not have any prior information about the states $\theta$. In this case, we may assume that we are equally uninformed about the states under the different hypotheses. In such a setup, we let the covariance matrices of the prior $p(\theta \mid \mathcal{H}_k)$ be the same for all $\mathcal{H}_k$, i.e., $C_{k,0} = C_0$ for some invertible matrix $C_0$, and let $C_0^{-1} \to 0$. Applying this procedure to (15)-(16), the MMSE estimate $\hat\theta_{k,\mathrm{MMSE}}$ given $\mathcal{H}_k$ becomes the maximum likelihood estimate $\hat\theta_{k,\mathrm{ML}}$, and the MMSE error covariance $C_{k,\mathrm{MMSE}}$ given $\mathcal{H}_k$ becomes the inverse Fisher information matrix $I(\hat\theta_{k,\mathrm{ML}})^{-1}$. As a result, the optimal joint inference is given by the following corollary.

Corollary 1 (Uninformative prior): Without any prior information about the states, the joint posterior distribution is given by

$$ p(\mathcal{H}_k \mid x_{1:i}) = \frac{ p(\mathcal{H}_k)\, \det\!\big( I(\hat\theta_{k,\mathrm{ML}}) \big)^{-1/2} \exp\!\big( \tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \big) }{ \sum_{q=0}^{K} p(\mathcal{H}_q)\, \det\!\big( I(\hat\theta_{q,\mathrm{ML}}) \big)^{-1/2} \exp\!\big( \tfrac12 \|\hat\theta_{q,\mathrm{ML}}\|^2_{I(\hat\theta_{q,\mathrm{ML}})} \big) }, \qquad (18) $$

$$ p(\theta \mid \mathcal{H}_k, x_{1:i}) = \sqrt{ \frac{\det I(\hat\theta_{k,\mathrm{ML}})}{(2\pi)^N} }\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big). \qquad (19) $$

As a consequence, the posterior marginal distribution $p(\theta \mid x_{1:i})$ of the state vector $\theta$ can be expressed as

$$ p(\theta \mid x_{1:i}) = \sum_{k=0}^{K} p(\mathcal{H}_k \mid x_{1:i})\, \sqrt{ \frac{\det I(\hat\theta_{k,\mathrm{ML}})}{(2\pi)^N} }\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big). \qquad (20) $$

Note from (18)-(19) that the joint posterior distribution $p(\theta, \mathcal{H}_k \mid x_{1:i})$ takes the same form as the joint posterior in (12)-(13) even when we start without any prior knowledge about $\theta$. Therefore, the form of the joint prior distribution in (10)-(11) can also be viewed as the knowledge we have learned from earlier data, since it takes the same form as (18)-(19).
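
A corresponding sketch of the uninformative-prior weights in (18) is given below, again with synthetic placeholder matrices and log-domain bookkeeping; as $C_0$ grows large, the hypothesis posterior computed from (12) approaches the one computed here, consistent with Corollary 1.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, i = 4, 2, 2, 50                          # illustrative sizes
H = [rng.standard_normal((M, N)) for _ in range(K + 1)]
R_v = 0.1 * np.eye(M)
R_inv = np.linalg.inv(R_v)
X = rng.multivariate_normal(H[1] @ np.array([1.0, -0.5]), R_v, size=i)
x_bar = X.mean(axis=0)
prior_pmf = np.full(K + 1, 1.0 / (K + 1))         # p(H_k), uniform here

log_w = np.empty(K + 1)                           # log of the unnormalized weights in (18)
for k in range(K + 1):
    fisher = i * H[k].T @ R_inv @ H[k]                            # eq. (5)
    theta_ml = np.linalg.solve(H[k].T @ R_inv @ H[k],
                               H[k].T @ R_inv @ x_bar)            # eq. (4)
    log_w[k] = (np.log(prior_pmf[k])
                - 0.5 * np.linalg.slogdet(fisher)[1]              # det(I(theta_ml))^{-1/2}
                + 0.5 * theta_ml @ fisher @ theta_ml)             # exp(0.5 ||theta_ml||^2_I)

post_pmf = np.exp(log_w - log_w.max())
post_pmf /= post_pmf.sum()                        # p(H_k | x_{1:i}) under the uninformative prior
```
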

IV. CONCLUSIONS AND FUTURE WORK

In this paper, we have studied the optimal joint detection and estimation problem in linear models with Gaussian noise. We have proved a factorization lemma for the likelihood function, and shown that the averaged measurement vector $\bar x_i$ is a sufficient statistic. We have then derived a simple closed-form expression for the joint posterior distribution of the hypotheses and the states. This expression reveals the exact dependence of the optimal detector on the state estimates. The joint posterior can then be used to develop optimal joint detector and estimator structures under any given performance criterion.

We have studied the case in which the states are static over time. It would be interesting to generalize to the case in which the states evolve according to certain dynamics. In particular, we would like to investigate whether the joint posterior follows similar forms that depend only on the state estimates over time. For this, it is essential to examine the evolution of the joint posterior from time $i-1$ to $i$, and we expect that similar techniques will apply. Furthermore, we have focused on the optimal fixed-sample-size approach. Developing an optimal sequential approach for joint detection and estimation from the joint posterior distribution remains an interesting direction for future work.

APPENDIX I
PROOF OF LEMMA 1

From the definition of the Gaussian linear model (1) and the i.i.d. assumption on the noise, the joint likelihood function can be written as

$$ p(x_{1:i} \mid \theta, \mathcal{H}_k) = \prod_{t=1}^{i} p(x_t \mid \theta, \mathcal{H}_k) = \frac{1}{\big( (2\pi)^M \det R_v \big)^{i/2}} \exp\!\Big( -\tfrac12 \sum_{t=1}^{i} \| x_t - H_k \theta \|^2_{R_v^{-1}} \Big). $$

Writing $x_t - H_k\theta = (x_t - H_k\hat\theta_{k,\mathrm{ML}}) + H_k(\hat\theta_{k,\mathrm{ML}} - \theta)$ and expanding the quadratic form, we obtain

$$ \sum_{t=1}^{i} \| x_t - H_k \theta \|^2_{R_v^{-1}} = \sum_{t=1}^{i} \| x_t - H_k \hat\theta_{k,\mathrm{ML}} \|^2_{R_v^{-1}} + i\, \| \hat\theta_{k,\mathrm{ML}} - \theta \|^2_{H_k^T R_v^{-1} H_k} + 2 i\, (\bar x_i - H_k \hat\theta_{k,\mathrm{ML}})^T R_v^{-1} H_k (\hat\theta_{k,\mathrm{ML}} - \theta). $$

By the definition (4) of $\hat\theta_{k,\mathrm{ML}}$, we have $H_k^T R_v^{-1} (\bar x_i - H_k \hat\theta_{k,\mathrm{ML}}) = H_k^T R_v^{-1} \bar x_i - (H_k^T R_v^{-1} H_k)\, \hat\theta_{k,\mathrm{ML}} = 0$, so the cross term vanishes. Using (5), the middle term equals $\| \hat\theta_{k,\mathrm{ML}} - \theta \|^2_{I(\hat\theta_{k,\mathrm{ML}})}$. Therefore,

$$ p(x_{1:i} \mid \theta, \mathcal{H}_k) = p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big), \qquad (21) $$

which establishes (3).

APPENDIX II
PROOF OF LEMMA 2

By (21), the expression for $p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k)$ is given by

$$ p(x_{1:i} \mid \hat\theta_{k,\mathrm{ML}}, \mathcal{H}_k) = \frac{1}{\big( (2\pi)^M \det R_v \big)^{i/2}} \exp\!\Big( -\tfrac12 \sum_{t=1}^{i} \| x_t - H_k \hat\theta_{k,\mathrm{ML}} \|^2_{R_v^{-1}} \Big). \qquad (22) $$

For the summation in the exponent, we have

$$ \sum_{t=1}^{i} \| x_t - H_k \hat\theta_{k,\mathrm{ML}} \|^2_{R_v^{-1}} = \sum_{t=1}^{i} x_t^T R_v^{-1} x_t - 2 i\, \bar x_i^T R_v^{-1} H_k \hat\theta_{k,\mathrm{ML}} + i\, \hat\theta_{k,\mathrm{ML}}^T H_k^T R_v^{-1} H_k \hat\theta_{k,\mathrm{ML}} = \sum_{t=1}^{i} x_t^T R_v^{-1} x_t - \hat\theta_{k,\mathrm{ML}}^T I(\hat\theta_{k,\mathrm{ML}})\, \hat\theta_{k,\mathrm{ML}}, \qquad (23) $$

where the last equality uses $i\, \bar x_i^T R_v^{-1} H_k = \hat\theta_{k,\mathrm{ML}}^T I(\hat\theta_{k,\mathrm{ML}})$, which follows from (4) and (5). Finally, substituting (23) into (22), we establish Lemma 2.
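
The factorization in Lemmas 1 and 2 can be checked numerically. The short NumPy sketch below compares the direct i.i.d. Gaussian log-likelihood with the right-hand side of (3), evaluated through (7), on synthetic data and at an arbitrary test point $\theta$; all numerical values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, i = 4, 2, 20                                    # illustrative sizes
H_k = rng.standard_normal((M, N))
R_v = 0.1 * np.eye(M)
R_inv = np.linalg.inv(R_v)
X = rng.multivariate_normal(H_k @ np.array([1.0, -0.5]), R_v, size=i)
x_bar = X.mean(axis=0)

fisher = i * H_k.T @ R_inv @ H_k                                        # eq. (5)
theta_ml = np.linalg.solve(H_k.T @ R_inv @ H_k, H_k.T @ R_inv @ x_bar)  # eq. (4)
theta = rng.standard_normal(N)                        # arbitrary test point

# Left-hand side of (3): direct log-likelihood of the i.i.d. Gaussian model (1).
resid = X - H_k @ theta
lhs = (-0.5 * i * (M * np.log(2 * np.pi) + np.linalg.slogdet(R_v)[1])
       - 0.5 * np.einsum('ti,ij,tj->', resid, R_inv, resid))

# Right-hand side of (3), with log p(x_{1:i} | theta_ml, H_k) taken from (7).
log_p_ml = (-0.5 * i * (M * np.log(2 * np.pi) + np.linalg.slogdet(R_v)[1])
            - 0.5 * np.einsum('ti,ij,tj->', X, R_inv, X)
            + 0.5 * theta_ml @ fisher @ theta_ml)
rhs = log_p_ml - 0.5 * (theta - theta_ml) @ fisher @ (theta - theta_ml)

assert np.isclose(lhs, rhs)   # the two sides agree up to floating-point error
```

The agreement holds for any test point and any sample size, reflecting that (3) is exact, rather than asymptotic, for the linear model with Gaussian noise.
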

APPENDIX III
PROOF OF THEOREM 1

By Bayes' formula, the joint posterior distribution can be expressed as

$$ p(\theta, \mathcal{H}_k \mid x_{1:i}) = \frac{ p(\theta, \mathcal{H}_k)\, p(x_{1:i} \mid \theta, \mathcal{H}_k) }{ p(x_{1:i}) }. \qquad (24) $$

To compute the above posterior distribution, we need $p(x_{1:i})$, given by

$$ p(x_{1:i}) = \sum_{k=0}^{K} p(\mathcal{H}_k) \int p(\theta \mid \mathcal{H}_k)\, p(x_{1:i} \mid \theta, \mathcal{H}_k)\, d\theta. \qquad (25) $$

To proceed, we first introduce the following lemma, which gives an integral result that is useful in deriving both the optimal detection and estimation procedures.

Lemma 3 (A useful integral): Suppose we are given the Gaussian prior distribution (11). Then, the following result holds:

$$ \int p(\theta \mid \mathcal{H}_k)\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big)\, d\theta \qquad (26) $$
$$ = \sqrt{ \frac{\det C_{k,0}^{-1}}{\det\!\big( C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) \big)} }\, \exp\!\Big( \tfrac12 \big\| I(\hat\theta_{k,\mathrm{ML}})\hat\theta_{k,\mathrm{ML}} + C_{k,0}^{-1}\bar\theta_{k,0} \big\|^2_{ (C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}))^{-1} } - \tfrac12 \|\bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} - \tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big), \qquad (27) $$

where $\hat\theta_{k,\mathrm{ML}}$ and $I(\hat\theta_{k,\mathrm{ML}})$ are defined in (4)-(6).

Proof: See Appendix IV.

Second, we compute the integral $\int p(\theta \mid \mathcal{H}_k)\, p(x_{1:i} \mid \theta, \mathcal{H}_k)\, d\theta$ and $p(x_{1:i})$ using (3), (7), and the above lemma:

$$ \int p(\theta \mid \mathcal{H}_k)\, p(x_{1:i} \mid \theta, \mathcal{H}_k)\, d\theta = \frac{ \exp\!\big( -\tfrac12 \sum_{t=1}^{i} x_t^T R_v^{-1} x_t \big) }{ \big( (2\pi)^M \det R_v \big)^{i/2} } \sqrt{ \frac{\det C_{k,0}^{-1}}{\det\!\big( C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) \big)} }\, \exp\!\Big( \tfrac12 \big\| I(\hat\theta_{k,\mathrm{ML}})\hat\theta_{k,\mathrm{ML}} + C_{k,0}^{-1}\bar\theta_{k,0} \big\|^2_{ (C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}))^{-1} } - \tfrac12 \|\bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} \Big), \qquad (28) $$

where the terms $\pm\tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})}$ from (7) and (27) cancel, and

$$ p(x_{1:i}) = \sum_{q=0}^{K} p(\mathcal{H}_q) \int p(\theta \mid \mathcal{H}_q)\, p(x_{1:i} \mid \theta, \mathcal{H}_q)\, d\theta, \qquad (29) $$

with each summand given by (28). Substituting the above expressions and (8) into (24), we obtain (12)-(16) after some simple algebra.

APPENDIX IV
PROOF OF LEMMA 3

It follows from (11) that

$$ \int p(\theta \mid \mathcal{H}_k)\, \exp\!\Big( -\tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big)\, d\theta = \int \frac{1}{\sqrt{(2\pi)^N \det C_{k,0}}} \exp\!\Big( -\tfrac12 \|\theta - \bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} - \tfrac12 \|\theta - \hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})} \Big)\, d\theta. $$

Completing the square in $\theta$, the exponent can be rewritten as

$$ -\tfrac12 \Big\| \theta - \big( C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) \big)^{-1} \big( I(\hat\theta_{k,\mathrm{ML}})\hat\theta_{k,\mathrm{ML}} + C_{k,0}^{-1}\bar\theta_{k,0} \big) \Big\|^2_{ C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}) } + \tfrac12 \big\| I(\hat\theta_{k,\mathrm{ML}})\hat\theta_{k,\mathrm{ML}} + C_{k,0}^{-1}\bar\theta_{k,0} \big\|^2_{ (C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}))^{-1} } - \tfrac12 \|\bar\theta_{k,0}\|^2_{C_{k,0}^{-1}} - \tfrac12 \|\hat\theta_{k,\mathrm{ML}}\|^2_{I(\hat\theta_{k,\mathrm{ML}})}, \qquad (30) $$

where the notation $\|x\|^2_\Sigma \triangleq x^T \Sigma x$. Multiplying and dividing by $\sqrt{(2\pi)^N \det\!\big( (C_{k,0}^{-1} + I(\hat\theta_{k,\mathrm{ML}}))^{-1} \big)}$, and using the fact that the integral of a Gaussian distribution over the entire space equals one, we obtain (27).

REFERENCES

[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation, Marcel Dekker, New York, 2004.
[2] Y. Zhao, A. Goldsmith, and H. V. Poor, "On PMU location selection for line outage detection in wide-area transmission networks," in Proc. IEEE Power and Energy Society General Meeting, San Diego, CA, July 2012, pp. 1-8.
[3] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, New York, 1994.
[4] A. Goldsmith, Wireless Communications, Cambridge University Press, Cambridge, UK, 2005.
[5] D. Middleton and R. Esposito, "Simultaneous optimum detection and estimation of signals in noise," IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 434-444, May 1968.
[6] A. Fredriksen, D. Middleton, and V. VandeLinde, "Simultaneous signal detection and estimation under multiple hypotheses," IEEE Trans. Inf. Theory, vol. 18, no. 5, pp. 607-614, Sep. 1972.
[7] D. G. Lainiotis, "Joint detection, estimation and system identification," Information and Control, vol. 19, no. 1, pp. 75-92, 1971.
[8] G. V. Moustakides, G. H. Jajamovich, A. Tajer, and X. Wang, "Joint detection and estimation: Optimum tests and applications," IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4215-4229, 2012.
[9] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Borjesson, "OFDM channel estimation by singular value decomposition," IEEE Trans. Commun., vol. 46, no. 7, pp. 931-939, Jul. 1998.
[10] Y. Li, L. J. Cimini Jr., and N. R. Sollenberger, "Robust channel estimation for OFDM systems with rapid dispersive fading channels," IEEE Trans. Commun., vol. 46, no. 7, pp. 902-915, Jul. 1998.
[11] H. L. Van Trees, Optimum Array Processing, John Wiley & Sons, New York, 2002.
[12] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276-280, Mar. 1986.
[13] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp. 984-995, Jul. 1989.
[14] S. M. Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, Englewood Cliffs, NJ, 1988.
[15] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, Prentice Hall, Upper Saddle River, NJ, 1998.
[16] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall, Upper Saddle River, NJ, 1993.