Optimal Joint Detection and Estimation in Linear Models

Jianshu Chen, Yue Zhao, Andrea Goldsmith, and H. Vincent Poor

Abstract—The problem of optimal joint detection and estimation in linear models with Gaussian noise is studied. A simple closed-form expression for the joint posterior distribution of the multiple hypotheses and the states is derived. The expression crystallizes the dependence of the optimal detector on the state estimates. The joint posterior distribution characterizes the beliefs (soft information) about the hypotheses and the values of the states. Furthermore, it is a sufficient statistic for jointly detecting multiple hypotheses and estimating the states. The developed expressions give us a unified framework for joint detection and estimation under all performance criteria.

I. INTRODUCTION

Detection and estimation problems appear simultaneously and are naturally coupled in many engineering systems. Several prominent examples are as follows. To achieve situational awareness in power grids, it is essential to have timely detection of outages as well as estimation of system states [1], [2]. Radar systems detect the existence of targets and also estimate their positions and velocities [3]. Wireless communication systems often need to decode messages and estimate channel states at the same time [4]. In different engineering systems, the problem settings of joint detection and estimation can vary greatly, and many application-specific solutions have been developed in practice. A classic approach that addresses the detection problem in the presence of unknown states/parameters is composite hypothesis testing [3]. Accordingly, a straightforward approach for joint hypothesis testing and state/parameter estimation is to perform composite hypothesis testing first, followed by state/parameter estimation based on the hard decision made from hypothesis testing.
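To make this two-stage baseline concrete, the following is a minimal numpy sketch (ours, not from the paper; matrix and variable names are hypothetical) that makes a GLRT-style hard decision by picking the hypothesis whose best-fit linear model has the smallest weighted residual, and then estimates the state under that hard decision alone:

```python
import numpy as np

def detect_then_estimate(Hs, R, xs):
    """Two-stage baseline: hard decision over hypotheses first (pick the
    candidate observation matrix with the smallest weighted residual),
    then state estimation under the chosen hypothesis only."""
    Rinv = np.linalg.inv(R)
    x_bar = xs.mean(axis=0)                       # sample average of the data
    best_k, best_cost, best_theta = None, np.inf, None
    for k, H in enumerate(Hs):
        # ML state estimate assuming hypothesis k is true
        theta = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ x_bar)
        resid = xs - H @ theta                    # residuals, shape (T, M)
        cost = np.einsum('tm,mn,tn->', resid, Rinv, resid)
        if cost < best_cost:
            best_k, best_cost, best_theta = k, cost, theta
    return best_k, best_theta
```

As the paper argues next, such a hard decision discards the soft information about the competing hypotheses, which is what prevents optimality guarantees under joint performance criteria.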
However, such an approach cannot provide optimality guarantees under general performance criteria that depend jointly on detection and estimation results. In the literature, several studies have addressed such joint performance criteria. The structure of the jointly optimal Bayes detector and estimator with discrete-time data was developed in [5] and [6], and was extended to the continuous-time data case in [7]. There, the detector structure was expressed in terms of some generalized forms of likelihood ratios. The structure of the optimal Bayesian estimator under any given constraints on false alarm probability and probability of missed detection has also been developed for the binary hypothesis case [8].

This research was supported in part by the DTRA under Grant HDTRA1-08-1-0010, in part by the Air Force Office of Scientific Research under MURI Grant FA9550-09-1-0643, and in part by the Office of Naval Research under Grant N00014-12-1-0767. J. Chen is with the Dept. of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: jshchen@ee.ucla.edu). Y. Zhao is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA, and with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: yuez@princeton.edu). A. Goldsmith is with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: andrea@stanford.edu). H. V. Poor is with the Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: poor@princeton.edu).

In this paper, we study the problem of optimal joint detection and estimation for a general class of observation models, namely, linear models with Gaussian noise. Linear models appear in a wide range of engineering applications including power systems [1], channel estimation [9], [10], adaptive array processing [11]–[13], and spectrum estimation [14].
In these applications, not only is state estimation of primary interest, but also the observation matrix can often change over time, and it is essential to detect which observation matrix among many possibilities is currently effective. We formulate these problems as joint multiple hypothesis testing and state estimation problems. Instead of focusing on a particular form of performance criterion and developing the corresponding optimal joint detector and estimator, we develop a unified Bayesian approach that can be applied to any given criterion. Specifically, employing a conjugate prior, we provide closed-form expressions for the joint posterior of the hypotheses and the system states given all measurement samples. The developed expressions reveal the exact dependence of the optimal detectors on the state estimates. Because the joint posterior is a sufficient statistic for joint hypothesis testing and state estimation, the derived explicit forms of such soft information (as opposed to hard decisions) can be applied to all performance criteria with optimality guarantees.

The remainder of the paper is organized as follows. In Section II, we describe the system model and formulate the joint detection and estimation problem. In Section III, we provide a factorization of the likelihood function and derive a simple closed-form expression for the joint posterior distribution. Finally, we conclude our paper and remark on future directions in Section IV.

Notation: We use boldface letters to denote random quantities and regular letters to denote realizations or deterministic quantities.

II. PROBLEM FORMULATION

We consider the following observation model, which entails a joint detection and estimation problem. Given each of the $K+1$ hypotheses $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_K$, the $M \times 1$ sensor measurement vector $x_t$ at time $t$ is obtained according to the following linear model:

$$\mathcal{H}_k:\; x_t = H_k \theta + v_t, \quad k = 0, 1, \ldots, K, \tag{1}$$
where $H_k$ is the $M \times N$ observation matrix under hypothesis $\mathcal{H}_k$, $\theta$ is the $N \times 1$ unknown state vector to estimate, and $v_t \sim \mathcal{N}(0, R)$ is the $M \times 1$ measurement noise, which is independent and identically distributed (i.i.d.) over time. From the measurement data $x_t$, we want to jointly infer (a) the true underlying linear model $H_k$, and (b) the true underlying states $\theta$. Note that neither of them is known beforehand, and we need to solve a problem of jointly detecting $\mathcal{H}_k$ and estimating $\theta$.

Such problems arise in many applications. We provide in the following an example that arises commonly in power grid monitoring. An outage in a power grid will change the grid topology, and the system operator wants to detect which outage among a candidate set $\mathcal{H}_1, \ldots, \mathcal{H}_K$ occurs, or whether no outage occurs ($\mathcal{H}_0$). With a given set of sensors in the grid, the $k$th outage scenario gives rise to a unique observation matrix $H_k$, and the sensors measure the states of the grid $\theta$ via (1). Consequently, state estimation depends on knowledge of the true outage, and outage detection depends on knowledge of the true states [2]. Clearly, solving a joint detection and estimation problem is essential for monitoring the health of the power grid in real time.

For these purposes, this paper provides the joint posterior distribution $p(\theta, \mathcal{H}_k \mid x_1^i)$ (see (12) and (13) below), which gives us the beliefs about both $\theta$ and $\mathcal{H}_k$. It is also a sufficient statistic for $\theta$ and $\mathcal{H}_k$ given the data $x_1^i$; that is, it provides the full information from the measured data $x_1^i$ about the hypothesis $\mathcal{H}_k$ and the state vector $\theta$. Therefore, instead of being optimal functions of $x_1^i$, the optimal decision rule and the estimator need only be optimal functionals of $p(\theta, \mathcal{H}_k \mid x_1^i)$. Deriving the expressions for the joint posterior distribution will give us a unified framework for joint detection and estimation under all performance criteria (e.g., minimum risk, minimum probability of error, or maximum a posteriori probability (MAP) detection, and MAP or minimum-mean-square-error (MMSE) estimation).

III.
JOINT POSTERIOR OF HYPOTHESES AND STATES

We now derive the joint posterior distribution of the hypothesis $\mathcal{H}_k$ and the unknown states $\theta$. Specifically, we will use $p(\theta, \mathcal{H}_k \mid x_1^i)$ as a hybrid probability measure to denote the joint posterior distribution of $\theta$ and $\mathcal{H}_k$:

$$p(\theta, \mathcal{H}_k \mid x_1^i) = p(\mathcal{H}_k \mid x_1^i)\, p(\theta \mid \mathcal{H}_k, x_1^i) \tag{2}$$

where $p(\mathcal{H}_k \mid x_1^i)$ denotes the posterior probability mass function (PMF) of $\mathcal{H}_k$ and $p(\theta \mid \mathcal{H}_k, x_1^i)$ denotes the posterior probability density function (PDF) of $\theta$ given $\mathcal{H}_k$.

A. The Likelihood Function

We begin with a factorization of the likelihood function $p(x_1^i \mid \theta, \mathcal{H}_k)$, which will be useful in finding sufficient statistics for jointly detecting $\mathcal{H}_k$ and estimating $\theta$, and in computing the joint posterior distribution. In addition to states, $\theta$ can also include parameters in some applications [1], [3]. For the sake of brevity, we refer to $\theta$ as states from now on.

Lemma 1 (Factorization): According to the linear model in (1), we can express the conditional distribution (the likelihood function) $p(x_1^i \mid \theta, \mathcal{H}_k)$ in the following form:

$$p(x_1^i \mid \theta, \mathcal{H}_k) = p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k) \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) \tag{3}$$

where the notation $\|x\|^2_{\Sigma}$ denotes $x^T \Sigma x$ for some positive definite weighting matrix $\Sigma$, $\hat{\theta}_{k,\mathrm{ML}}$ is the maximum likelihood estimate of $\theta$ given that hypothesis $\mathcal{H}_k$ is true, and $I(\hat{\theta}_{k,\mathrm{ML}})$ is the corresponding Fisher information matrix:

$$\hat{\theta}_{k,\mathrm{ML}} = \big(H_k^T R^{-1} H_k\big)^{-1} H_k^T R^{-1} \bar{x}_i \tag{4}$$
$$I(\hat{\theta}_{k,\mathrm{ML}}) = i\, H_k^T R^{-1} H_k \tag{5}$$
$$\bar{x}_i \triangleq \frac{1}{i} \sum_{t=1}^{i} x_t. \tag{6}$$

Proof: See Appendix I.

In [15], an asymptotic expression similar to (3) was derived for general likelihood functions satisfying certain regularity conditions, for $i$ large. In comparison, our expression (3) holds for all $i$ due to the properties of the linear model with Gaussian noise that we have assumed. Furthermore, the linear model also allows us to evaluate the expression for $p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k)$, as given by the following lemma.

Lemma 2 (Expression for $p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k)$): The conditional probability $p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k)$ can be expressed as

$$p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k) = (2\pi)^{-\frac{iM}{2}} \det(R)^{-\frac{i}{2}} \exp\Big(-\tfrac{1}{2}\sum_{t=1}^{i} \|x_t\|^2_{R^{-1}} + \tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) \tag{7}$$

where $\hat{\theta}_{k,\mathrm{ML}}$ and $I(\hat{\theta}_{k,\mathrm{ML}})$ are given by (4)–(6).

Proof: See Appendix II.
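As a sanity check on (4)–(6), here is a small numpy sketch (our illustration; the function and variable names are ours) computing the per-hypothesis ML estimate and Fisher information from the sample average of the measurements:

```python
import numpy as np

def ml_and_fisher(H, R, x_bar, i):
    """ML state estimate (4) and Fisher information (5) under one hypothesis,
    given the sample average x_bar of i measurements, as in (6)."""
    Rinv = np.linalg.inv(R)
    A = H.T @ Rinv @ H                                 # per-sample information, N x N
    theta_ml = np.linalg.solve(A, H.T @ Rinv @ x_bar)  # (4): weighted least squares
    fisher = i * A                                     # (5): I = i * H^T R^-1 H
    return theta_ml, fisher
```

With noiseless data ($\bar{x}_i = H\theta$), the returned estimate recovers $\theta$ exactly, as expected from (4).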
Substituting (7) into (3), we obtain the following factorization of the likelihood function:

$$p(x_1^i \mid \theta, \mathcal{H}_k) = (2\pi)^{-\frac{iM}{2}} \det(R)^{-\frac{i}{2}} \exp\Big(-\tfrac{1}{2}\sum_{t=1}^{i} \|x_t\|^2_{R^{-1}}\Big) \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big). \tag{8}$$

Note that the first two terms in (8) are independent of the hypothesis index $k$ and the state vector $\theta$, while the other two terms depend on $\hat{\theta}_{k,\mathrm{ML}}$, which, by (4), is further determined by $\bar{x}_i$. Therefore, by the Neyman-Fisher factorization theorem [3], [15], [16], $\bar{x}_i$ is a sufficient statistic
for jointly detecting $\mathcal{H}_k$ and estimating $\theta$. This fact will also be reflected further ahead in the joint posterior expressions (12)–(16), where $\bar{x}_i$ is the only statistic we need to track over time, via, e.g.,

$$\bar{x}_i = \bar{x}_{i-1} + \frac{1}{i}\big(x_i - \bar{x}_{i-1}\big). \tag{9}$$

B. Conjugate Prior

For a certain likelihood function, if a prior distribution produces a posterior distribution of the same family, then such a prior distribution is called a conjugate prior. With a conjugate prior, we need only maintain recursions for the parameters that describe the distribution family of the prior and the posterior. We will use this kind of prior in our joint detection and estimation problem. At the beginning, before any measurement data are available, we assume that the prior distribution of $\theta$ and $\mathcal{H}_k$ is given by

$$p(\theta, \mathcal{H}_k) = p(\mathcal{H}_k)\, p(\theta \mid \mathcal{H}_k) \tag{10}$$

where $p(\mathcal{H}_k)$ is the prior PMF of the hypothesis $\mathcal{H}_k$ and $p(\theta \mid \mathcal{H}_k)$ is the prior PDF of the state vector $\theta$ given hypothesis $\mathcal{H}_k$. Throughout the paper, we assume that, given $\mathcal{H}_k$, $\theta$ has a Gaussian prior:

$$p(\theta \mid \mathcal{H}_k) = (2\pi)^{-\frac{N}{2}} \det(C_{k,0})^{-\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}}\Big) \tag{11}$$

where $\bar{\theta}_{k,0}$ and $C_{k,0}$ are the corresponding prior mean and covariance matrix given hypothesis $\mathcal{H}_k$, respectively. We will show in the next section that this prior is indeed a conjugate prior. Furthermore, we will also show that even with an uninformative prior about $\theta$ (i.e., $C_{k,0}^{-1} \to 0$ in some way), the resulting posterior takes the same form as the joint prior distribution in (10)–(11). Therefore, alternatively, we can think of the conjugate prior in (10)–(11) as the intermediate knowledge that we learned from earlier data.

C. Main Results

Theorem 1 (Optimal joint inference): Suppose the prior distribution is given by (10)–(11). Then, the posterior distribution $p(\theta, \mathcal{H}_k \mid x_1^i) = p(\mathcal{H}_k \mid x_1^i)\, p(\theta \mid x_1^i, \mathcal{H}_k)$ is given by

$$p(\mathcal{H}_k \mid x_1^i) = \frac{1}{f(x_1^i)}\, p(\mathcal{H}_k)\, \sqrt{\frac{\det C_{k,\mathrm{MMSE}}}{\det C_{k,0}}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,\mathrm{MMSE}}^{-1}} - \tfrac{1}{2}\big\|\bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}}\Big) \tag{12}$$

$$p(\theta \mid \mathcal{H}_k, x_1^i) = (2\pi)^{-\frac{N}{2}} \det(C_{k,\mathrm{MMSE}})^{-\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,\mathrm{MMSE}}^{-1}}\Big) \tag{13}$$

where

$$f(x_1^i) = \sum_{q=0}^{K} p(\mathcal{H}_q)\, \sqrt{\frac{\det C_{q,\mathrm{MMSE}}}{\det C_{q,0}}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{q,\mathrm{MMSE}}\big\|^2_{C_{q,\mathrm{MMSE}}^{-1}} - \tfrac{1}{2}\big\|\bar{\theta}_{q,0}\big\|^2_{C_{q,0}^{-1}}\Big) \tag{14}$$

$$\hat{\theta}_{k,\mathrm{MMSE}} = \big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1} \big(I(\hat{\theta}_{k,\mathrm{ML}})\, \hat{\theta}_{k,\mathrm{ML}} + C_{k,0}^{-1}\, \bar{\theta}_{k,0}\big) \tag{15}$$

$$C_{k,\mathrm{MMSE}} = \big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1}. \tag{16}$$

Proof: See Appendix III.
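Theorem 1 can be turned into a short numerical routine; the following is a minimal numpy sketch (ours, with hypothetical names) that evaluates (12) and (15)–(16) in log-domain for numerical robustness, returning the posterior PMF over hypotheses together with the per-hypothesis MMSE mean and covariance:

```python
import numpy as np

def joint_posterior(priors, theta_ml, fisher, theta0, C0):
    """Posterior PMF (12) and per-hypothesis Gaussian parameters (15)-(16).
    All inputs are lists indexed by hypothesis k: prior probabilities,
    ML estimates, Fisher information matrices, prior means, prior covariances."""
    log_w, mmse = [], []
    for p, t_ml, I, t0, C in zip(priors, theta_ml, fisher, theta0, C0):
        C0inv = np.linalg.inv(C)
        C_mmse = np.linalg.inv(C0inv + I)            # (16)
        t_mmse = C_mmse @ (I @ t_ml + C0inv @ t0)    # (15)
        # log of the unnormalized weight in (12)
        lw = (np.log(p)
              + 0.5 * (np.linalg.slogdet(C_mmse)[1] - np.linalg.slogdet(C)[1])
              + 0.5 * t_mmse @ np.linalg.solve(C_mmse, t_mmse)
              - 0.5 * t0 @ C0inv @ t0)
        log_w.append(lw)
        mmse.append((t_mmse, C_mmse))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())                  # normalize, as f(x) does in (14)
    return w / w.sum(), mmse
```

Subtracting the maximum log-weight before exponentiating implements the normalization by $f(x_1^i)$ in (14) without overflow.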
Note that $\hat{\theta}_{k,\mathrm{MMSE}}$ is the classical MMSE estimate of $\theta$ given that $\mathcal{H}_k$ is true, and $C_{k,\mathrm{MMSE}}$ is the corresponding error covariance matrix. In the posterior expression (12), $f(x_1^i)$ is a normalization factor, and $p(\mathcal{H}_k)$ captures the prior PMF. The intuition of the factor $\sqrt{\det C_{k,\mathrm{MMSE}} / \det C_{k,0}}$ is to penalize the model complexity of $\mathcal{H}_k$; this term reduces to $\det I(\hat{\theta}_{k,\mathrm{ML}})^{-1/2}$ in the case of an uninformative prior (see (18) below), for which a discussion of its meaning can be found in [15]. The last term in (12) characterizes the similarity between the data (adjusted by the prior PDF) and the hypothesis $\mathcal{H}_k$. Moreover, the exact dependence of the optimal detector on the state estimator can be seen from expression (12). Accordingly, the posterior marginal distribution $p(\theta \mid x_1^i)$ for the state vector $\theta$ can be expressed as

$$p(\theta \mid x_1^i) = \sum_{k=0}^{K} p(\mathcal{H}_k \mid x_1^i)\, (2\pi)^{-\frac{N}{2}} \det(C_{k,\mathrm{MMSE}})^{-\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,\mathrm{MMSE}}^{-1}}\Big) \tag{17}$$

which is a Gaussian mixture density with $K + 1$ components.

We observe from (12)–(13) that the posterior distribution is in the same family as the prior distribution (10)–(11). Therefore, the prior we have chosen is a conjugate prior. As a result, we need only maintain recursions for the parameters of the posterior distribution. Specifically, we need only maintain recursions for $C_{k,\mathrm{MMSE}}$ and $\hat{\theta}_{k,\mathrm{MMSE}}$, where $\hat{\theta}_{k,\mathrm{MMSE}}$ is the only term that depends on the data, via $\hat{\theta}_{k,\mathrm{ML}}$. By (4), $\hat{\theta}_{k,\mathrm{ML}}$ is a linear function of $\bar{x}_i$, which, as we pointed out in Section III-A, is a sufficient statistic and can be updated recursively by (9). Therefore, as new data stream in, the optimal inference over the joint posterior $p(\theta, \mathcal{H}_k \mid x_1^i)$ can be implemented recursively using finite memory. This fact is also reflected in the marginal distribution $p(\theta \mid x_1^i)$, where the number of Gaussian mixture components remains at $K + 1$ over time.

D. The Case without Prior Knowledge of States

When the inference algorithm has just started, we may not have any prior information about the states $\theta$. In this case, we may assume that we are equally uninformed about the states under different hypotheses.
In such a setup, we let the covariance matrices of the prior $p(\theta \mid \mathcal{H}_k)$ be the same for all $\mathcal{H}_k$, i.e., $C_{k,0} = C_0$ for some invertible matrix $C_0$, and let $C_0^{-1} \to 0$. Applying this limit to (15)–(16), the MMSE estimate $\hat{\theta}_{k,\mathrm{MMSE}}$ given $\mathcal{H}_k$ becomes the maximum likelihood estimate $\hat{\theta}_{k,\mathrm{ML}}$, and the MMSE error covariance $C_{k,\mathrm{MMSE}}$ given $\mathcal{H}_k$ becomes the inverse Fisher information matrix $I(\hat{\theta}_{k,\mathrm{ML}})^{-1}$. As a result, the optimal joint inference is given by the following corollary.

Corollary 1 (Uninformative Prior): Without any prior information about the states, the joint posterior distribution is given by

$$p(\mathcal{H}_k \mid x_1^i) = \frac{p(\mathcal{H}_k)\, \det I(\hat{\theta}_{k,\mathrm{ML}})^{-\frac{1}{2}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big)}{\displaystyle\sum_{q=0}^{K} p(\mathcal{H}_q)\, \det I(\hat{\theta}_{q,\mathrm{ML}})^{-\frac{1}{2}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{q,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{q,\mathrm{ML}})}\Big)} \tag{18}$$

$$p(\theta \mid \mathcal{H}_k, x_1^i) = (2\pi)^{-\frac{N}{2}} \det I(\hat{\theta}_{k,\mathrm{ML}})^{\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big). \tag{19}$$

As a consequence, the posterior marginal distribution $p(\theta \mid x_1^i)$ for the state vector $\theta$ can be expressed as

$$p(\theta \mid x_1^i) = \sum_{k=0}^{K} p(\mathcal{H}_k \mid x_1^i)\, (2\pi)^{-\frac{N}{2}} \det I(\hat{\theta}_{k,\mathrm{ML}})^{\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big). \tag{20}$$

Note from (18)–(19) that the joint posterior distribution $p(\theta, \mathcal{H}_k \mid x_1^i)$ takes the same form as the joint posterior in (12) and (13) even when we start without any prior knowledge about $\theta$. Therefore, the form of the joint prior distribution in (10)–(11) can also be viewed as knowledge learned from earlier data, since it takes the same form as (18)–(19).

IV. CONCLUSIONS AND FUTURE WORK

In this paper, we have studied the optimal joint detection and estimation problem in linear models with Gaussian noise. We have proved a factorization lemma for the likelihood function and shown that the average measurement data vector $\bar{x}_i$ is a sufficient statistic. We have then derived a simple closed-form expression for the joint posterior distribution of the hypotheses and the states. This expression reveals the exact dependence of the optimal detector on the state estimates. The joint posterior can then be used to develop optimal joint detector and estimator structures under any given performance criterion.

We have studied the case in which the states are static over time. It is interesting to generalize to the case in which the states evolve according to certain dynamics.
In particular, we would like to investigate whether the joint posterior follows similar forms that depend only on the state estimates over time. For this, it is essential to examine the evolution of the joint posterior from time $i$ to $i+1$, and we expect that similar techniques will apply. Furthermore, we have focused on the optimal fixed-sample-size approach. Developing an optimal sequential approach for joint detection and estimation from the joint posterior distribution remains an interesting direction for future work.

APPENDIX I
PROOF OF LEMMA 1

From the definition of the Gaussian linear model (1) and the i.i.d. assumption on the noise, we can write the joint likelihood function as

$$p(x_1^i \mid \theta, \mathcal{H}_k) = \prod_{t=1}^{i} p(x_t \mid \theta, \mathcal{H}_k) = (2\pi)^{-\frac{iM}{2}} \det(R)^{-\frac{i}{2}} \exp\Big(-\tfrac{1}{2} \sum_{t=1}^{i} \|x_t - H_k \theta\|^2_{R^{-1}}\Big).$$

Writing $x_t - H_k \theta = (x_t - H_k \hat{\theta}_{k,\mathrm{ML}}) + H_k (\hat{\theta}_{k,\mathrm{ML}} - \theta)$ and expanding the quadratic form, we obtain

$$\sum_{t=1}^{i} \|x_t - H_k \theta\|^2_{R^{-1}} = \sum_{t=1}^{i} \|x_t - H_k \hat{\theta}_{k,\mathrm{ML}}\|^2_{R^{-1}} + 2 (\hat{\theta}_{k,\mathrm{ML}} - \theta)^T H_k^T R^{-1} \sum_{t=1}^{i} \big(x_t - H_k \hat{\theta}_{k,\mathrm{ML}}\big) + i\, (\hat{\theta}_{k,\mathrm{ML}} - \theta)^T H_k^T R^{-1} H_k (\hat{\theta}_{k,\mathrm{ML}} - \theta).$$

For the cross term, note that $\sum_{t=1}^{i} (x_t - H_k \hat{\theta}_{k,\mathrm{ML}}) = i\, (\bar{x}_i - H_k \hat{\theta}_{k,\mathrm{ML}})$, and by (4),

$$H_k^T R^{-1} \big(\bar{x}_i - H_k \hat{\theta}_{k,\mathrm{ML}}\big) = H_k^T R^{-1} \bar{x}_i - H_k^T R^{-1} H_k \hat{\theta}_{k,\mathrm{ML}} = 0,$$

so the cross term vanishes. Using (5), we therefore have

$$p(x_1^i \mid \theta, \mathcal{H}_k) = p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k) \exp\Big(-\tfrac{1}{2} (\hat{\theta}_{k,\mathrm{ML}} - \theta)^T I(\hat{\theta}_{k,\mathrm{ML}}) (\hat{\theta}_{k,\mathrm{ML}} - \theta)\Big) \tag{21}$$

which establishes Lemma 1.

APPENDIX II
PROOF OF LEMMA 2

By (21), the expression for $p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k)$ is given by

$$p(x_1^i \mid \hat{\theta}_{k,\mathrm{ML}}, \mathcal{H}_k) = (2\pi)^{-\frac{iM}{2}} \det(R)^{-\frac{i}{2}} \exp\Big(-\tfrac{1}{2} \sum_{t=1}^{i} \|x_t - H_k \hat{\theta}_{k,\mathrm{ML}}\|^2_{R^{-1}}\Big). \tag{22}$$

For the summation in the exponent, we have, using (4)–(6),

$$\sum_{t=1}^{i} \|x_t - H_k \hat{\theta}_{k,\mathrm{ML}}\|^2_{R^{-1}} = \sum_{t=1}^{i} x_t^T R^{-1} x_t - 2 i\, \bar{x}_i^T R^{-1} H_k \hat{\theta}_{k,\mathrm{ML}} + i\, \hat{\theta}_{k,\mathrm{ML}}^T H_k^T R^{-1} H_k \hat{\theta}_{k,\mathrm{ML}} = \sum_{t=1}^{i} x_t^T R^{-1} x_t - \hat{\theta}_{k,\mathrm{ML}}^T I(\hat{\theta}_{k,\mathrm{ML}})\, \hat{\theta}_{k,\mathrm{ML}}. \tag{23}$$

Finally, substituting (23) into (22), we establish Lemma 2.

APPENDIX III
PROOF OF THEOREM 1

By Bayes' formula, the joint posterior distribution can be expressed as

$$p(\theta, \mathcal{H}_k \mid x_1^i) = \frac{p(\theta, \mathcal{H}_k)\, p(x_1^i \mid \theta, \mathcal{H}_k)}{p(x_1^i)}. \tag{24}$$

To compute the above posterior distribution, we need $p(x_1^i)$, given by

$$p(x_1^i) = \sum_{k=0}^{K} p(\mathcal{H}_k) \int p(\theta \mid \mathcal{H}_k)\, p(x_1^i \mid \theta, \mathcal{H}_k)\, d\theta. \tag{25}$$

To proceed, we first introduce the following lemma, which gives an integral result that is useful in deriving both the optimal detection and estimation procedures.

Lemma 3 (A useful integral): Suppose we are given the Gaussian prior distribution (11). Then, the following result holds:

$$\int p(\theta \mid \mathcal{H}_k) \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) d\theta \tag{26}$$
$$= \sqrt{\frac{\det C_{k,0}^{-1}}{\det\big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})} - \tfrac{1}{2}\big\|\bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}} - \tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) \tag{27}$$

where $\hat{\theta}_{k,\mathrm{ML}}$ and $I(\hat{\theta}_{k,\mathrm{ML}})$ are defined in (4)–(6), and $\hat{\theta}_{k,\mathrm{MMSE}}$ is as in (15).

Proof: See Appendix IV.

Second, we compute the integral in (25) and $p(x_1^i)$ using the above lemma together with the factorization (8):

$$\int p(\theta \mid \mathcal{H}_k)\, p(x_1^i \mid \theta, \mathcal{H}_k)\, d\theta = (2\pi)^{-\frac{iM}{2}} \det(R)^{-\frac{i}{2}} \exp\Big(-\tfrac{1}{2}\sum_{t=1}^{i} \|x_t\|^2_{R^{-1}}\Big) \sqrt{\frac{\det C_{k,0}^{-1}}{\det\big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})} - \tfrac{1}{2}\big\|\bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}}\Big) \tag{28}$$

where the term $\tfrac{1}{2}\|\hat{\theta}_{k,\mathrm{ML}}\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}$ from (8) cancels with its negative in (27), and

$$p(x_1^i) = \sum_{q=0}^{K} p(\mathcal{H}_q) \int p(\theta \mid \mathcal{H}_q)\, p(x_1^i \mid \theta, \mathcal{H}_q)\, d\theta \tag{29}$$

with each summand given by (28). Substituting (28)–(29) into (24), and noting that $C_{k,\mathrm{MMSE}} = \big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)^{-1}$, we obtain (12)–(16) after some simple algebra.

APPENDIX IV
PROOF OF LEMMA 3

Substituting the Gaussian prior (11) into the left-hand side of (26) and completing the square in $\theta$, it follows that

$$\int (2\pi)^{-\frac{N}{2}} \det(C_{k,0})^{-\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}} - \tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big) d\theta$$
$$= \sqrt{\frac{\det C_{k,0}^{-1}}{\det\big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)}}\, \exp\Big(\tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})} - \tfrac{1}{2}\big\|\bar{\theta}_{k,0}\big\|^2_{C_{k,0}^{-1}} - \tfrac{1}{2}\big\|\hat{\theta}_{k,\mathrm{ML}}\big\|^2_{I(\hat{\theta}_{k,\mathrm{ML}})}\Big)$$
$$\quad \times \int (2\pi)^{-\frac{N}{2}} \det\big(C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})\big)^{\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\big\|\theta - \hat{\theta}_{k,\mathrm{MMSE}}\big\|^2_{C_{k,0}^{-1} + I(\hat{\theta}_{k,\mathrm{ML}})}\Big) d\theta \tag{30}$$

where the notation $\|x\|^2_{\Sigma} = x^T \Sigma x$, and in the last step we use the fact that the integral of a Gaussian distribution over the entire space equals one, which establishes (26)–(27).

REFERENCES

[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation, Marcel Dekker, New York, 2004.
[2] Y. Zhao, A. Goldsmith, and H. V. Poor, "On PMU location selection for line outage detection in wide-area transmission networks," in Proc. IEEE Power and Energy Society General Meeting, San Diego, CA, July 2012, pp. 1-8.
[3] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, New York, 1994.
[4] A. Goldsmith, Wireless Communications, Cambridge University Press, Cambridge, UK, 2005.
[5] D. Middleton and R. Esposito, "Simultaneous optimum detection and estimation of signals in noise," IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 434-444, May 1968.
[6] A. Fredriksen, D. Middleton, and V. VandeLinde, "Simultaneous signal detection and estimation under multiple hypotheses," IEEE Trans. Inf. Theory, vol. 18, no. 5, pp. 607-614, Sep. 1972.
[7] D. G. Lainiotis, "Joint detection, estimation and system identification," Information and Control, vol. 19, no. 1, pp. 75-92, 1971.
[8] G. V. Moustakides, G. H. Jajamovich, A. Tajer, and X. Wang, "Joint detection and estimation: Optimum tests and applications," IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4215-4229, 2012.
[9] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Borjesson, "OFDM channel estimation by singular value decomposition," IEEE Trans. Commun., vol. 46, no. 7, pp. 931-939, Jul. 1998.
[10] Y. Li, L. J. Cimini, Jr., and N. R. Sollenberger, "Robust channel estimation for OFDM systems with rapid dispersive fading channels," IEEE Trans. Commun., vol. 46, no. 7, pp. 902-915, Jul. 1998.
[11] H. L. Van Trees, Optimum Array Processing, John Wiley & Sons, New York, 2002.
[12] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276-280, Mar. 1986.
[13] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp. 984-995, Jul. 1989.
[14] S. M. Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, Englewood Cliffs, NJ, 1988.
[15] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, Prentice Hall, Upper Saddle River, NJ, 1998.
[16] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall, Upper Saddle River, NJ, 1993.