Optimum Joint Detection and Estimation


Report SSP: Optimum Joint Detection and Estimation

George V. Moustakides
Statistical Signal Processing Group
Department of Electrical & Computer Engineering
University of Patras, GREECE


Contents

1 Joint Hypothesis Testing and Isolation
  1.1 Introduction
  1.2 Randomized Decision Rules and Classical Hypothesis Testing
    1.2.1 Neyman-Pearson Binary Hypothesis Testing
    1.2.2 Bayesian Multiple Hypothesis Testing
  1.3 Combined Hypothesis Testing and Isolation
    1.3.1 Optimality of GLRT
    1.3.2 Combined Neyman-Pearson and Bayesian Hypothesis Testing

2 Joint Hypothesis Testing and Estimation
  2.1 Introduction
    2.1.1 Optimum Bayesian Estimation
    2.1.2 Combined Neyman-Pearson Hypothesis Testing and Bayesian Estimation
  2.2 Variations
    2.2.1 Known Parameters under $H_0$
    2.2.2 Conditional Cost
  2.3 Examples
    2.3.1 MAP Detection/Estimation
    2.3.2 MMSE Detection/Estimation
    2.3.3 Median Detection/Estimation
  2.4 Conclusion
  2.5 Acknowledgment


1 Joint Hypothesis Testing and Isolation

1.1 Introduction

In binary hypothesis testing, when the hypotheses are composite or the corresponding data pdfs contain unknown parameters, one can use the well known generalized likelihood ratio test (GLRT) to reach a decision. This test has the very desirable characteristic of performing simultaneous detection and estimation in the case of parameterized pdfs, or combined detection and isolation in the case of composite hypotheses. Although the GLRT has been known for many years and has been the decision tool in numerous applications, only asymptotic optimality results are currently available to support it. In this work we introduce a novel, finite sample size, detection/estimation formulation for the problem of hypothesis testing with unknown parameters and a corresponding detection/isolation setup for the case of composite hypotheses. The resulting optimum scheme has a GLRT-like form which is closely related to the criterion we employ for the parameter estimation or isolation part. When this criterion is selected in a very specific way we recover the well known GLRT of the literature, while with alternative criteria we obtain interesting novel tests. Our mathematical derivations are surprisingly simple considering that they solve a problem that has been open for more than half a century.

Consider a random data vector $X \in \mathbb{R}^N$ and two composite hypotheses $H_0$, $H_1$ defined as

$$H_i : X \sim f_{ik}(X) \ \text{with prior probability}\ \pi_{ik}, \quad k = 1,\ldots,K_i,\ i = 0,1, \tag{1.1}$$

where the $f_{ik}(X)$ are pdfs and "$\sim$" means "distributed according to". Under each hypothesis $H_i$ the data pdf can take one of the $K_i$ possible forms $f_{i1}(X),\ldots,f_{iK_i}(X)$ with corresponding prior probabilities $\pi_{i1},\ldots,\pi_{iK_i}$. The classical approach for distinguishing between the two composite hypotheses consists in forming, for each hypothesis, the mixture pdf

$$f_i(X) = \sum_{k=1}^{K_i} \pi_{ik} f_{ik}(X), \tag{1.2}$$

and then, for any realization $X$ of the random data vector, applying the likelihood ratio test

$$\frac{f_1(X)}{f_0(X)} = \frac{\sum_{k=1}^{K_1} \pi_{1k} f_{1k}(X)}{\sum_{k=1}^{K_0} \pi_{0k} f_{0k}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.3}$$

to make a decision. According to (1.3) we decide in favor of $H_1$ when the likelihood ratio exceeds the threshold $\lambda$, in favor of $H_0$ when the likelihood ratio falls below the threshold, and we perform a randomized decision between the two possibilities every time the likelihood ratio coincides with the threshold. Even though this decision scheme is optimum (in more than one sense), it can decide only between the two main hypotheses. There are clearly applications where one is interested in specifying the actual pdf that generates the data vector $X$. In other words, in addition to deciding the main hypothesis, we could also attempt to fine-tune our decision mechanism by isolating the pdf that is responsible for the observed data $X$. This goal clearly calls for a joint detection/isolation strategy.

A possible approach for solving the combined problem is with the help of the GLRT, that is, by applying the following test

$$\frac{\max_{1\le k\le K_1} f_{1k}(X)}{\max_{1\le k\le K_0} f_{0k}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.4}$$

which is equivalent to

$$\frac{f_{1\hat k_1}(X)}{f_{0\hat k_0}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.5}$$

where

$$\hat k_i = \arg\max_{1\le k\le K_i} f_{ik}(X), \quad i = 0, 1. \tag{1.6}$$

We observe that the GLRT performs two simultaneous decisions: with (1.5) it decides between the two main hypotheses $H_0$, $H_1$ and, at the same time, with (1.6) it isolates the most likely pdf under each hypothesis.

A significantly more interesting situation arises when under each hypothesis we have parameterized pdfs. Suppose that under hypothesis $H_i$, $i = 0, 1$, the data vector satisfies $X \sim f_i(X|\theta_i)$, where the parameter vector $\theta_i$ is assumed to be a realization of a corresponding random vector $\vartheta_i$ distributed according to the prior pdf $\pi_i(\theta_i)$. A test for composite hypotheses would form the two mixture pdfs $f_i(X) = \int f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$ and then apply the likelihood ratio test on the resulting densities. Again, as before, this approach is unable to propose an estimate for the parameter vector $\theta_i$ that generates the observed data $X$. We realize that the isolation problem has now turned into a parameter estimation problem; consequently, if our goal is to perform detection and parameter estimation simultaneously, a possibility is to apply the GLRT

$$\frac{\sup_{\theta_1} f_1(X|\theta_1)}{\sup_{\theta_0} f_0(X|\theta_0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.7}$$

or equivalently

$$\frac{f_1(X|\hat\theta_1)}{f_0(X|\hat\theta_0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.8}$$

where

$$\hat\theta_i = \arg\sup_{\theta_i} f_i(X|\theta_i), \quad i = 0, 1. \tag{1.9}$$

With this test we decide between the two hypotheses while providing, at the same time through (1.9), maximum likelihood estimates of the desired parameters.

The first asymptotic optimality result for the GLRT can be traced back to 1943 in the work of Wald [1], while subsequent results can be found in [2, 3, 4, 5]. A thorough analysis of this subject exists in [6, Chapter 22] and additional references in [7]. We should also mention a series of results [8]-[13] addressing the asymptotic optimality property of the GLRT for special classes of processes. Finally, in [14] the GLRT is related to the uniformly most powerful invariant (UMPI) test and conclusions about its asymptotic optimality are drawn from this connection. As far as applications are concerned, the literature dealing with the GLRT is enormous, indicating the significant practical usefulness of this very simple decision mechanism.

Despite the GLRT's extreme popularity, no finite sample size optimality result has been developed so far to support it. It is exactly this gap we intend to fill with our current work. Of course, it is unrealistic to expect that the GLRT will turn out to be finite-sample-size optimum with respect to some known criterion. The only chance we have to prove such a type of optimality is by introducing a new performance measure. The measure we intend to adopt, we believe, makes a lot of sense and is tailored to the fact that the GLRT performs simultaneous detection/isolation or detection/estimation. Furthermore, with our analysis we will not only provide the missing optimality theory for the GLRT but we will also offer novel GLRT-like alternatives which might turn out to be more suitable for certain applications than the existing test.

1.2 Randomized Decision Rules and Classical Hypothesis Testing

Before introducing our main results let us first revisit two classical problems from hypothesis testing theory, namely binary hypothesis testing in the Neyman-Pearson sense and multiple hypothesis testing in the Bayesian sense.
We would like to develop the corresponding familiar optimum detection strategies by working with the class of randomized decision rules. Randomized tests tend to be easier to optimize than their deterministic counterparts because they involve optimization of functions as opposed to optimization of (decision) sets, which is what deterministic tests require. The reason we insist on the two classical hypothesis testing problems is that we intend to propose a new combined version that will produce the GLRT in a natural way. Furthermore, as we mentioned, we pay special attention to the class of randomized tests instead of the conventional deterministic class because with the former it is straightforward to develop the desired optimum decision strategy.

1.2.1 Neyman-Pearson Binary Hypothesis Testing

Consider a random data vector $X$ that takes values in $\mathbb{R}^N$ and two hypotheses

$$H_0 : X \sim f_0(X); \qquad H_1 : X \sim f_1(X),$$

where $f_i(X)$ denotes the pdf of the data vector $X$ under hypothesis $H_i$. For every realization $X$ we must come up with a decision $D \in \{0, 1\}$. Given $X$, with a randomized decision rule our decision $D$ is a random variable. Therefore let $\delta_0(X), \delta_1(X)$ denote the probabilities of our decision $D$ being 0 and 1 respectively. It is clear that the two probabilities must be complementary, i.e. $\delta_0(X) + \delta_1(X) = 1$, and functions of the observation vector $X$. A randomized decision rule is completely specified once these two functions are known. A decision $D$ is reached with the help of a random selection game where we select $D = 0$ with probability $\delta_0(X)$ and $D = 1$ with probability $\delta_1(X)$ using, for example, an unfair coin tossing procedure.

The class of randomized decision rules is richer than the class of deterministic strategies. Indeed, we recall that a deterministic strategy is defined with the help of two complementary sets $A_0, A_1 \subseteq \mathbb{R}^N$, where $A_1 = A_0^c$ and the superscript $c$ denotes complement, and we decide in favor of $H_j$ whenever $X \in A_j$, $j = 0, 1$. Deterministic strategies always make the same decision for the same data vector $X$, unlike their randomized counterparts where the decision depends on the outcome of the random game. A deterministic strategy can be viewed as a randomized rule by selecting $\delta_j(X) = 1_{\{A_j\}}(X)$, where $1_{\{A\}}(X)$ is the indicator function of the set $A$. Note that whenever $X \in A_j$ the deterministic rule selects $H_j$; its randomized version, on the other hand, selects $H_j$ with probability $\delta_j(X) = 1_{\{A_j\}}(X) = 1$ which, of course, is the equivalent of a deterministic decision. The advantage of using randomized rules is that we work with functions instead of sets (which is the practice with deterministic strategies). This considerably facilitates the understanding of the proofs for a novice reader who is more familiar with function than with set optimization.

Let us now attempt to solve the binary hypothesis testing problem in the sense of Neyman-Pearson. We are seeking a randomized rule $[\delta_0(X), \delta_1(X)]$ that maximizes the probability of detection $P(D = 1|H_1)$ subject to the constraint that the false alarm probability $P(D = 1|H_0)$ does not exceed a prescribed level $\alpha \in [0, 1]$. We can immediately see that

$$P(D = j|H_i) = \int \delta_j(X) f_i(X)\,dX. \tag{1.10}$$

Using the Lagrange multiplier technique, we can transform the constrained optimization problem into an unconstrained one as follows

$$\max_{\delta_1(X)} \left\{ \int \delta_1(X)\,[f_1(X) - \lambda f_0(X)]\,dX \right\}, \tag{1.11}$$

where $\lambda \geq 0$ is the Lagrange multiplier.
Since $0 \leq \delta_1(X) \leq 1$ (we recall that $\delta_1(X)$ is a probability) we conclude that the optimum $\delta_1^o(X)$ is

$$\delta_1^o(X) = \begin{cases} 1 & \text{when } f_1(X) - \lambda f_0(X) > 0 \\ \gamma(X) & \text{when } f_1(X) - \lambda f_0(X) = 0 \\ 0 & \text{when } f_1(X) - \lambda f_0(X) < 0, \end{cases} \tag{1.12}$$

where $\gamma(X)$ is an arbitrary probability. This rule is of course equivalent to the classical likelihood ratio test: select $H_1$ with probability 1 (therefore deterministically) when $f_1(X)/f_0(X) > \lambda$; favor $H_0$ when $f_1(X)/f_0(X) < \lambda$; and decide randomly with probability $\gamma(X)$ in favor of $H_1$ (and therefore with probability $1 - \gamma(X)$ in favor of $H_0$) whenever the likelihood ratio coincides with the threshold $\lambda$. Threshold $\lambda$ and randomization probability $\gamma(X)$ are selected so that the likelihood ratio test meets the false alarm constraint with equality. The proof of existence of suitable values for $\lambda$ and $\gamma(X)$ (the latter is usually set to a constant) for any level $\alpha \in [0, 1]$, and of the optimality of the resulting test, can be found in any basic textbook on hypothesis testing (see for example [15, Page 22]). We observe that under the richer class of randomized rules we still obtain the classical likelihood ratio test as our optimum detection scheme.

It should be noted that although randomization does not improve the optimum (deterministic) rule, this is not necessarily the case when randomization is applied to suboptimum tests (see for example [16], where the introduction of noise transforms a deterministic test into a randomized one and improves performance).

1.2.2 Bayesian Multiple Hypothesis Testing

Consider now the case where the random data vector $X$ satisfies $K$ hypotheses of the form $H_k : X \sim f_k(X)$ with corresponding prior probability $\pi_k$, where $k = 1,\ldots,K$. Here the decision $D$ takes values in the set $\{1,\ldots,K\}$ while the randomized decision mechanism is comprised of $K$ complementary probabilities $\delta_1(X),\ldots,\delta_K(X)$, with $\delta_l(X) \geq 0$, $\delta_1(X)+\cdots+\delta_K(X) = 1$, and $\delta_l(X)$ denoting the probability of selecting $D = l$ using a random selection game. For a Bayesian formulation we also need to specify a collection of costs $C_l^k$, $k, l = 1,\ldots,K$, where $C_l^k$ expresses the cost of deciding in favor of $H_l$ (i.e. $D = l$) when the true hypothesis is $H_k$. The goal is to select the randomized decision strategy, namely the probabilities $\delta_l(X)$, in order to minimize the average cost. If we denote the latter by $C$ and recall (1.10), we can write

$$C = \sum_{l=1}^{K}\sum_{k=1}^{K} C_l^k\, P(D = l\ \&\ H_k) = \sum_{l=1}^{K}\sum_{k=1}^{K} C_l^k\, P(D = l|H_k)\,\pi_k \tag{1.13}$$

$$= \sum_{l=1}^{K} \int \delta_l(X) \left\{ \sum_{k=1}^{K} C_l^k f_k(X)\pi_k \right\} dX = \sum_{l=1}^{K} \int \delta_l(X)\,D_l(X)\,dX \tag{1.14}$$

$$\geq \int \left\{ \min_l D_l(X) \right\} \left\{\sum_{l=1}^{K} \delta_l(X)\right\} dX \tag{1.15}$$

$$= \int \min_l D_l(X)\,dX, \tag{1.16}$$

where the functions $D_l(X)$ are defined as

$$D_l(X) = \sum_{k=1}^{K} C_l^k f_k(X)\pi_k. \tag{1.17}$$

In the previous derivations, inequality (1.15) is true because $\delta_l(X) \geq 0$, while (1.16) is a consequence of the same functions being complementary. The final integral in (1.16) is independent of the decision strategy, therefore it constitutes a lower bound on the performance of any randomized rule. Furthermore this lower bound is always attainable by the following decision rule, which is thereby optimum

$$\delta_k^o(X) = \begin{cases} 1 & \text{when } k = \arg\min_l D_l(X) \\ 0 & \text{otherwise.} \end{cases} \tag{1.18}$$

The previous relation is the randomized version of the well known (deterministic) Bayesian optimum decision strategy

$$D = \arg\min_{1\leq l\leq K} D_l(X). \tag{1.19}$$

Clearly, if more than one index attains the same minimum then we randomize among them with arbitrary complementary probabilities. We also recall the very interesting special case $C_l^k = 1$ when $l \neq k$ and $C_l^l = 0$, for which the average cost $C$ becomes the probability of making an erroneous decision. For this case the decision rule (1.19) is equivalent to

$$D = \arg\max_{1\leq l\leq K} \frac{\pi_l f_l(X)}{\sum_{k=1}^{K}\pi_k f_k(X)} = \arg\max_{1\leq l\leq K} \pi_l f_l(X). \tag{1.20}$$

In other words we select the hypothesis with the maximum a posteriori probability (MAP). Again, we observe that we obtain the classical optimum detection scheme of the deterministic setup. In the next section we are going to combine the previous two results and propose a new performance measure which will be optimized by the GLRT.
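To make the MAP rule (1.20) concrete, the following minimal Python sketch evaluates $\pi_l f_l(X)$ for each hypothesis and picks the largest. The Gaussian densities, the priors, and the dimension used in the demo are illustrative assumptions only and are not taken from the report.

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_decision(X, pdfs, priors):
    """Bayesian MAP rule of (1.20): select the hypothesis maximizing pi_l * f_l(X).

    pdfs   : list of callables returning the density value f_l(X),
    priors : prior probabilities pi_l (summing to one).
    Returns the 1-based index of the selected hypothesis.
    """
    scores = [p * f(X) for f, p in zip(pdfs, priors)]
    # Ties could be broken by an arbitrary randomization, cf. (1.18); omitted here.
    return int(np.argmax(scores)) + 1

# Toy setup (assumed for illustration): three Gaussian hypotheses in R^2.
means = [np.zeros(2), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
pdfs = [lambda X, m=m: multivariate_normal(mean=m, cov=np.eye(2)).pdf(X) for m in means]
priors = [0.5, 0.3, 0.2]

rng = np.random.default_rng(0)
X = rng.multivariate_normal(means[1], np.eye(2))   # data drawn under the second hypothesis
print("decision D =", map_decision(X, pdfs, priors))
```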

1.3 Combined Hypothesis Testing and Isolation

Let us return to the binary case and assume that each hypothesis is composite. In other words, under each hypothesis we have more than one possible data pdf with a known prior probability. For notational simplicity we are going to regard each such possibility as a different subhypothesis. Therefore we are going to say that $H_0$ is comprised of the subhypotheses $H_{0k}$, $k = 1,\ldots,K_0$, where under $H_{0k}$: $X \sim f_{0k}(X)$ with prior probability $\pi_{0k}$. Similarly, $H_1$ has the subhypotheses $H_{1k}$, $k = 1,\ldots,K_1$, where under $H_{1k}$: $X \sim f_{1k}(X)$ with prior probability $\pi_{1k}$. The probabilities $\pi_{ik}$, $k = 1,\ldots,K_i$, $i = 0, 1$, are the prior probabilities of the subhypotheses given that the main hypothesis $H_i$ is true. Consequently $\pi_{01} + \cdots + \pi_{0K_0} = \pi_{11} + \cdots + \pi_{1K_1} = 1$.

If we simply want to decide between $H_0$ and $H_1$ then, as was mentioned in the Introduction, we apply the test depicted in (1.3). If however our goal is, in addition to this decision, to isolate the specific subhypothesis which is responsible for the observed data vector $X$, then we need to formulate the problem differently. Note that a randomized rule capable of selecting between subhypotheses requires the definition of $K_0 + K_1$ complementary probabilities

$$\delta_{01}(X),\ldots,\delta_{0K_0}(X),\ \delta_{11}(X),\ldots,\delta_{1K_1}(X) \tag{1.21}$$

where for the randomization probabilities $\delta_{jl}(X)$, $j = 0, 1$ and $l = 1,\ldots,K_j$, we have $\delta_{jl}(X) \geq 0$ and

$$[\delta_{01}(X) + \cdots + \delta_{0K_0}(X)] + [\delta_{11}(X) + \cdots + \delta_{1K_1}(X)] = 1. \tag{1.22}$$

A key point in developing our methodology consists in observing that it is possible to write

$$\delta_{jl}(X) = \delta_j(X)\,q_{jl}(X), \tag{1.23}$$

where

$$\delta_j(X) = \delta_{j1}(X) + \cdots + \delta_{jK_j}(X) \quad\text{and}\quad q_{jl}(X) = \frac{\delta_{jl}(X)}{\delta_j(X)}, \qquad j = 0, 1;\ l = 1,\ldots,K_j. \tag{1.24}$$

This alternative form of the randomization probabilities involves the following set of functions

$$\delta_0(X),\ \delta_1(X),\ q_{01}(X),\ldots,q_{0K_0}(X),\ q_{11}(X),\ldots,q_{1K_1}(X) \tag{1.25}$$

for which, because of (1.22), (1.23), (1.24), we have

$$\delta_0(X) + \delta_1(X) = q_{01}(X) + \cdots + q_{0K_0}(X) = q_{11}(X) + \cdots + q_{1K_1}(X) = 1. \tag{1.26}$$

Actually $\delta_j(X)$, $j = 0, 1$, expresses the total randomization probability of selecting hypothesis $H_j$, whereas $q_{jl}(X)$ becomes the conditional probability of selecting subhypothesis $H_{jl}$ given that we have selected the main hypothesis $H_j$.

The two different sets of randomization probabilities depicted in (1.21) and (1.25) suggest two different randomized games for the combined detection/isolation problem. With the help of the probabilities $\delta_{jl}(X)$ in (1.21), the decision mechanism involves a single step which directly selects a specific subhypothesis. In other words we simultaneously detect and isolate. This approach is similar to the multiple hypothesis testing problem considered previously. If, instead, we use the alternative set in (1.25), then the detection/isolation process is concluded in two steps since it involves two different decisions, namely $D_1$ for detection and $D_2$ for isolation. Specifically:

Step 1: We first make a decision $D_1 \in \{0, 1\}$ using the randomization probabilities $\delta_0(X), \delta_1(X)$ and decide between the two main hypotheses $H_0$, $H_1$.

Step 2: Given that in the first step we decided $D_1 = j$, that is, in favor of the main hypothesis $H_j$, we continue with the isolation part and select $D_2 \in \{1,\ldots,K_j\}$ using the randomization probabilities $q_{jl}(X)$, thus isolating one of the subhypotheses $H_{jl}$. The second randomized decision must be (conditionally on $X$) independent from the one applied in the first step.
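As a quick illustration of the two randomized games and of the factorization (1.23), the hedged Python sketch below draws a decision either in a single step from the probabilities $\delta_{jl}(X)$ or in two steps from $\delta_j(X)$ and $q_{jl}(X)$; the numerical probabilities correspond to one fixed $X$ and are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_step_decision(delta, q):
    """Two-step randomized game based on (1.25)-(1.26).

    delta : [delta_0, delta_1], probabilities of the two main hypotheses,
    q     : [q_0, q_1], each a vector of conditional isolation probabilities.
    Returns (D1, D2) with D1 in {0, 1} and D2 a 1-based subhypothesis index.
    """
    D1 = rng.choice([0, 1], p=delta)            # Step 1: detection
    D2 = rng.choice(len(q[D1]), p=q[D1]) + 1    # Step 2: isolation, an independent draw
    return D1, int(D2)

def one_step_decision(delta, q):
    """Single-step game over all subhypotheses using delta_{jl} = delta_j * q_{jl}, cf. (1.23)."""
    flat = np.concatenate([delta[0] * np.asarray(q[0]), delta[1] * np.asarray(q[1])])
    idx = int(rng.choice(len(flat), p=flat))
    K0 = len(q[0])
    return (0, idx + 1) if idx < K0 else (1, idx - K0 + 1)

# Illustrative probabilities for a fixed X (assumed numbers, K_0 = 2, K_1 = 3).
delta = [0.4, 0.6]
q = [[0.7, 0.3], [0.2, 0.5, 0.3]]
print(two_step_decision(delta, q), one_step_decision(delta, q))
```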
The fact that in Step 2 the randomized selection game is independent from Step 1 allows us to write the probabilities $\delta_{jl}(X)$ in the product form appearing in (1.23). We would like to emphasize that the two randomized decision procedures, the first based on (1.21) and the second using (1.25), are perfectly equivalent. Indeed, from (1.21) we obtain (1.25) by applying (1.24), while we obtain (1.21) from (1.25) by using (1.23).

The basic difference between the two decision strategies is that the second method respects the grouping of the subhypotheses while the first disregards this property completely. It is in fact this grouping in the second decision mechanism that will give rise to the desired test. We should also mention that it is not equally straightforward to come up with the alternative decision mechanism by working solely with deterministic instead of randomized tests. Consequently, this fact justifies the use of this larger class of rules.

1.3.1 Optimality of GLRT

Let us demonstrate the usefulness of the alternative decision mechanism presented above by introducing a simple detection/isolation problem which leads directly to the optimality of the classical GLRT. For our two-step decision process, consider the two probabilities $P(\text{Correct-detection/isolation}|H_1)$ and $P(\text{Miss-detection/isolation}|H_0)$. Following a Neyman-Pearson approach we are interested in maximizing $P(\text{Correct-detection/isolation}|H_1)$ subject to the constraint that $P(\text{Miss-detection/isolation}|H_0)$ is no larger than a prescribed level. The following theorem addresses this problem explicitly and introduces the corresponding optimum solution.

Theorem 1.1: Consider the class $\mathcal{J}_\alpha$ of all detection/isolation tests that satisfy the constraint

$$P(\text{Miss-detection/isolation}|H_0) \leq \alpha, \tag{1.27}$$

where $\alpha_{\min} \leq \alpha \leq 1$, with

$$\alpha_{\min} = 1 - \int \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\,dX. \tag{1.28}$$

Then the test, within the class $\mathcal{J}_\alpha$, that maximizes the probability $P(\text{Correct-detection/isolation}|H_1)$ is given by:

Step 1: The optimum strategy for deciding between the two main hypotheses $H_0$ and $H_1$ is the GLRT

$$\frac{\max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\}}{\max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.29}$$

where, whenever the left hand side coincides with the threshold, we perform a randomization between the two hypotheses and select $H_1$ with probability $\gamma$.

Step 2: If in Step 1 we decide in favor of hypothesis $H_i$ (i.e. $D_1 = i$) then the optimum isolation strategy is

$$D_2 = \arg\max_{1\leq k\leq K_i}\{\pi_{ik} f_{ik}(X)\}. \tag{1.30}$$

If more than one index attains the same maximum we perform an arbitrary randomization among them. Threshold $\lambda$ and randomization probability $\gamma$ of Step 1 must be selected so that the constraint in (1.27) is satisfied with equality.

Proof: Note that $P(\text{Miss-detection/isolation}|H_0) = 1 - P(\text{Correct-detection/isolation}|H_0)$, therefore the constraint is equivalent to $P(\text{Correct-detection/isolation}|H_0) \geq 1 - \alpha$. Furthermore

$$P(\text{Correct-detection/isolation}|H_i) = \sum_{k=1}^{K_i} P(\text{Correct-detection/isolation}|H_{ik})\,\pi_{ik} \tag{1.31}$$

with

$$P(\text{Correct-detection/isolation}|H_{ik}) = \int \delta_i(X)\,q_{ik}(X)\,f_{ik}(X)\,dX. \tag{1.32}$$

To solve the constrained optimization problem, let $\lambda \geq 0$ be a Lagrange multiplier and, as in the classical Neyman-Pearson case, define the corresponding unconstrained version. With the help of (1.31) and (1.32) we can write

$$P(\text{Correct-detection/isolation}|H_1) + \lambda\,P(\text{Correct-detection/isolation}|H_0)$$

$$= \int \delta_1(X)\left\{\sum_{k=1}^{K_1} q_{1k}(X)\pi_{1k} f_{1k}(X)\right\} dX + \lambda \int \delta_0(X)\left\{\sum_{k=1}^{K_0} q_{0k}(X)\pi_{0k} f_{0k}(X)\right\} dX \tag{1.33}$$

$$\leq \int \delta_1(X) \max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\}\,dX + \lambda \int \delta_0(X) \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\,dX \tag{1.34}$$

$$= \int \left[\delta_1(X) \max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\} + \delta_0(X)\,\lambda \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\right] dX \tag{1.35}$$

$$\leq \int \max\left\{\max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\},\ \lambda \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\right\} dX. \tag{1.36}$$

Inequality (1.34) is valid because the functions $q_{ik}(X)$, $k = 1,\ldots,K_i$, are nonnegative and complementary, and (1.36) is true because the same properties hold for $\delta_i(X)$, $i = 0, 1$. Note that the final expression constitutes an upper bound on the performance of any detection/isolation rule. Furthermore this upper bound is attainable by a specific detection/isolation strategy. Indeed we note that we have equality in (1.34) when the isolation probabilities are selected as

$$q_{ik}^o(X) = \begin{cases} 1 & \text{if } k = \arg\max_{1\leq l\leq K_i}\{\pi_{il} f_{il}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{1.37}$$

where we randomize if more than one index attains the same maximum. This optimum isolation process is the randomized equivalent of (1.30). Similarly, we have equality in (1.36) when we select the detection probabilities to be

$$\delta_1^o(X) = \begin{cases} 1 & \text{if } \max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\} > \lambda \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\} \\ \gamma & \text{if } \max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\} = \lambda \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{1.38}$$

and $\delta_0^o(X) = 1 - \delta_1^o(X)$. Clearly this optimum detection procedure is the equivalent of (1.29). As far as the false alarm constraint is concerned, let us define the following sets

$$\mathcal{A}(\lambda) = \left\{X : \frac{\max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\}}{\max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}} > \lambda\right\}, \qquad \mathcal{B}(\lambda) = \left\{X : \frac{\max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\}}{\max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}} = \lambda\right\}. \tag{1.39}$$

For the test introduced above we can then write

$$P(\text{Miss-detection/isolation}|H_0) = 1 - \int_{\mathcal{A}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX - \gamma \int_{\mathcal{B}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX$$
$$\geq 1 - \int_{\mathcal{A}(\lambda)\cup\mathcal{B}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX \geq 1 - \int \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX = \alpha_{\min}. \tag{1.40}$$

The lower bound $\alpha_{\min}$ is clearly attainable in the limit by selecting $\gamma = 1$ and letting $\lambda \to 0$. Also, the miss-detection/isolation probability is bounded from above by 1, and this value can also be attained in the limit by selecting $\gamma = 0$ and letting $\lambda \to \infty$. Existence of a suitable threshold $\lambda$ and a randomization probability $\gamma$ that assure validity of the false alarm constraint with equality, as well as optimality of the resulting test in the desired sense, can be easily demonstrated following exactly the same steps as in the classical Neyman-Pearson case.¹ This concludes the proof.

We realize that in order to apply the test in (1.29) we need knowledge of the prior probabilities $\pi_{ik}$. Whenever this information is not available we can consider equiprobable subhypotheses under each main hypothesis and select $\pi_{ik} = 1/K_i$. Under this assumption the optimum test in (1.29) reduces to the classical form of the GLRT depicted in (1.5) (after absorbing the two prior probabilities into the threshold). Finally, we should mention that if hypothesis $H_0$ is simple or, if under hypothesis $H_0$ we are not interested in the isolation problem (in which case we can treat it as simple by forming the mixture density), then $P(\text{Miss-detection/isolation}|H_0)$ becomes the usual false alarm probability with corresponding $\alpha_{\min} = 0$. In other words the false alarm probability can take any value in the interval $[0, 1]$ as in the classical Neyman-Pearson case.

Remark 1.1: We observe that the optimum test, under each main hypothesis, selects the most appropriate subhypothesis with the help of a MAP isolation rule, exactly as in (1.20). The interesting point is that this selection is performed independently from the other hypothesis and from the corresponding detection strategy. This is clearly a very desirable property since it separates the isolation from the detection problem.
In our developments we are going to obtain suitable conditions that can guarantee the same characteristic for the extended detection/isolation problem introduced next.

¹ In the proof we simply replace the pdfs $f_i(X)$ with the functions $\max_{1\leq j\leq K_i}\{\pi_{ij} f_{ij}(X)\}$. Even though these functions are not densities, the proof goes through without change.
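A minimal Python sketch of the optimum detection/isolation scheme of Theorem 1.1 follows: it compares the two weighted maxima as in (1.29) and then applies the MAP isolation (1.30). The scalar Gaussian subhypotheses and the threshold value are assumptions made only for the demo; in practice the threshold would be calibrated so that (1.27) holds with equality.

```python
import numpy as np
from scipy.stats import norm

def glrt_detect_isolate(X, f0, f1, pi0, pi1, lam):
    """Weighted GLRT of Theorem 1.1: detection via (1.29), isolation via (1.30).

    f0, f1   : lists of density callables for the subhypotheses of H_0 and H_1,
    pi0, pi1 : prior probabilities of the subhypotheses,
    lam      : threshold (to be calibrated so that the constraint (1.27) is met).
    Returns (D1, D2): the main-hypothesis decision and the isolated subhypothesis.
    """
    s0 = np.array([p * f(X) for f, p in zip(f0, pi0)])
    s1 = np.array([p * f(X) for f, p in zip(f1, pi1)])
    D1 = 1 if s1.max() > lam * s0.max() else 0      # randomize with prob. gamma on equality
    D2 = int((s1 if D1 else s0).argmax()) + 1       # MAP isolation under the chosen hypothesis
    return D1, D2

# Illustrative scalar Gaussian subhypotheses (assumed, not from the report).
f0 = [lambda x: norm(0, 1).pdf(x), lambda x: norm(0, 2).pdf(x)]
f1 = [lambda x: norm(1, 1).pdf(x), lambda x: norm(2, 1).pdf(x), lambda x: norm(3, 1).pdf(x)]
print(glrt_detect_isolate(1.7, f0, f1, [0.5, 0.5], [1/3, 1/3, 1/3], lam=1.0))
```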

1.3.2 Combined Neyman-Pearson and Bayesian Hypothesis Testing

The previous results are directly extendable to a more general formulation where we impose costs on combinations of decisions and (sub)hypotheses. We should however emphasize that we are interested in preserving the grouping of the two sets of subhypotheses defined in the previous subsection, since this is the key idea that produces the GLRT. Therefore suppose that $C^{ik}_{jl}$ denotes the cost of deciding in favor of subhypothesis $H_{jl}$ (i.e. $D_1 = j$, $D_2 = l$) when the true subhypothesis is $H_{ik}$. For the indexes we have $i, j \in \{0, 1\}$ while $k \in \{1,\ldots,K_i\}$ and $l \in \{1,\ldots,K_j\}$. Let us now consider the average cost $C_i$ given that the main hypothesis $H_i$, $i = 0, 1$, is true. We have

$$C_i = \sum_{k=1}^{K_i}\sum_{j=0}^{1}\sum_{l=1}^{K_j} C^{ik}_{jl}\, P(D_1 = j\ \&\ D_2 = l\ \&\ H_{ik}|H_i)$$
$$= \sum_{k=1}^{K_i}\left(\sum_{l=1}^{K_0} C^{ik}_{0l}\, P(D_1 = 0\ \&\ D_2 = l|H_{ik}) + \sum_{l=1}^{K_1} C^{ik}_{1l}\, P(D_1 = 1\ \&\ D_2 = l|H_{ik})\right)\pi_{ik}$$
$$= \int \left\{ \delta_0(X) \sum_{l=1}^{K_0} q_{0l}(X)\,D^i_{0l}(X) + \delta_1(X) \sum_{l=1}^{K_1} q_{1l}(X)\,D^i_{1l}(X) \right\} dX, \tag{1.41}$$

where we define $D^i_{jl}(X) = \sum_{k=1}^{K_i} C^{ik}_{jl}\, f_{ik}(X)\,\pi_{ik}$. Following a Neyman-Pearson-like approach we propose to minimize $C_1$ under the constraint that $C_0$ does not exceed some prescribed value. In other words, within each main hypothesis we employ a Bayesian formulation, whereas across main hypotheses the formulation is of Neyman-Pearson type. With this specific setup we maintain the required grouping of subhypotheses mentioned before, a fact that will produce alternatives to the GLRT. In the next theorem we define explicitly the optimization problem of interest and offer the corresponding general optimum solution.

Theorem 1.2: Consider the class $\mathcal{J}_\alpha$ of detection/isolation tests that satisfy $C_0 \leq \alpha$. Then the test that minimizes the cost $C_1$ within the class $\mathcal{J}_\alpha$ is given by

$$D^1_{0\hat l_0}(X) - D^1_{1\hat l_1}(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_{1\hat l_1}(X) - D^0_{0\hat l_0}(X)\right], \tag{1.42}$$

with the corresponding isolation process satisfying

$$\hat l_j = \arg\min_{1\leq l\leq K_j}\left[D^1_{jl}(X) + \lambda D^0_{jl}(X)\right], \quad j = 0, 1. \tag{1.43}$$

Threshold $\lambda \geq 0$ and the randomization probability $\gamma$ are selected so that the resulting test satisfies the constraint with equality.

Proof: Consider the unconstrained problem of minimizing $C_1 + \lambda C_0$ where $\lambda \geq 0$ is a Lagrange multiplier. We can then write

$$C_1 + \lambda C_0 = \int \left\{ \delta_0(X)\sum_{l=1}^{K_0} q_{0l}(X)\left[D^1_{0l}(X) + \lambda D^0_{0l}(X)\right] + \delta_1(X)\sum_{l=1}^{K_1} q_{1l}(X)\left[D^1_{1l}(X) + \lambda D^0_{1l}(X)\right]\right\} dX \tag{1.44}$$

$$\geq \int \left\{ \delta_0(X)\min_{1\leq l\leq K_0}\left\{D^1_{0l}(X) + \lambda D^0_{0l}(X)\right\} + \delta_1(X)\min_{1\leq l\leq K_1}\left\{D^1_{1l}(X) + \lambda D^0_{1l}(X)\right\}\right\} dX \tag{1.45}$$

$$= \int \left\{ \delta_0(X)\left[D^1_{0\hat l_0}(X) + \lambda D^0_{0\hat l_0}(X)\right] + \delta_1(X)\left[D^1_{1\hat l_1}(X) + \lambda D^0_{1\hat l_1}(X)\right]\right\} dX \tag{1.46}$$

$$\geq \int \min\left\{ D^1_{0\hat l_0}(X) + \lambda D^0_{0\hat l_0}(X),\ D^1_{1\hat l_1}(X) + \lambda D^0_{1\hat l_1}(X)\right\} dX. \tag{1.47}$$

We have equality in (1.45) whenever the isolation procedure satisfies (1.43) and equality in (1.47) whenever detection is according to (1.42). If threshold $\lambda$ and randomization probability $\gamma$ are such that the false alarm constraint is satisfied with equality, it is then straightforward to show that the corresponding combined scheme is indeed optimum in the sense that it minimizes $C_1$ within the class $\mathcal{J}_\alpha$. This concludes the proof.

Remark 1.2: Regarding the allowable values of the level $\alpha$ we have $\alpha_{\min} \leq \alpha \leq \alpha_{\max}$. Under this general setting it is possible to find an expression only for the lower end $\alpha_{\min}$. It is easily seen that

$$C_0 \geq \int \min\left\{\min_{1\leq l\leq K_0} D^0_{0l}(X),\ \min_{1\leq l\leq K_1} D^0_{1l}(X)\right\} dX = \alpha_{\min}. \tag{1.48}$$

Furthermore, this value $\alpha_{\min}$ is attainable by the optimum test, as one can verify by letting $\lambda \to \infty$. Unfortunately we cannot obtain a similar expression for the upper limit $\alpha_{\max}$ of $C_0$, since it is not clear whether the cost $C_0(\lambda)$ of the optimum scheme is a monotone function of $\lambda$. Of course we can always say that $\alpha_{\max} = \sup_{\lambda \geq 0} C_0(\lambda)$, but the practical usefulness of this conclusion is minimal.

Remark 1.3: From (1.43) we understand that the isolation process under each hypothesis (expressed through the corresponding minimization) takes into account the statistics of the other hypothesis and also depends on the detection rule through the threshold $\lambda$. We recall that in the GLRT this is not the case, since we simply use a MAP selection that depends neither on the other hypothesis nor on the threshold $\lambda$. In order to obtain the same desirable property under this more general setup it is sufficient to assume that²

$$C^{1k}_{0l} = C^{1k}_{0};\qquad C^{0k}_{1l} = C^{0k}_{1}. \tag{1.49}$$

This, in turn, yields

$$D^1_{0l}(X) = D^1_0(X);\qquad D^0_{1l}(X) = D^0_1(X), \tag{1.50}$$

making (1.42) equivalent to

$$D^1_0(X) - \min_{1\leq l\leq K_1} D^1_{1l}(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(X) - \min_{1\leq l\leq K_0} D^0_{0l}(X)\right], \tag{1.51}$$

while the isolation process (1.43) simplifies to

$$\hat l_i = \arg\min_{1\leq l\leq K_i} D^i_{il}(X). \tag{1.52}$$

With the conditions in (1.49) the isolation process simplifies considerably, since under hypothesis $H_i$ it involves only the Bayes cost $C^{ik}_{il}$, in other words the cost that we would use if we had only the isolation problem, exactly as in Subsection 1.2.2. Consequently, isolation under each hypothesis becomes independent from the isolation of the other hypothesis and also independent from the detection process, thus matching the property observed in the GLRT. Although it is possible to offer several intriguing examples for our general problem, it seems more interesting to postpone this presentation until Section 2.3, after we consider the problem of combined detection/estimation.

² In the GLRT this property holds since $C^{1k}_{0l} = C^{0k}_{1l} = 1$.
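The following sketch (a hedged illustration, not code taken from the report) implements the general rule (1.42)-(1.43) directly from a table of costs $C^{ik}_{jl}$. With the 0-1 costs of the usage example (zero cost only when the decided subhypothesis equals the true one), the scheme coincides with the weighted GLRT of Theorem 1.1; the densities and priors are assumptions chosen for the demo.

```python
import numpy as np
from scipy.stats import norm

def combined_test(X, f, pi, C, lam):
    """Optimum detection/isolation of Theorem 1.2, eqs. (1.42)-(1.43).

    f[i][k], pi[i][k] : subhypothesis densities and priors under H_i, i = 0, 1,
    C[(i, k, j, l)]   : cost of deciding (D1, D2) = (j, l) when H_{ik} is true,
    lam               : Lagrange multiplier / threshold fixing the constraint C_0 <= alpha.
    """
    def D(i, j, l):  # D^i_{jl}(X) = sum_k C^{ik}_{jl} f_{ik}(X) pi_{ik}
        return sum(C[(i, k, j, l)] * f[i][k](X) * pi[i][k] for k in range(len(f[i])))

    # Isolation (1.43): under each tentative decision j, minimize D^1_{jl} + lam * D^0_{jl}.
    lhat = [int(np.argmin([D(1, j, l) + lam * D(0, j, l) for l in range(len(f[j]))]))
            for j in (0, 1)]
    # Detection (1.42): decide H_1 when the left-hand side exceeds the right-hand side.
    lhs = D(1, 0, lhat[0]) - D(1, 1, lhat[1])
    rhs = lam * (D(0, 1, lhat[1]) - D(0, 0, lhat[0]))
    D1 = 1 if lhs > rhs else 0          # randomize on equality in practice
    return D1, lhat[D1] + 1

# Usage with 0-1 costs; the scalar Gaussian densities and priors are illustrative assumptions.
f = [[lambda x: norm(0, 1).pdf(x), lambda x: norm(0, 2).pdf(x)],
     [lambda x: norm(2, 1).pdf(x), lambda x: norm(3, 1).pdf(x)]]
pi = [[0.5, 0.5], [0.5, 0.5]]
C = {(i, k, j, l): 0.0 if (i, k) == (j, l) else 1.0
     for i in (0, 1) for k in range(2) for j in (0, 1) for l in range(2)}
print(combined_test(1.5, f, pi, C, lam=1.0))
```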

2 Joint Hypothesis Testing and Estimation

2.1 Introduction

A vastly more interesting problem arises when we combine hypothesis testing with parameter estimation. Therefore, suppose that under $H_i$, $i = 0, 1$, the corresponding data pdfs have the form $f_i(X|\theta_i)$ where $\theta_i$ are parameters with prior pdf $\pi_i(\theta_i)$. As mentioned in the Introduction, if we simply desire to discriminate between $H_0$ and $H_1$ then we can form the mixture pdfs $f_i(X) = \int f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$ and apply the likelihood ratio test. When however our goal is to perform simultaneous detection and parameter estimation, then we need to develop techniques that are similar to the ones presented in the previous section and, in particular, Subsection 1.3.2. Before proceeding with this extension let us first discuss the notion of a randomized estimator by revisiting the problem of optimum Bayesian estimation.

2.1.1 Optimum Bayesian Estimation

As in hypothesis testing, let $X \in \mathbb{R}^N$ be a random data vector which is distributed according to a pdf $f(X|\theta)$. For $\theta$ we assume that it is a realization of a random parameter vector $\vartheta$ for which we have available a known prior pdf $\pi(\theta)$. Given a realization $X$ of the data vector, we would like to come up with a parameter estimate $\hat\theta$. Following a Bayesian approach, if $\theta$ is the true parameter vector and $\hat\theta$ the corresponding estimate, this generates a cost $C(\hat\theta, \theta)$. Our goal is to propose an estimation strategy which minimizes the average cost.

This problem is very similar to the Bayesian multiple hypothesis testing problem treated in Subsection 1.2.2. We recall that in hypothesis testing there was a finite number of hypotheses and an equal number of possible decisions (selections). Here, loosely speaking, each possible value of $\theta$ corresponds to a possible hypothesis; consequently our decision $\hat\theta$ and the true parameter vector $\theta$ can take a continuum of values. We also recall that in the case of finitely many possibilities a randomized decision rule was defined with the help of a corresponding finite set of complementary probabilities $\delta_l(X)$. If we want to extend this idea, we need to assign to each possible selection $\hat\theta$ a probability which is a function of $X$. Since $\hat\theta$ takes a continuum of values, to each $\hat\theta$ we can assign, in principle, a differential probability $\delta(\hat\theta|X)\,d\hat\theta$. This suggests that the equivalent of the probabilities $\delta_l(X)$ is now a probability density function $\delta(\hat\theta|X)$, that is, a function that satisfies $\delta(\hat\theta|X) \geq 0$ and $\int \delta(\hat\theta|X)\,d\hat\theta = 1$.

Clearly the notion of a randomized estimator is not new. Bayesian approaches make use of such entities, as one can verify by consulting [17, Page 65]. The posterior parameter pdf given the data $X$ constitutes the most common randomized estimator used in practice. Here however we need the general definition where any parameter pdf can play the role of an estimator. As becomes clear from the previous discussion, a randomized estimator is completely specified if we define the pdf $\delta(\hat\theta|X)$. At this point it is interesting to mention how we can produce an actual estimate $\hat\theta$. We recall that in the previous section our decision was the outcome of a random selection game. Following the same idea here, we need to generate a realization of a random variable distributed according to $\delta(\hat\theta|X)$. This realization becomes our estimate! Although randomized estimates might seem even more awkward than randomized decisions, they nevertheless constitute their natural extension.
Despite the seemingly counter-intuitive form of the proposed estimation mechanism, we must point out that randomized estimators unify the two problems of hypothesis testing and estimation in a straightforward manner.

Indeed, as we will be able to verify shortly, we obtain the corresponding optimum schemes by applying exactly the same methodology. Finally, we should also add that the class of randomized estimators is richer than the class of their deterministic counterparts. This is because any deterministic estimator of the form $\hat\theta = G(X)$, where $G(X)$ is a deterministic function of $X$, can be modeled as a randomized estimator having the pdf $\delta(\hat\theta|X) = \text{Dirac}(\hat\theta - G(X))$. In other words the pdf assigns all its probability mass to the selection $\hat\theta = G(X)$.

Let us now look for the optimum estimator, within the class of randomized estimators, that minimizes the expected cost. If we call the latter $C$ we can write

$$C = \int\!\!\int\!\!\int C(\hat\theta, \theta)\,\delta(\hat\theta|X)\,f(X|\theta)\,\pi(\theta)\,d\theta\,d\hat\theta\,dX = \int\left[\int \delta(\hat\theta|X)\left\{\int C(\hat\theta, \theta) f(X|\theta)\pi(\theta)\,d\theta\right\} d\hat\theta\right] dX = \int\left[\int \delta(\hat\theta|X)\,D(\hat\theta, X)\,d\hat\theta\right] dX$$
$$\geq \int\left[\int \delta(\hat\theta|X)\,\inf_{U} D(U, X)\,d\hat\theta\right] dX = \int\left[\inf_{U} D(U, X) \int \delta(\hat\theta|X)\,d\hat\theta\right] dX = \int \inf_{U} D(U, X)\,dX, \tag{2.1}$$

where we defined $D(U, X) = \int C(U, \theta) f(X|\theta)\pi(\theta)\,d\theta$. The last integral in (2.1) constitutes a lower bound on the performance of any randomized estimator. This lower bound is attainable if we select

$$\delta(\hat\theta|X) = \text{Dirac}\left(\hat\theta - \arg\inf_{U} D(U, X)\right), \tag{2.2}$$

provided that $\arg\inf_{U} D(U, X)$ is an ordinary function¹ of $X$. It is clear that if the infimum is attained by a single function of $X$, the resulting optimum estimator is purely deterministic. When however we have more than one choice, then we can randomize among them with arbitrary randomization probabilities and the resulting estimator will be randomized. By comparing the previous derivations with Eqs. (1.13)-(1.16) of Subsection 1.2.2 we realize that the corresponding steps are completely analogous.

2.1.2 Combined Neyman-Pearson Hypothesis Testing and Bayesian Estimation

In this part we are going to extend the result obtained in Subsection 1.3.2. Suppose again that the data vector $X$ under hypothesis $H_i$, $i = 0, 1$, satisfies $X \sim f_i(X|\theta_i)$ where $\theta_i$ is a realization of a random parameter vector $\vartheta_i$ with prior pdf $\pi_i(\theta_i)$. When a realization $X$ of the data vector is available we would like to decide between $H_0$ and $H_1$ and also estimate the corresponding parameter vector. A randomized detection/estimation structure will be comprised of the following set of functions

$$\delta_0(X),\ \delta_1(X),\ q_0(\hat\theta_0|X),\ q_1(\hat\theta_1|X), \tag{2.3}$$

which are the equivalent of (1.25). These functions are nonnegative and satisfy

$$\delta_0(X) + \delta_1(X) = \int q_0(\hat\theta_0|X)\,d\hat\theta_0 = \int q_1(\hat\theta_1|X)\,d\hat\theta_1 = 1, \tag{2.4}$$

which corresponds to (1.26). The two probabilities $\delta_j(X)$ are complementary while the two functions $q_j(\hat\theta_j|X)$ are pdfs with respect to $\hat\theta_j$. Our randomized detection/estimation strategy again involves two steps. In Step 1, with probabilities $\delta_j(X)$, $j = 0, 1$, we decide between the two main hypotheses $H_j$, while in Step 2, given that in the previous step the decision was $D_1 = j$, we provide a parameter estimate $\hat\theta_j$ using the randomized estimator $q_j(\hat\theta_j|X)$.

Let us now develop the equivalent of our results in Subsection 1.3.2. This will become our starting point for considering various special cases that will give rise to interesting novel GLR-type tests. Denote with $C^i_j(\hat\theta_j, \theta_i)$ the cost of providing the parameter estimate $\hat\theta_j$, after having decided that the main hypothesis is $H_j$, when the true main hypothesis is $H_i$ and the corresponding true parameter value is $\theta_i$.

¹ For simplicity we assume that the infimum is in fact a minimum, in other words that there exists (at least one) function $\hat\theta = G(X)$ that attains the minimal value. In the opposite case we need to become more technical and introduce the notion of $\epsilon$-optimality, with estimation strategies whose performance is $\epsilon$-close to the optimum.

If $C_i$ denotes the average cost given that hypothesis $H_i$ is true, then we have the following expression for this quantity, which is the equivalent of (1.41):

$$C_i = \int \left\{ \delta_0(X) \int q_0(\hat\theta_0|X)\,D^i_0(\hat\theta_0, X)\,d\hat\theta_0 + \delta_1(X) \int q_1(\hat\theta_1|X)\,D^i_1(\hat\theta_1, X)\,d\hat\theta_1 \right\} dX. \tag{2.5}$$

We define $D^i_j(U, X) = \int C^i_j(U, \theta_i)\,f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i$. Consider now the problem of optimizing $C_1$ among all detection/estimation schemes that satisfy the constraint that $C_0$ is no larger than a prescribed value. The next theorem defines this problem explicitly and provides the corresponding optimum solution.

Theorem 2.1: Consider the class $\mathcal{J}_\alpha$ of detection/estimation tests that satisfy $C_0 \leq \alpha$. Then the test that minimizes the cost $C_1$ within the class $\mathcal{J}_\alpha$ is given by

$$D^1_0(\hat\theta_0, X) - D^1_1(\hat\theta_1, X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(\hat\theta_1, X) - D^0_0(\hat\theta_0, X)\right], \tag{2.6}$$

with the corresponding estimates defined by

$$\hat\theta_j = \arg\inf_{U}\left[D^1_j(U, X) + \lambda D^0_j(U, X)\right], \quad j = 0, 1. \tag{2.7}$$

Proof: The proof is exactly similar to the proof of Theorem 1.2, with the sums replaced by integrals.

Remark 2.1: For the level $\alpha$ we have $\alpha_{\min} \leq \alpha \leq \alpha_{\max}$ and, as in the discrete case, we have an expression only for the lower bound

$$\alpha_{\min} = \int \min\left\{\inf_{U} D^0_0(U, X),\ \inf_{U} D^0_1(U, X)\right\} dX. \tag{2.8}$$

Remark 2.2: Following the same reasoning as for Theorem 1.2, by assuming $C^1_0(U, \theta) = C^1_0(\theta)$ and $C^0_1(U, \theta) = C^0_1(\theta)$ we obtain $D^1_0(U, X) = D^1_0(X)$ and $D^0_1(U, X) = D^0_1(X)$. Under this assumption the optimum test in (2.6) simplifies to

$$D^1_0(X) - \inf_{U} D^1_1(U, X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(X) - \inf_{U} D^0_0(U, X)\right], \tag{2.9}$$

and the optimum parameter estimate becomes

$$\hat\theta_j = \arg\inf_{U} D^j_j(U, X). \tag{2.10}$$

The important consequence of this simplification is that the estimation part, under each hypothesis, reduces to the optimum Bayes estimator, which is independent from the other hypothesis and from the detection rule.

2.2 Variations

In this section we are going to present two variations of the same idea that might turn out to be interesting for applications. In both cases the resulting estimator under each hypothesis is the optimum Bayes estimator, exactly as in the GLRT. We start with the case where the parameters are known under $H_0$, a scenario that is quite frequent in practice.

2.2.1 Known Parameters under $H_0$

Let $f(X|\theta)$ be a pdf with $\theta$ a parameter vector. Suppose that under $H_0$ we have $\theta = 0$, whereas under $H_1$ the vector $\theta$ follows a prior pdf $\pi(\theta)$. We would like to test $H_0$ against $H_1$, but whenever we decide in favor of $H_1$ we would also like to provide an estimate $\hat\theta$ for the corresponding parameter vector $\theta$. Since parameter estimation is needed only under $H_1$, a combined detection/estimation scheme will be comprised of the functions $\delta_0(X)$, $\delta_1(X)$, $q_1(\hat\theta|X)$ that satisfy $\delta_j(X) \geq 0$, $j = 0, 1$, $q_1(\hat\theta|X) \geq 0$, and $\delta_0(X) + \delta_1(X) = \int q_1(\hat\theta|X)\,d\hat\theta = 1$.

The two probabilities $\delta_0(X)$, $\delta_1(X)$ will be used in the first step to decide between the two main hypotheses, while $q_1(\hat\theta|X)$ will be employed in the second step to provide the required estimate for $\theta$ every time we decide in favor of $H_1$.

Regarding the Bayesian cost, we define $C(\hat\theta, \theta)$ to be the cost of providing an estimate $\hat\theta$ when the true value is $\theta$. Of course this cost makes sense only under $H_1$. Consequently, if the true hypothesis is $H_1$ with parameter $\theta$ and we decide in favor of $H_1$ with parameter estimate $\hat\theta$ then, as we said, the cost is $C(\hat\theta, \theta)$. If however we are under $H_1$ with parameter value $\theta$ and we decide in favor of $H_0$, then this is like selecting $\hat\theta = 0$. Hence, it makes sense to assign to this event the cost $C(0, \theta)$. Using these observations it is straightforward to compute the average cost under $H_1$, which takes the form

$$C_1 = \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX + \int \delta_0(X)\,D(0, X)\,dX \tag{2.11}$$

with $D(U, X) = \int C(U, \theta)\,f(X|\theta)\,\pi(\theta)\,d\theta$. For this special problem we propose to minimize the average cost $C_1$ under $H_1$ and at the same time control the false alarm probability under $H_0$. The next theorem presents the problem of interest explicitly and introduces the corresponding optimum solution.

Theorem 2.2: Consider the class $\mathcal{J}_\alpha$ of detection/estimation procedures with false alarm probability not exceeding the level $\alpha$. Then within the class $\mathcal{J}_\alpha$ the test that minimizes the average cost $C_1$ is given by

$$\frac{D(0, X) - \inf_{U} D(U, X)}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda. \tag{2.12}$$

Threshold $\lambda \geq 0$ and randomization probability $\gamma$ are selected so that the false alarm constraint is satisfied with equality.

Proof: The false alarm probability under $H_0$ is given by $P(D_1 = 1|H_0) = \int \delta_1(X)\,f(X|0)\,dX$. If $\lambda \geq 0$ is a Lagrange multiplier, then we are interested in minimizing the combination $C_1 + \lambda P(D_1 = 1|H_0)$. Using (2.11) we have

$$C_1 + \lambda P(D_1 = 1|H_0) = \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX + \int \delta_0(X)\,D(0, X)\,dX + \lambda \int \delta_1(X)\,f(X|0)\,dX \tag{2.13}$$

$$\geq \int \left\{ \delta_1(X)\left[\inf_{U} D(U, X) + \lambda f(X|0)\right] + \delta_0(X)\,D(0, X)\right\} dX \tag{2.14}$$

$$\geq \int \min\left\{\inf_{U} D(U, X) + \lambda f(X|0),\ D(0, X)\right\} dX. \tag{2.15}$$

We have equality in (2.14) whenever the estimator $q_1(\hat\theta|X)$ is the optimum Bayesian estimator, and equality in (2.15) whenever our decision between the two main hypotheses is according to (2.12).

2.2.2 Conditional Cost

A slightly different and in some sense more general approach is to assume that under $H_0$ we have $X \sim f_0(X)$ and under $H_1$ the data satisfy $X \sim f_1(X|\theta)$ with the parameter vector having the prior $\pi(\theta)$. Here, as before, we assume that under $H_0$ the data pdf is completely known, but it does not necessarily correspond to a specific selection of the parameter vector $\theta$ of $f_1(X|\theta)$. Regarding Bayesian costs we only define the cost function $C(\hat\theta, \theta)$ expressing the cost of providing the estimate $\hat\theta$ when the true value is $\theta$. Clearly this cost makes sense whenever the true hypothesis is $H_1$ and with our detection scheme we also decide in favor of $H_1$. Again our decision mechanism involves $\delta_0(X)$, $\delta_1(X)$, $q_1(\hat\theta|X)$, since there is no estimation under $H_0$. Here, however, we are interested in computing the average cost under $H_1$ conditioned on the event that we have correctly selected the main hypothesis, namely

$$\bar{C} = E[C(\hat\theta, \theta)|H_1, D_1 = 1] = \frac{\int\!\!\int\!\!\int C(\hat\theta, \theta)\,\delta_1(X)\,q_1(\hat\theta|X)\,f_1(X|\theta)\,\pi(\theta)\,d\hat\theta\,d\theta\,dX}{\int\!\!\int \delta_1(X)\,f_1(X|\theta)\,\pi(\theta)\,d\theta\,dX} = \frac{\int\!\!\int D(\hat\theta, X)\,\delta_1(X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX}{\int \delta_1(X)\,f_1(X)\,dX} \tag{2.16}$$

where $D(U, X) = \int C(U, \theta)\,f_1(X|\theta)\,\pi(\theta)\,d\theta$ and $f_1(X) = \int f_1(X|\theta)\,\pi(\theta)\,d\theta$ is the mixture pdf. In other words we consider the average (estimation) cost conditioned on the event that we have correctly detected the main hypothesis. We can now attempt to minimize $\bar{C}$ and at the same time control the false alarm probability. This setup makes a lot of sense since the false alarm constraint assures acceptable performance of the detection part, while the conditional cost minimization provides the best possible estimator whenever we have correctly decided in favor of $H_1$. The next theorem solves exactly this problem.

Theorem 2.3: Consider the class $\mathcal{J}_\alpha$ of detection/estimation procedures for which $P(D_1 = 1|H_0) \leq \alpha$ with $\alpha \in [0, 1]$. Then the optimum test that minimizes the cost $\bar{C}$ within the class $\mathcal{J}_\alpha$ is given by

$$\frac{\rho f_1(X) - \inf_{U} D(U, X)}{f_0(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{2.17}$$

while the corresponding estimator is

$$\hat\theta = \arg\inf_{U} D(U, X). \tag{2.18}$$

Parameter $\rho$, threshold $\lambda \geq 0$ and randomization probability $\gamma$ are selected so that the false alarm constraint is satisfied with equality, that is,

$$\int_{\mathcal{A}(\rho,\lambda)} f_0(X)\,dX + \gamma \int_{\mathcal{B}(\rho,\lambda)} f_0(X)\,dX = \alpha, \tag{2.19}$$

but we also need the following equation to hold

$$\int_{\mathcal{A}(\rho,\lambda)} \left[\rho f_1(X) - \inf_{U} D(U, X)\right] dX + \gamma \int_{\mathcal{B}(\rho,\lambda)} \left[\rho f_1(X) - \inf_{U} D(U, X)\right] dX = 0. \tag{2.20}$$

Here $\mathcal{A}(\rho, \lambda)$ and $\mathcal{B}(\rho, \lambda)$ are the two subsets of $\mathbb{R}^N$ on which the statistic in (2.17) exceeds and equals the threshold $\lambda$, respectively.

Proof: Assume for the moment the existence of $\rho, \lambda, \gamma$ that solve the system of equations (2.19) and (2.20) and call $\delta_0^o(X), \delta_1^o(X)$ the randomization probabilities associated with the test in (2.17). Consider now any test in the class $\mathcal{J}_\alpha$ and let us perform the following manipulations

$$\int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda\alpha \tag{2.21}$$

$$\geq \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda \int \delta_1(X)\,f_0(X)\,dX \tag{2.22}$$

$$\geq \int \delta_1(X)\,\inf_{U} D(U, X)\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda \int \delta_1(X)\,f_0(X)\,dX \tag{2.23}$$

$$= \int \delta_1(X)\left[\inf_{U} D(U, X) - \rho f_1(X) + \lambda f_0(X)\right] dX \tag{2.24}$$

$$\geq \int \min\left\{\inf_{U} D(U, X) - \rho f_1(X) + \lambda f_0(X),\ 0\right\} dX \tag{2.25}$$

$$= \int \delta_1^o(X)\left[\inf_{U} D(U, X) - \rho f_1(X)\right] dX + \lambda \int \delta_1^o(X)\,f_0(X)\,dX \tag{2.26}$$

$$= \lambda\alpha. \tag{2.27}$$

Until (2.25) the results are straightforward. Eq. (2.26) expresses the fact that the lower bound is attainable by the proposed detection/estimation scheme of (2.17), (2.18). Finally, (2.27) is a consequence of $\rho, \lambda, \gamma$ solving the two equations (2.19), (2.20). Comparing (2.21) with (2.27) we conclude that for any test in the class $\mathcal{J}_\alpha$ we have $\bar{C} \geq \rho$, with equality whenever the test coincides with the one proposed by the theorem. For our proof to be complete we need to show that there exists a combination of $\rho, \lambda, \gamma$ that solves the two equations. For simplicity we are going to assume that the set $\mathcal{B}(\rho, \lambda)$ has zero probability with respect to $f_0(X)$, which allows us to select $\gamma = 0$.
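As a concrete illustration of the variation with known parameters under $H_0$, the sketch below evaluates the statistic of (2.12) for the quadratic cost $C(U, \theta) = (U - \theta)^2$ by simple numerical quadrature: for this cost, $D(0, X) - \inf_U D(U, X)$ equals the mixture density times the squared posterior mean, and the reported estimate is the Bayes (MMSE) estimate. The scalar Gaussian likelihood, prior, grid, and threshold are assumptions for the demo only.

```python
import numpy as np

def known_theta0_test(X, f_cond, prior, theta_grid, lam):
    """Test (2.12) with squared-error cost; parameters known (theta = 0) under H_0.

    f_cond(X, theta) : likelihood f(X|theta), evaluated on an array of theta values,
    prior(theta)     : prior pdf pi(theta) under H_1,
    theta_grid       : quadrature grid for theta (an implementation choice),
    lam              : threshold, calibrated in practice to meet the false alarm level.
    Returns (decision, estimate); the estimate is meaningful only when H_1 is declared.
    """
    dt = theta_grid[1] - theta_grid[0]
    w = f_cond(X, theta_grid) * prior(theta_grid)   # f(X|theta) * pi(theta) on the grid
    Z = np.sum(w) * dt                              # mixture density under H_1
    post_mean = np.sum(theta_grid * w) * dt / Z     # Bayes (MMSE) estimate
    stat = Z * post_mean ** 2 / f_cond(X, 0.0)      # [D(0,X) - inf_U D(U,X)] / f(X|0)
    D1 = 1 if stat > lam else 0
    return D1, (post_mean if D1 else 0.0)

# Illustrative model (assumed): X | theta ~ N(theta, 1), theta ~ N(0, 4) under H_1.
f_cond = lambda X, t: np.exp(-0.5 * (X - t) ** 2) / np.sqrt(2 * np.pi)
prior = lambda t: np.exp(-0.5 * t ** 2 / 4.0) / np.sqrt(2 * np.pi * 4.0)
grid = np.linspace(-10.0, 10.0, 2001)
print(known_theta0_test(2.3, f_cond, prior, grid, lam=0.5))
```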

2.3 Examples

In this section we present a number of interesting examples obtained by selecting various forms for the cost functions. We concentrate on the most well known costs encountered in classical Bayesian estimation theory. We start with the MAP estimate, which demonstrates the optimality of the GLRT.

2.3.1 MAP Detection/Estimation

Consider the following combination of cost functions

$$C^1_0(U, \theta) = C^0_1(U, \theta) = 1; \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = \begin{cases} 0 & \|U - \theta\| \leq \epsilon \\ 1 & \text{otherwise.} \end{cases} \tag{2.28}$$

We recall from classical Bayesian estimation theory (see [15, Page 145]) that, as $\epsilon \to 0$ and assuming sufficient smoothness of the pdfs, this specific selection of costs leads to MAP parameter estimation under each main hypothesis. Indeed, since²

$$D^i_i(U, X) \approx f_i(X) - V^i_\epsilon\, f_i(X|U)\,\pi_i(U), \tag{2.29}$$

where $V^i_\epsilon$ is the volume of a hypersphere of radius $\epsilon$ (which can be different for each hypothesis if the two parameter vectors are not of the same length), substituting in (2.9) and (2.10) yields

$$\frac{\sup_{U} f_1(X|U)\,\pi_1(U)}{\sup_{U} f_0(X|U)\,\pi_0(U)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\,\frac{V^0_\epsilon}{V^1_\epsilon} = \lambda', \tag{2.30}$$

and the optimum estimator under each hypothesis is the MAP estimator

$$\hat\theta_i = \arg\sup_{U} f_i(X|U)\,\pi_i(U). \tag{2.31}$$

Similarly, for the special case of Subsection 2.2.1, if we define

$$C(U, \theta) = \begin{cases} 0 & \|U - \theta\| \leq \epsilon \\ 1 & \text{otherwise,} \end{cases} \tag{2.32}$$

then $D(U, X) \approx \int f(X|\theta)\pi(\theta)\,d\theta - V_\epsilon\, f(X|U)\pi(U)$ and, since the ball $\|\theta\| \leq \epsilon$ carries negligible prior mass so that $D(0, X) \approx \int f(X|\theta)\pi(\theta)\,d\theta$, the optimum test in (2.12) takes the form

$$\frac{\sup_{U} f(X|U)\,\pi(U)}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \frac{\lambda}{V_\epsilon} = \lambda', \tag{2.33}$$

with the optimum estimator being $\hat\theta = \arg\sup_{U} f(X|U)\pi(U)$. In both tests (2.30) and (2.33) the threshold $\lambda'$ (and the corresponding randomization probabilities) are selected to satisfy the false alarm constraint with equality. If the prior probabilities $\pi_i(\theta_i)$, $\pi(\theta)$ are unknown and are replaced by the uniform, we obtain the classical form of the GLRT. If we now consider the discrete version of the problem and assume that $\theta_i$ can only take values in a finite set $\mathcal{V}_i = \{\theta_{i1},\ldots,\theta_{iK_i}\}$ with corresponding prior probabilities $\pi_{i1},\ldots,\pi_{iK_i}$, then we recover the GLRT.

2.3.2 MMSE Detection/Estimation

Let us now develop the first test that can be used as an alternative to the GLRT. Consider the following costs

$$C^1_0(U, \theta) = C^1_0(\theta); \qquad C^0_1(U, \theta) = C^0_1(\theta); \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = \|U - \theta\|^2, \tag{2.34}$$

where $C^1_0(\theta)$, $C^0_1(\theta)$ are functions to be specified in the sequel. Due to the previous selection, the estimation part is independent from the detection. Under each main hypothesis the optimum estimator is selected by minimizing the corresponding mean squared error.

² The approximate equality becomes exact as $\epsilon \to 0$.

Consequently, the optimum estimator is the conditional mean of the parameter vector given the data vector $X$ (see [15, Page 143]). Specifically, we have

$$\hat\theta_i = E[\vartheta_i|X, H_i] = \frac{\int \theta_i\, f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}, \quad i = 0, 1. \tag{2.35}$$

The corresponding optimum test, after substituting in (2.9), takes the form

$$A_1(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda A_0(X) \tag{2.36}$$

where

$$A_0(X) = \|\hat\theta_0\|^2 f_0(X) + \int \left[C^0_1(\theta_0) - \|\theta_0\|^2\right] f_0(X|\theta_0)\,\pi_0(\theta_0)\,d\theta_0$$
$$A_1(X) = \|\hat\theta_1\|^2 f_1(X) + \int \left[C^1_0(\theta_1) - \|\theta_1\|^2\right] f_1(X|\theta_1)\,\pi_1(\theta_1)\,d\theta_1$$
$$f_i(X) = \int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i. \tag{2.37}$$

Selecting $C^1_0(\theta_1) = \|\theta_1\|^2$ and $C^0_1(\theta_0) = \|\theta_0\|^2$ simplifies the test considerably, yielding

$$\frac{\|\hat\theta_1\|^2}{\|\hat\theta_0\|^2}\cdot\frac{\int f_1(X|\theta_1)\,\pi_1(\theta_1)\,d\theta_1}{\int f_0(X|\theta_0)\,\pi_0(\theta_0)\,d\theta_0} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda. \tag{2.38}$$

We recognize in the second ratio the statistic that is used to decide optimally between the two main hypotheses. By including the first ratio of the two squared-norm estimates, the test performs simultaneous optimum detection and estimation. For the special case of Subsection 2.2.1, it is easy to verify that the corresponding test takes the form

$$\frac{\|\hat\theta\|^2 \int f(X|\theta)\,\pi(\theta)\,d\theta}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{2.39}$$

where $\hat\theta = E[\vartheta|X, H_1] = \int \theta\, f(X|\theta)\,\pi(\theta)\,d\theta \big/ \int f(X|\theta)\,\pi(\theta)\,d\theta$. In both tests (2.36) and (2.39), if the priors are not known and are replaced by uniforms, we obtain tests that are the equivalent of the GLRT for the MMSE criterion.

2.3.3 Median Detection/Estimation

As our final example we present the case of median estimation, where $\theta_i, \hat\theta_i, \theta$ are scalars and we select the cost functions as follows

$$C^1_0(U, \theta) = C^1_0(\theta); \qquad C^0_1(U, \theta) = C^0_1(\theta); \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = |U - \theta|. \tag{2.40}$$

As in the previous examples, the estimation part is independent from detection and under each hypothesis it is the optimum Bayes estimator. Consequently, for this cost function the optimum estimator is the conditional median [15, Page 143]

$$\hat\theta_i = \arg\left\{ y : P(\vartheta_i \leq y|X, H_i) = \frac{\int_{-\infty}^{y} f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i} = \frac{1}{2}\right\}, \quad i = 0, 1. \tag{2.41}$$

The optimum test, as before, becomes

$$A_1(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda A_0(X) \tag{2.42}$$
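To illustrate how the MMSE variant could be used in practice, here is a hedged numerical sketch of the simplified test (2.38): it computes, under each hypothesis, the mixture density and the conditional-mean estimate (2.35) by quadrature over a scalar parameter, then forms the product of the two ratios. The Gaussian likelihood, the priors, and the threshold are illustrative assumptions only.

```python
import numpy as np

def mmse_detect_estimate(X, f, pi0, pi1, grid, lam):
    """MMSE joint detection/estimation test (2.38) for a scalar parameter.

    f(X, t)        : likelihood f_i(X|theta) (same parametric family under both hypotheses here),
    pi0(t), pi1(t) : parameter priors under H_0 and H_1,
    grid           : quadrature grid for theta (an implementation choice),
    lam            : threshold set by the constraint on the cost under H_0.
    Returns (decision, estimate), the estimate being the conditional mean (2.35)
    under the selected hypothesis.
    """
    dt = grid[1] - grid[0]
    est, mix = [], []
    for pi in (pi0, pi1):
        w = f(X, grid) * pi(grid)
        Z = np.sum(w) * dt                      # mixture density f_i(X)
        est.append(np.sum(grid * w) * dt / Z)   # conditional-mean estimate (2.35)
        mix.append(Z)
    stat = (est[1] ** 2 / est[0] ** 2) * (mix[1] / mix[0])
    D1 = 1 if stat > lam else 0
    return D1, est[D1]

# Illustrative scalar Gaussian model (assumed): X | theta ~ N(theta, 1),
# theta ~ N(-1, 1) under H_0 and theta ~ N(+2, 1) under H_1.
gauss = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
f = lambda X, t: gauss(X, t, 1.0)
pi0 = lambda t: gauss(t, -1.0, 1.0)
pi1 = lambda t: gauss(t, 2.0, 1.0)
grid = np.linspace(-12.0, 12.0, 4001)
print(mmse_detect_estimate(1.4, f, pi0, pi1, grid, lam=1.0))
```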


More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Detection Theory. Composite tests

Detection Theory. Composite tests Composite tests Chapter 5: Correction Thu I claimed that the above, which is the most general case, was captured by the below Thu Chapter 5: Correction Thu I claimed that the above, which is the most general

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing Robert Vanderbei Fall 2014 Slides last edited on November 24, 2014 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

More information

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell,

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell, Recall The Neyman-Pearson Lemma Neyman-Pearson Lemma: Let Θ = {θ 0, θ }, and let F θ0 (x) be the cdf of the random vector X under hypothesis and F θ (x) be its cdf under hypothesis. Assume that the cdfs

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Richard Lockhart Simon Fraser University STAT 830 Fall 2018 Richard Lockhart (Simon Fraser University) STAT 830 Hypothesis Testing STAT 830 Fall 2018 1 / 30 Purposes of These

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise.

Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise. Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise. Dominique Pastor GET - ENST Bretagne, CNRS UMR 2872 TAMCIC, Technopôle de

More information

Detection and Estimation Chapter 1. Hypothesis Testing

Detection and Estimation Chapter 1. Hypothesis Testing Detection and Estimation Chapter 1. Hypothesis Testing Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2015 1/20 Syllabus Homework:

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

ECE531 Lecture 8: Non-Random Parameter Estimation

ECE531 Lecture 8: Non-Random Parameter Estimation ECE531 Lecture 8: Non-Random Parameter Estimation D. Richard Brown III Worcester Polytechnic Institute 19-March-2009 Worcester Polytechnic Institute D. Richard Brown III 19-March-2009 1 / 25 Introduction

More information

Decentralized Detection In Wireless Sensor Networks

Decentralized Detection In Wireless Sensor Networks Decentralized Detection In Wireless Sensor Networks Milad Kharratzadeh Department of Electrical & Computer Engineering McGill University Montreal, Canada April 2011 Statistical Detection and Estimation

More information

Change Detection Algorithms

Change Detection Algorithms 5 Change Detection Algorithms In this chapter, we describe the simplest change detection algorithms. We consider a sequence of independent random variables (y k ) k with a probability density p (y) depending

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Lecture 22: Error exponents in hypothesis testing, GLRT

Lecture 22: Error exponents in hypothesis testing, GLRT 10-704: Information Processing and Learning Spring 2012 Lecture 22: Error exponents in hypothesis testing, GLRT Lecturer: Aarti Singh Scribe: Aarti Singh Disclaimer: These notes have not been subjected

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals D. Richard Brown III Worcester Polytechnic Institute 30-Apr-2009 Worcester Polytechnic Institute D. Richard Brown III 30-Apr-2009 1 / 32

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

ST 740: Model Selection

ST 740: Model Selection ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model

More information

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters D. Richard Brown III Worcester Polytechnic Institute 26-February-2009 Worcester Polytechnic Institute D. Richard Brown III 26-February-2009

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Sequential Detection. Changes: an overview. George V. Moustakides

Sequential Detection. Changes: an overview. George V. Moustakides Sequential Detection of Changes: an overview George V. Moustakides Outline Sequential hypothesis testing and Sequential detection of changes The Sequential Probability Ratio Test (SPRT) for optimum hypothesis

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

STOCHASTIC PROCESSES, DETECTION AND ESTIMATION Course Notes

STOCHASTIC PROCESSES, DETECTION AND ESTIMATION Course Notes STOCHASTIC PROCESSES, DETECTION AND ESTIMATION 6.432 Course Notes Alan S. Willsky, Gregory W. Wornell, and Jeffrey H. Shapiro Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

STAT 801: Mathematical Statistics. Hypothesis Testing

STAT 801: Mathematical Statistics. Hypothesis Testing STAT 801: Mathematical Statistics Hypothesis Testing Hypothesis testing: a statistical problem where you must choose, on the basis o data X, between two alternatives. We ormalize this as the problem o

More information

Gaussian Estimation under Attack Uncertainty

Gaussian Estimation under Attack Uncertainty Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary

More information

STAT 830 Decision Theory and Bayesian Methods

STAT 830 Decision Theory and Bayesian Methods STAT 830 Decision Theory and Bayesian Methods Example: Decide between 4 modes of transportation to work: B = Ride my bike. C = Take the car. T = Use public transit. H = Stay home. Costs depend on weather:

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

F2E5216/TS1002 Adaptive Filtering and Change Detection. Course Organization. Lecture plan. The Books. Lecture 1

F2E5216/TS1002 Adaptive Filtering and Change Detection. Course Organization. Lecture plan. The Books. Lecture 1 Adaptive Filtering and Change Detection Bo Wahlberg (KTH and Fredrik Gustafsson (LiTH Course Organization Lectures and compendium: Theory, Algorithms, Applications, Evaluation Toolbox and manual: Algorithms,

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Interval Estimation. Chapter 9

Interval Estimation. Chapter 9 Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

10-704: Information Processing and Learning Fall Lecture 24: Dec 7 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Distributed Detection of Binary Decisions with Collisions in a Large, Random Network

Distributed Detection of Binary Decisions with Collisions in a Large, Random Network Distributed Detection of Binary Decisions with Collisions in a Large, Random Network Gene T Whipps, Emre Ertin, and Randolph L Moses US Army Research Laboratory, Adelphi, MD 2783 Department of Electrical

More information

INTRODUCTION TO BAYESIAN METHODS II

INTRODUCTION TO BAYESIAN METHODS II INTRODUCTION TO BAYESIAN METHODS II Abstract. We will revisit point estimation and hypothesis testing from the Bayesian perspective.. Bayes estimators Let X = (X,..., X n ) be a random sample from the

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Decision Criteria 23

Decision Criteria 23 Decision Criteria 23 test will work. In Section 2.7 we develop bounds and approximate expressions for the performance that will be necessary for some of the later chapters. Finally, in Section 2.8 we summarize

More information

Continuum Probability and Sets of Measure Zero

Continuum Probability and Sets of Measure Zero Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to

More information

On the Bayesianity of Pereira-Stern tests

On the Bayesianity of Pereira-Stern tests Sociedad de Estadística e Investigación Operativa Test (2001) Vol. 10, No. 2, pp. 000 000 On the Bayesianity of Pereira-Stern tests M. Regina Madruga Departamento de Estatística, Universidade Federal do

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

Econ 2148, spring 2019 Statistical decision theory

Econ 2148, spring 2019 Statistical decision theory Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a

More information

P Values and Nuisance Parameters

P Values and Nuisance Parameters P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information