Optimum Joint Detection and Estimation


Report SSP: Optimum Joint Detection and Estimation

George V. Moustakides
Statistical Signal Processing Group
Department of Electrical & Computer Engineering
University of Patras, GREECE


Contents

1 Joint Hypothesis Testing and Isolation
  1.1 Introduction
  1.2 Randomized Decision Rules and Classical Hypothesis Testing
    1.2.1 Neyman-Pearson Binary Hypothesis Testing
    1.2.2 Bayesian Multiple Hypothesis Testing
  1.3 Combined Hypothesis Testing and Isolation
    1.3.1 Optimality of GLRT
    1.3.2 Combined Neyman-Pearson and Bayesian Hypothesis Testing

2 Joint Hypothesis Testing and Estimation
  2.1 Introduction
    2.1.1 Optimum Bayesian Estimation
    2.1.2 Combined Neyman-Pearson Hypothesis Testing and Bayesian Estimation
  2.2 Variations
    2.2.1 Known Parameters under $H_0$
    2.2.2 Conditional Cost
  2.3 Examples
    2.3.1 MAP Detection/Estimation
    2.3.2 MMSE Detection/Estimation
    2.3.3 Median Detection/Estimation
  2.4 Conclusion
  2.5 Acknowledgment


1 Joint Hypothesis Testing and Isolation

1.1 Introduction

In binary hypothesis testing, when the hypotheses are composite or the corresponding data pdfs contain unknown parameters, one can use the well known generalized likelihood ratio test (GLRT) to reach a decision. This test has the very desirable characteristic of performing simultaneous detection and estimation in the case of parameterized pdfs, or combined detection and isolation in the case of composite hypotheses. Although the GLRT has been known for many years and has been the decision tool in numerous applications, only asymptotic optimality results are currently available to support it. In this work we introduce a novel, finite sample size, detection/estimation formulation for the problem of hypothesis testing with unknown parameters and a corresponding detection/isolation setup for the case of composite hypotheses. The resulting optimum scheme has a GLRT-like form which is closely related to the criterion we employ for the parameter estimation or isolation part. When this criterion is selected in a very specific way we recover the well known GLRT of the literature, while with alternative criteria we obtain interesting novel tests. Our mathematical derivations are surprisingly simple considering that they solve a problem that has been open for more than half a century.

Consider a random data vector $X \in \mathbb{R}^N$ and two composite hypotheses $H_0$, $H_1$ defined as

$$H_i : X \sim f_{ik}(X) \ \text{with prior probability}\ \pi_{ik}, \quad k = 1,\ldots,K_i,\ i = 0,1, \tag{1.1}$$

where the $f_{ik}(X)$ are pdfs and "$\sim$" means "distributed according to". Under each hypothesis $H_i$ the data pdf can take one of the $K_i$ possible forms $f_{i1}(X),\ldots,f_{iK_i}(X)$ with corresponding prior probabilities $\pi_{i1},\ldots,\pi_{iK_i}$. The classical approach for distinguishing between the two composite hypotheses consists in forming, for each hypothesis, the mixture pdf

$$f_i(X) = \sum_{k=1}^{K_i} \pi_{ik} f_{ik}(X), \tag{1.2}$$

and then, for any realization $X$ of the random data vector, applying the likelihood ratio test

$$\frac{f_1(X)}{f_0(X)} = \frac{\sum_{k=1}^{K_1} \pi_{1k} f_{1k}(X)}{\sum_{k=1}^{K_0} \pi_{0k} f_{0k}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.3}$$

to make a decision. According to (1.3) we decide in favor of $H_1$ when the likelihood ratio exceeds the threshold $\lambda$, in favor of $H_0$ when the likelihood ratio falls below the threshold, and we perform a randomized decision between the two possibilities every time the likelihood ratio coincides with the threshold. Even though this decision scheme is optimum (in more than one sense), it can decide only between the two main hypotheses. There are clearly applications where one is interested in specifying the actual pdf that generates the data vector $X$. In other words, in addition to deciding the main hypothesis, we could also attempt to fine-tune our decision mechanism by isolating the pdf that is responsible for the observed data $X$. This goal clearly calls for a joint detection/isolation strategy.

A possible approach for solving the combined problem is with the help of the GLRT, that is, by applying the following test

$$\frac{\max_{1\le k\le K_1} f_{1k}(X)}{\max_{1\le k\le K_0} f_{0k}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.4}$$

which is equivalent to

$$\frac{f_{1\hat k_1}(X)}{f_{0\hat k_0}(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.5}$$

where

$$\hat k_i = \arg\max_{1\le k\le K_i} f_{ik}(X), \quad i = 0, 1. \tag{1.6}$$

We observe that the GLRT performs two simultaneous decisions: with (1.5) it decides between the two main hypotheses $H_0$, $H_1$ and, at the same time, with (1.6) it isolates the most likely pdf under each hypothesis.

A significantly more interesting situation arises when under each hypothesis we have parameterized pdfs. Suppose that under hypothesis $H_i$, $i = 0, 1$, the data vector satisfies $X \sim f_i(X|\theta_i)$, where the parameter vector $\theta_i$ is assumed to be a realization of a corresponding random vector $\vartheta_i$ distributed according to the prior pdf $\pi_i(\theta_i)$. A test for composite hypotheses would form the two mixture pdfs $f_i(X) = \int f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$ and then apply the likelihood ratio test on the resulting densities. Again, as before, this approach is unable to propose an estimate for the parameter vector $\theta_i$ that generates the observed data $X$. We realize that the isolation problem has now turned into a parameter estimation problem; consequently, if our goal is to perform detection and parameter estimation simultaneously, a possibility is to apply the GLRT

$$\frac{\sup_{\theta_1} f_1(X|\theta_1)}{\sup_{\theta_0} f_0(X|\theta_0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{1.7}$$

or equivalently

$$\frac{f_1(X|\hat\theta_1)}{f_0(X|\hat\theta_0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.8}$$

where

$$\hat\theta_i = \arg\sup_{\theta_i} f_i(X|\theta_i), \quad i = 0, 1. \tag{1.9}$$

With this test we decide between the two hypotheses while providing, at the same time through (1.9), maximum likelihood estimates of the desired parameters.

The first asymptotic optimality result for the GLRT can be traced back to 1943 in the work of Wald [1], while subsequent results can be found in [2, 3, 4, 5]. A thorough analysis of this subject exists in [6, Chapter 22] and additional references in [7]. We should also mention a series of results [8]-[13] addressing the asymptotic optimality property of the GLRT for special classes of processes. Finally, in [14] the GLRT is related to the uniformly most powerful invariant (UMPI) test and conclusions about its asymptotic optimality are drawn from this connection. As far as applications are concerned, the literature dealing with the GLRT is enormous, indicating the significant practical usefulness of this very simple decision mechanism.

Despite the GLRT's extreme popularity, no finite sample size optimality result has been developed so far to support it. It is exactly this gap we intend to fill with our current work. Of course, it is unrealistic to expect that the GLRT will turn out to be finite-sample-size optimum with respect to some known criterion. The only chance we have to prove such a type of optimality is by introducing a new performance measure. The measure we intend to adopt, we believe, makes a lot of sense and is tailored to the fact that the GLRT performs simultaneous detection/isolation or detection/estimation. Furthermore, with our analysis we will not only provide the missing optimality theory for the GLRT but we will also offer novel GLRT-like alternatives which might turn out to be more suitable for certain applications than the existing test.

1.2 Randomized Decision Rules and Classical Hypothesis Testing

Before introducing our main results let us first revisit two classical problems from hypothesis testing theory, namely binary hypothesis testing in the Neyman-Pearson sense and multiple hypothesis testing in the Bayesian sense.
We would like to develop the corresponding familiar optimum detection strategies by working with the class of randomized decision rules. Randomized tests tend to be easier to optimize than their deterministic counterparts because they involve optimization of functions as opposed to optimization of (decision) sets, which is what deterministic tests require. The reason we insist on the two classical hypothesis testing problems is that we intend to propose a new combined version that will produce the GLRT in a natural way. Furthermore, as we mentioned, we pay special attention to the class of randomized tests instead of the conventional deterministic class because with the former it is straightforward to develop the desired optimum decision strategy.

1.2.1 Neyman-Pearson Binary Hypothesis Testing

Consider a random data vector $X$ that takes values in $\mathbb{R}^N$ and two hypotheses

$$H_0 : X \sim f_0(X); \qquad H_1 : X \sim f_1(X),$$

where $f_i(X)$ denotes the pdf of the data vector $X$ under hypothesis $H_i$. For every realization $X$ we must come up with a decision $D \in \{0, 1\}$. Given $X$, with a randomized decision rule our decision $D$ is a random variable. Therefore let $\delta_0(X), \delta_1(X)$ denote the probabilities of our decision $D$ being 0 and 1 respectively. It is clear that the two probabilities must be complementary, i.e. $\delta_0(X) + \delta_1(X) = 1$, and functions of the observation vector $X$. A randomized decision rule is completely specified once these two functions are known. A decision $D$ is reached with the help of a random selection game where we select $D = 0$ with probability $\delta_0(X)$ and $D = 1$ with probability $\delta_1(X)$ using, for example, an unfair coin tossing procedure.

The class of randomized decision rules is richer than the class of deterministic strategies. Indeed, we recall that a deterministic strategy is defined with the help of two complementary sets $A_0, A_1 \subseteq \mathbb{R}^N$, where $A_1 = A_0^c$ and the superscript $c$ denotes complement, and we decide in favor of $H_j$ whenever $X \in A_j$, $j = 0, 1$. Deterministic strategies always make the same decision for the same data vector $X$, unlike their randomized counterparts where the decision depends on the outcome of the random game. A deterministic strategy can be viewed as a randomized rule by selecting $\delta_j(X) = 1_{\{A_j\}}(X)$, where $1_{\{A\}}(X)$ is the indicator function of the set $A$. Note that whenever $X \in A_j$ the deterministic rule selects $H_j$; its randomized version, on the other hand, selects $H_j$ with probability $\delta_j(X) = 1_{\{A_j\}}(X) = 1$ which, of course, is the equivalent of a deterministic decision. The advantage of using randomized rules is that we work with functions instead of sets (which is the practice with deterministic strategies). This considerably facilitates the understanding of the proofs for a novice reader who is more familiar with function than with set optimization.

Let us now attempt to solve the binary hypothesis testing problem in the sense of Neyman-Pearson. We are seeking a randomized rule $[\delta_0(X), \delta_1(X)]$ that maximizes the probability of detection $P(D = 1|H_1)$ subject to the constraint that the false alarm probability $P(D = 1|H_0)$ does not exceed a prescribed level $\alpha \in [0, 1]$. We can immediately see that

$$P(D = j|H_i) = \int \delta_j(X) f_i(X)\,dX. \tag{1.10}$$

Using the Lagrange multiplier technique, we can transform the constrained optimization problem into an unconstrained one as follows

$$\max_{\delta_1(X)} \left\{ \int \delta_1(X)\,[f_1(X) - \lambda f_0(X)]\,dX \right\}, \tag{1.11}$$

where $\lambda \geq 0$ is the Lagrange multiplier.
Since $0 \leq \delta_1(X) \leq 1$ (we recall that $\delta_1(X)$ is a probability) we conclude that the optimum $\delta_1^o(X)$ is

$$\delta_1^o(X) = \begin{cases} 1 & \text{when } f_1(X) - \lambda f_0(X) > 0 \\ \gamma(X) & \text{when } f_1(X) - \lambda f_0(X) = 0 \\ 0 & \text{when } f_1(X) - \lambda f_0(X) < 0, \end{cases} \tag{1.12}$$

where $\gamma(X)$ is an arbitrary probability. This rule is of course equivalent to the classical likelihood ratio test: select $H_1$ with probability 1 (therefore deterministically) when $f_1(X)/f_0(X) > \lambda$; favor $H_0$ when $f_1(X)/f_0(X) < \lambda$; and decide randomly with probability $\gamma(X)$ in favor of $H_1$ (and therefore with probability $1 - \gamma(X)$ in favor of $H_0$) whenever the likelihood ratio coincides with the threshold $\lambda$. Threshold $\lambda$ and randomization probability $\gamma(X)$ are selected so that the likelihood ratio test meets the false alarm constraint with equality. The proof of existence of suitable values for $\lambda$ and $\gamma(X)$ (the latter is usually set to a constant) for any level $\alpha \in [0, 1]$, and of the optimality of the resulting test, can be found in any basic textbook on hypothesis testing (see for example [15, Page 22]). We observe that under the richer class of randomized rules we still obtain the classical likelihood ratio test as our optimum detection scheme.

It should be noted that although randomization does not improve the optimum (deterministic) rule, this is not necessarily the case when randomization is applied to suboptimum tests (see for example [16], where the introduction of noise transforms a deterministic test into a randomized one and improves performance).

1.2.2 Bayesian Multiple Hypothesis Testing

Consider now the case where the random data vector $X$ satisfies $K$ hypotheses of the form $H_k : X \sim f_k(X)$ with corresponding prior probability $\pi_k$, where $k = 1,\ldots,K$. Here the decision $D$ takes values in the set $\{1,\ldots,K\}$ while the randomized decision mechanism is comprised of $K$ complementary probabilities $\delta_1(X),\ldots,\delta_K(X)$, with $\delta_l(X) \geq 0$, $\delta_1(X)+\cdots+\delta_K(X) = 1$, and $\delta_l(X)$ denoting the probability of selecting $D = l$ using a random selection game. For a Bayesian formulation we also need to specify a collection of costs $C_l^k$, $k, l = 1,\ldots,K$, where $C_l^k$ expresses the cost of deciding in favor of $H_l$ (i.e. $D = l$) when the true hypothesis is $H_k$. The goal is to select the randomized decision strategy, namely the probabilities $\delta_l(X)$, in order to minimize the average cost. If we denote the latter by $C$ and recall (1.10), we can write

$$C = \sum_{l=1}^{K}\sum_{k=1}^{K} C_l^k\, P(D = l\ \&\ H_k) = \sum_{l=1}^{K}\sum_{k=1}^{K} C_l^k\, P(D = l|H_k)\,\pi_k \tag{1.13}$$

$$= \sum_{l=1}^{K} \int \delta_l(X) \left\{ \sum_{k=1}^{K} C_l^k f_k(X)\pi_k \right\} dX = \sum_{l=1}^{K} \int \delta_l(X)\,D_l(X)\,dX \tag{1.14}$$

$$\geq \int \left\{ \min_l D_l(X) \right\} \left\{\sum_{l=1}^{K} \delta_l(X)\right\} dX \tag{1.15}$$

$$= \int \min_l D_l(X)\,dX, \tag{1.16}$$

where the functions $D_l(X)$ are defined as

$$D_l(X) = \sum_{k=1}^{K} C_l^k f_k(X)\pi_k. \tag{1.17}$$

In the previous derivations, inequality (1.15) is true because $\delta_l(X) \geq 0$, while (1.16) is a consequence of the same functions being complementary. The final integral in (1.16) is independent of the decision strategy, therefore it constitutes a lower bound on the performance of any randomized rule. Furthermore this lower bound is always attainable by the following decision rule, which is thereby optimum

$$\delta_k^o(X) = \begin{cases} 1 & \text{when } k = \arg\min_l D_l(X) \\ 0 & \text{otherwise.} \end{cases} \tag{1.18}$$

The previous relation is the randomized version of the well known (deterministic) Bayesian optimum decision strategy

$$D = \arg\min_{1\leq l\leq K} D_l(X). \tag{1.19}$$

Clearly, if more than one index attains the same minimum then we randomize among them with arbitrary complementary probabilities. We also recall the very interesting special case $C_l^k = 1$ when $l \neq k$ and $C_l^l = 0$, for which the average cost $C$ becomes the probability of making an erroneous decision. For this case the decision rule (1.19) is equivalent to

$$D = \arg\max_{1\leq l\leq K} \frac{\pi_l f_l(X)}{\sum_{k=1}^{K}\pi_k f_k(X)} = \arg\max_{1\leq l\leq K} \pi_l f_l(X). \tag{1.20}$$

In other words we select the hypothesis with the maximum a posteriori probability (MAP). Again, we observe that we obtain the classical optimum detection scheme of the deterministic setup. In the next section we are going to combine the previous two results and propose a new performance measure which will be optimized by the GLRT.
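To make the MAP rule (1.20) concrete, the following minimal Python sketch evaluates $\pi_l f_l(X)$ for each hypothesis and picks the largest. The Gaussian densities, the priors, and the dimension used in the demo are illustrative assumptions only and are not taken from the report.

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_decision(X, pdfs, priors):
    """Bayesian MAP rule of (1.20): select the hypothesis maximizing pi_l * f_l(X).

    pdfs   : list of callables returning the density value f_l(X),
    priors : prior probabilities pi_l (summing to one).
    Returns the 1-based index of the selected hypothesis.
    """
    scores = [p * f(X) for f, p in zip(pdfs, priors)]
    # Ties could be broken by an arbitrary randomization, cf. (1.18); omitted here.
    return int(np.argmax(scores)) + 1

# Toy setup (assumed for illustration): three Gaussian hypotheses in R^2.
means = [np.zeros(2), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
pdfs = [lambda X, m=m: multivariate_normal(mean=m, cov=np.eye(2)).pdf(X) for m in means]
priors = [0.5, 0.3, 0.2]

rng = np.random.default_rng(0)
X = rng.multivariate_normal(means[1], np.eye(2))   # data drawn under the second hypothesis
print("decision D =", map_decision(X, pdfs, priors))
```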

1.3 Combined Hypothesis Testing and Isolation

Let us return to the binary case and assume that each hypothesis is composite. In other words, under each hypothesis we have more than one possible data pdf with a known prior probability. For notational simplicity we are going to regard each such possibility as a different subhypothesis. Therefore we are going to say that $H_0$ is comprised of the subhypotheses $H_{0k}$, $k = 1,\ldots,K_0$, where under $H_{0k}$: $X \sim f_{0k}(X)$ with prior probability $\pi_{0k}$. Similarly, $H_1$ has the subhypotheses $H_{1k}$, $k = 1,\ldots,K_1$, where under $H_{1k}$: $X \sim f_{1k}(X)$ with prior probability $\pi_{1k}$. The probabilities $\pi_{ik}$, $k = 1,\ldots,K_i$, $i = 0, 1$, are the prior probabilities of the subhypotheses given that the main hypothesis $H_i$ is true. Consequently $\pi_{01} + \cdots + \pi_{0K_0} = \pi_{11} + \cdots + \pi_{1K_1} = 1$.

If we simply want to decide between $H_0$ and $H_1$ then, as was mentioned in the Introduction, we apply the test depicted in (1.3). If however our goal is, in addition to this decision, to isolate the specific subhypothesis which is responsible for the observed data vector $X$, then we need to formulate the problem differently. Note that a randomized rule capable of selecting between subhypotheses requires the definition of $K_0 + K_1$ complementary probabilities

$$\delta_{01}(X),\ldots,\delta_{0K_0}(X),\ \delta_{11}(X),\ldots,\delta_{1K_1}(X) \tag{1.21}$$

where for the randomization probabilities $\delta_{jl}(X)$, $j = 0, 1$ and $l = 1,\ldots,K_j$, we have $\delta_{jl}(X) \geq 0$ and

$$[\delta_{01}(X) + \cdots + \delta_{0K_0}(X)] + [\delta_{11}(X) + \cdots + \delta_{1K_1}(X)] = 1. \tag{1.22}$$

A key point in developing our methodology consists in observing that it is possible to write

$$\delta_{jl}(X) = \delta_j(X)\,q_{jl}(X), \tag{1.23}$$

where

$$\delta_j(X) = \delta_{j1}(X) + \cdots + \delta_{jK_j}(X) \quad\text{and}\quad q_{jl}(X) = \frac{\delta_{jl}(X)}{\delta_j(X)}, \qquad j = 0, 1;\ l = 1,\ldots,K_j. \tag{1.24}$$

This alternative form of the randomization probabilities involves the following set of functions

$$\delta_0(X),\ \delta_1(X),\ q_{01}(X),\ldots,q_{0K_0}(X),\ q_{11}(X),\ldots,q_{1K_1}(X) \tag{1.25}$$

for which, because of (1.22), (1.23), (1.24), we have

$$\delta_0(X) + \delta_1(X) = q_{01}(X) + \cdots + q_{0K_0}(X) = q_{11}(X) + \cdots + q_{1K_1}(X) = 1. \tag{1.26}$$

Actually $\delta_j(X)$, $j = 0, 1$, expresses the total randomization probability of selecting hypothesis $H_j$, whereas $q_{jl}(X)$ becomes the conditional probability of selecting subhypothesis $H_{jl}$ given that we have selected the main hypothesis $H_j$.

The two different sets of randomization probabilities depicted in (1.21) and (1.25) suggest two different randomized games for the combined detection/isolation problem. With the help of the probabilities $\delta_{jl}(X)$ in (1.21), the decision mechanism involves a single step which directly selects a specific subhypothesis. In other words we simultaneously detect and isolate. This approach is similar to the multiple hypothesis testing problem considered previously. If, instead, we use the alternative set in (1.25), then the detection/isolation process is concluded in two steps since it involves two different decisions, namely $D_1$ for detection and $D_2$ for isolation. Specifically:

Step 1: We first make a decision $D_1 \in \{0, 1\}$ using the randomization probabilities $\delta_0(X), \delta_1(X)$ and decide between the two main hypotheses $H_0$, $H_1$.

Step 2: Given that in the first step we decided $D_1 = j$, that is, in favor of the main hypothesis $H_j$, we continue with the isolation part and select $D_2 \in \{1,\ldots,K_j\}$ using the randomization probabilities $q_{jl}(X)$, thus isolating one of the subhypotheses $H_{jl}$. The second randomized decision must be (conditionally on $X$) independent from the one applied in the first step.
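As a quick illustration of the two randomized games and of the factorization (1.23), the hedged Python sketch below draws a decision either in a single step from the probabilities $\delta_{jl}(X)$ or in two steps from $\delta_j(X)$ and $q_{jl}(X)$; the numerical probabilities correspond to one fixed $X$ and are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_step_decision(delta, q):
    """Two-step randomized game based on (1.25)-(1.26).

    delta : [delta_0, delta_1], probabilities of the two main hypotheses,
    q     : [q_0, q_1], each a vector of conditional isolation probabilities.
    Returns (D1, D2) with D1 in {0, 1} and D2 a 1-based subhypothesis index.
    """
    D1 = rng.choice([0, 1], p=delta)            # Step 1: detection
    D2 = rng.choice(len(q[D1]), p=q[D1]) + 1    # Step 2: isolation, an independent draw
    return D1, int(D2)

def one_step_decision(delta, q):
    """Single-step game over all subhypotheses using delta_{jl} = delta_j * q_{jl}, cf. (1.23)."""
    flat = np.concatenate([delta[0] * np.asarray(q[0]), delta[1] * np.asarray(q[1])])
    idx = int(rng.choice(len(flat), p=flat))
    K0 = len(q[0])
    return (0, idx + 1) if idx < K0 else (1, idx - K0 + 1)

# Illustrative probabilities for a fixed X (assumed numbers, K_0 = 2, K_1 = 3).
delta = [0.4, 0.6]
q = [[0.7, 0.3], [0.2, 0.5, 0.3]]
print(two_step_decision(delta, q), one_step_decision(delta, q))
```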
The fact that in Step 2 the randomized selection game is independent from Step 1 allows us to write the probabilities $\delta_{jl}(X)$ in the product form appearing in (1.23). We would like to emphasize that the two randomized decision procedures, the first based on (1.21) and the second using (1.25), are perfectly equivalent. Indeed, from (1.21) we obtain (1.25) by applying (1.24), while we obtain (1.21) from (1.25) by using (1.23).

The basic difference between the two decision strategies is that the second method respects the grouping of the subhypotheses while the first disregards this property completely. It is in fact this grouping in the second decision mechanism that will give rise to the desired test. We should also mention that it is not equally straightforward to come up with the alternative decision mechanism by working solely with deterministic instead of randomized tests. Consequently, this fact justifies the use of this larger class of rules.

1.3.1 Optimality of GLRT

Let us demonstrate the usefulness of the alternative decision mechanism presented above by introducing a simple detection/isolation problem which leads directly to the optimality of the classical GLRT. For our two-step decision process, consider the two probabilities $P(\text{Correct-detection/isolation}|H_1)$ and $P(\text{Miss-detection/isolation}|H_0)$. Following a Neyman-Pearson approach we are interested in maximizing $P(\text{Correct-detection/isolation}|H_1)$ subject to the constraint that $P(\text{Miss-detection/isolation}|H_0)$ is no larger than a prescribed level. The following theorem addresses this problem explicitly and introduces the corresponding optimum solution.

Theorem 1.1: Consider the class $\mathcal{J}_\alpha$ of all detection/isolation tests that satisfy the constraint

$$P(\text{Miss-detection/isolation}|H_0) \leq \alpha, \tag{1.27}$$

where $\alpha_{\min} \leq \alpha \leq 1$, with

$$\alpha_{\min} = 1 - \int \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\,dX. \tag{1.28}$$

Then the test, within the class $\mathcal{J}_\alpha$, that maximizes the probability $P(\text{Correct-detection/isolation}|H_1)$ is given by:

Step 1: The optimum strategy for deciding between the two main hypotheses $H_0$ and $H_1$ is the GLRT

$$\frac{\max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\}}{\max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{1.29}$$

where, whenever the left hand side coincides with the threshold, we perform a randomization between the two hypotheses and select $H_1$ with probability $\gamma$.

Step 2: If in Step 1 we decide in favor of hypothesis $H_i$ (i.e. $D_1 = i$) then the optimum isolation strategy is

$$D_2 = \arg\max_{1\leq k\leq K_i}\{\pi_{ik} f_{ik}(X)\}. \tag{1.30}$$

If more than one index attains the same maximum we perform an arbitrary randomization among them. Threshold $\lambda$ and randomization probability $\gamma$ of Step 1 must be selected so that the constraint in (1.27) is satisfied with equality.

Proof: Note that $P(\text{Miss-detection/isolation}|H_0) = 1 - P(\text{Correct-detection/isolation}|H_0)$, therefore the constraint is equivalent to $P(\text{Correct-detection/isolation}|H_0) \geq 1 - \alpha$. Furthermore

$$P(\text{Correct-detection/isolation}|H_i) = \sum_{k=1}^{K_i} P(\text{Correct-detection/isolation}|H_{ik})\,\pi_{ik} \tag{1.31}$$

with

$$P(\text{Correct-detection/isolation}|H_{ik}) = \int \delta_i(X)\,q_{ik}(X)\,f_{ik}(X)\,dX. \tag{1.32}$$

To solve the constrained optimization problem, let $\lambda \geq 0$ be a Lagrange multiplier and, as in the classical Neyman-Pearson case, define the corresponding unconstrained version. With the help of (1.31) and (1.32) we can write

$$P(\text{Correct-detection/isolation}|H_1) + \lambda\,P(\text{Correct-detection/isolation}|H_0)$$

$$= \int \delta_1(X)\left\{\sum_{k=1}^{K_1} q_{1k}(X)\pi_{1k} f_{1k}(X)\right\} dX + \lambda \int \delta_0(X)\left\{\sum_{k=1}^{K_0} q_{0k}(X)\pi_{0k} f_{0k}(X)\right\} dX \tag{1.33}$$

$$\leq \int \delta_1(X) \max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\}\,dX + \lambda \int \delta_0(X) \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\,dX \tag{1.34}$$

$$= \int \left[\delta_1(X) \max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\} + \delta_0(X)\,\lambda \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\right] dX \tag{1.35}$$

$$\leq \int \max\left\{\max_{1\leq k\leq K_1}\{\pi_{1k} f_{1k}(X)\},\ \lambda \max_{1\leq k\leq K_0}\{\pi_{0k} f_{0k}(X)\}\right\} dX. \tag{1.36}$$

Inequality (1.34) is valid because the functions $q_{ik}(X)$, $k = 1,\ldots,K_i$, are nonnegative and complementary, and (1.36) is true because the same properties hold for $\delta_i(X)$, $i = 0, 1$. Note that the final expression constitutes an upper bound on the performance of any detection/isolation rule. Furthermore this upper bound is attainable by a specific detection/isolation strategy. Indeed we note that we have equality in (1.34) when the isolation probabilities are selected as

$$q_{ik}^o(X) = \begin{cases} 1 & \text{if } k = \arg\max_{1\leq l\leq K_i}\{\pi_{il} f_{il}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{1.37}$$

where we randomize if more than one index attains the same maximum. This optimum isolation process is the randomized equivalent of (1.30). Similarly, we have equality in (1.36) when we select the detection probabilities to be

$$\delta_1^o(X) = \begin{cases} 1 & \text{if } \max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\} > \lambda \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\} \\ \gamma & \text{if } \max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\} = \lambda \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\} \\ 0 & \text{otherwise,} \end{cases} \tag{1.38}$$

and $\delta_0^o(X) = 1 - \delta_1^o(X)$. Clearly this optimum detection procedure is the equivalent of (1.29). As far as the false alarm constraint is concerned, let us define the following sets

$$\mathcal{A}(\lambda) = \left\{X : \frac{\max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\}}{\max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}} > \lambda\right\}, \qquad \mathcal{B}(\lambda) = \left\{X : \frac{\max_{1\leq j\leq K_1}\{\pi_{1j} f_{1j}(X)\}}{\max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}} = \lambda\right\}. \tag{1.39}$$

For the test introduced above we can then write

$$P(\text{Miss-detection/isolation}|H_0) = 1 - \int_{\mathcal{A}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX - \gamma \int_{\mathcal{B}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX$$
$$\geq 1 - \int_{\mathcal{A}(\lambda)\cup\mathcal{B}(\lambda)} \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX \geq 1 - \int \max_{1\leq j\leq K_0}\{\pi_{0j} f_{0j}(X)\}\,dX = \alpha_{\min}. \tag{1.40}$$

The lower bound $\alpha_{\min}$ is clearly attainable in the limit by selecting $\gamma = 1$ and letting $\lambda \to 0$. Also, the miss-detection/isolation probability is bounded from above by 1, and this value can also be attained in the limit by selecting $\gamma = 0$ and letting $\lambda \to \infty$. Existence of a suitable threshold $\lambda$ and a randomization probability $\gamma$ that assure validity of the false alarm constraint with equality, as well as optimality of the resulting test in the desired sense, can be easily demonstrated following exactly the same steps as in the classical Neyman-Pearson case.¹ This concludes the proof.

We realize that in order to apply the test in (1.29) we need knowledge of the prior probabilities $\pi_{ik}$. Whenever this information is not available we can consider equiprobable subhypotheses under each main hypothesis and select $\pi_{ik} = 1/K_i$. Under this assumption the optimum test in (1.29) reduces to the classical form of the GLRT depicted in (1.5) (after absorbing the two prior probabilities into the threshold). Finally, we should mention that if hypothesis $H_0$ is simple or, if under hypothesis $H_0$ we are not interested in the isolation problem (in which case we can treat it as simple by forming the mixture density), then $P(\text{Miss-detection/isolation}|H_0)$ becomes the usual false alarm probability with corresponding $\alpha_{\min} = 0$. In other words the false alarm probability can take any value in the interval $[0, 1]$ as in the classical Neyman-Pearson case.

Remark 1.1: We observe that the optimum test, under each main hypothesis, selects the most appropriate subhypothesis with the help of a MAP isolation rule, exactly as in (1.20). The interesting point is that this selection is performed independently from the other hypothesis and from the corresponding detection strategy. This is clearly a very desirable property since it separates the isolation from the detection problem.
In our developments we are going to obtain suitable conditions that can guarantee the same characteristic for the extended detection/isolation problem introduced next.

¹ In the proof we simply replace the pdfs $f_i(X)$ with the functions $\max_{1\leq j\leq K_i}\{\pi_{ij} f_{ij}(X)\}$. Even though these functions are not densities, the proof goes through without change.
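A minimal Python sketch of the optimum detection/isolation scheme of Theorem 1.1 follows: it compares the two weighted maxima as in (1.29) and then applies the MAP isolation (1.30). The scalar Gaussian subhypotheses and the threshold value are assumptions made only for the demo; in practice the threshold would be calibrated so that (1.27) holds with equality.

```python
import numpy as np
from scipy.stats import norm

def glrt_detect_isolate(X, f0, f1, pi0, pi1, lam):
    """Weighted GLRT of Theorem 1.1: detection via (1.29), isolation via (1.30).

    f0, f1   : lists of density callables for the subhypotheses of H_0 and H_1,
    pi0, pi1 : prior probabilities of the subhypotheses,
    lam      : threshold (to be calibrated so that the constraint (1.27) is met).
    Returns (D1, D2): the main-hypothesis decision and the isolated subhypothesis.
    """
    s0 = np.array([p * f(X) for f, p in zip(f0, pi0)])
    s1 = np.array([p * f(X) for f, p in zip(f1, pi1)])
    D1 = 1 if s1.max() > lam * s0.max() else 0      # randomize with prob. gamma on equality
    D2 = int((s1 if D1 else s0).argmax()) + 1       # MAP isolation under the chosen hypothesis
    return D1, D2

# Illustrative scalar Gaussian subhypotheses (assumed, not from the report).
f0 = [lambda x: norm(0, 1).pdf(x), lambda x: norm(0, 2).pdf(x)]
f1 = [lambda x: norm(1, 1).pdf(x), lambda x: norm(2, 1).pdf(x), lambda x: norm(3, 1).pdf(x)]
print(glrt_detect_isolate(1.7, f0, f1, [0.5, 0.5], [1/3, 1/3, 1/3], lam=1.0))
```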

1.3.2 Combined Neyman-Pearson and Bayesian Hypothesis Testing

The previous results are directly extendable to a more general formulation where we impose costs on combinations of decisions and (sub)hypotheses. We should however emphasize that we are interested in preserving the grouping of the two sets of subhypotheses defined in the previous subsection, since this is the key idea that produces the GLRT. Therefore suppose that $C^{ik}_{jl}$ denotes the cost of deciding in favor of subhypothesis $H_{jl}$ (i.e. $D_1 = j$, $D_2 = l$) when the true subhypothesis is $H_{ik}$. For the indexes we have $i, j \in \{0, 1\}$ while $k \in \{1,\ldots,K_i\}$ and $l \in \{1,\ldots,K_j\}$. Let us now consider the average cost $C_i$ given that the main hypothesis $H_i$, $i = 0, 1$, is true. We have

$$C_i = \sum_{k=1}^{K_i}\sum_{j=0}^{1}\sum_{l=1}^{K_j} C^{ik}_{jl}\, P(D_1 = j\ \&\ D_2 = l\ \&\ H_{ik}|H_i)$$
$$= \sum_{k=1}^{K_i}\left(\sum_{l=1}^{K_0} C^{ik}_{0l}\, P(D_1 = 0\ \&\ D_2 = l|H_{ik}) + \sum_{l=1}^{K_1} C^{ik}_{1l}\, P(D_1 = 1\ \&\ D_2 = l|H_{ik})\right)\pi_{ik}$$
$$= \int \left\{ \delta_0(X) \sum_{l=1}^{K_0} q_{0l}(X)\,D^i_{0l}(X) + \delta_1(X) \sum_{l=1}^{K_1} q_{1l}(X)\,D^i_{1l}(X) \right\} dX, \tag{1.41}$$

where we define $D^i_{jl}(X) = \sum_{k=1}^{K_i} C^{ik}_{jl}\, f_{ik}(X)\,\pi_{ik}$. Following a Neyman-Pearson-like approach we propose to minimize $C_1$ under the constraint that $C_0$ does not exceed some prescribed value. In other words, within each main hypothesis we employ a Bayesian formulation, whereas across main hypotheses the formulation is of Neyman-Pearson type. With this specific setup we maintain the required grouping of subhypotheses mentioned before, a fact that will produce alternatives to the GLRT. In the next theorem we define explicitly the optimization problem of interest and offer the corresponding general optimum solution.

Theorem 1.2: Consider the class $\mathcal{J}_\alpha$ of detection/isolation tests that satisfy $C_0 \leq \alpha$. Then the test that minimizes the cost $C_1$ within the class $\mathcal{J}_\alpha$ is given by

$$D^1_{0\hat l_0}(X) - D^1_{1\hat l_1}(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_{1\hat l_1}(X) - D^0_{0\hat l_0}(X)\right], \tag{1.42}$$

with the corresponding isolation process satisfying

$$\hat l_j = \arg\min_{1\leq l\leq K_j}\left[D^1_{jl}(X) + \lambda D^0_{jl}(X)\right], \quad j = 0, 1. \tag{1.43}$$

Threshold $\lambda \geq 0$ and the randomization probability $\gamma$ are selected so that the resulting test satisfies the constraint with equality.

Proof: Consider the unconstrained problem of minimizing $C_1 + \lambda C_0$ where $\lambda \geq 0$ is a Lagrange multiplier. We can then write

$$C_1 + \lambda C_0 = \int \left\{ \delta_0(X)\sum_{l=1}^{K_0} q_{0l}(X)\left[D^1_{0l}(X) + \lambda D^0_{0l}(X)\right] + \delta_1(X)\sum_{l=1}^{K_1} q_{1l}(X)\left[D^1_{1l}(X) + \lambda D^0_{1l}(X)\right]\right\} dX \tag{1.44}$$

$$\geq \int \left\{ \delta_0(X)\min_{1\leq l\leq K_0}\left\{D^1_{0l}(X) + \lambda D^0_{0l}(X)\right\} + \delta_1(X)\min_{1\leq l\leq K_1}\left\{D^1_{1l}(X) + \lambda D^0_{1l}(X)\right\}\right\} dX \tag{1.45}$$

$$= \int \left\{ \delta_0(X)\left[D^1_{0\hat l_0}(X) + \lambda D^0_{0\hat l_0}(X)\right] + \delta_1(X)\left[D^1_{1\hat l_1}(X) + \lambda D^0_{1\hat l_1}(X)\right]\right\} dX \tag{1.46}$$

$$\geq \int \min\left\{ D^1_{0\hat l_0}(X) + \lambda D^0_{0\hat l_0}(X),\ D^1_{1\hat l_1}(X) + \lambda D^0_{1\hat l_1}(X)\right\} dX. \tag{1.47}$$

We have equality in (1.45) whenever the isolation procedure satisfies (1.43) and equality in (1.47) whenever detection is according to (1.42). If threshold $\lambda$ and randomization probability $\gamma$ are such that the false alarm constraint is satisfied with equality, it is then straightforward to show that the corresponding combined scheme is indeed optimum in the sense that it minimizes $C_1$ within the class $\mathcal{J}_\alpha$. This concludes the proof.

Remark 1.2: Regarding the allowable values of the level $\alpha$ we have $\alpha_{\min} \leq \alpha \leq \alpha_{\max}$. Under this general setting it is possible to find an expression only for the lower end $\alpha_{\min}$. It is easily seen that

$$C_0 \geq \int \min\left\{\min_{1\leq l\leq K_0} D^0_{0l}(X),\ \min_{1\leq l\leq K_1} D^0_{1l}(X)\right\} dX = \alpha_{\min}. \tag{1.48}$$

Furthermore, this value $\alpha_{\min}$ is attainable by the optimum test, as one can verify by letting $\lambda \to \infty$. Unfortunately we cannot obtain a similar expression for the upper limit $\alpha_{\max}$ of $C_0$, since it is not clear whether the cost $C_0(\lambda)$ of the optimum scheme is a monotone function of $\lambda$. Of course we can always say that $\alpha_{\max} = \sup_{\lambda \geq 0} C_0(\lambda)$, but the practical usefulness of this conclusion is minimal.

Remark 1.3: From (1.43) we understand that the isolation process under each hypothesis (expressed through the corresponding minimization) takes into account the statistics of the other hypothesis and also depends on the detection rule through the threshold $\lambda$. We recall that in the GLRT this is not the case, since we simply use a MAP selection that depends neither on the other hypothesis nor on the threshold $\lambda$. In order to obtain the same desirable property under this more general setup it is sufficient to assume that²

$$C^{1k}_{0l} = C^{1k}_{0};\qquad C^{0k}_{1l} = C^{0k}_{1}. \tag{1.49}$$

This, in turn, yields

$$D^1_{0l}(X) = D^1_0(X);\qquad D^0_{1l}(X) = D^0_1(X), \tag{1.50}$$

making (1.42) equivalent to

$$D^1_0(X) - \min_{1\leq l\leq K_1} D^1_{1l}(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(X) - \min_{1\leq l\leq K_0} D^0_{0l}(X)\right], \tag{1.51}$$

while the isolation process (1.43) simplifies to

$$\hat l_i = \arg\min_{1\leq l\leq K_i} D^i_{il}(X). \tag{1.52}$$

With the conditions in (1.49) the isolation process simplifies considerably, since under hypothesis $H_i$ it involves only the Bayes cost $C^{ik}_{il}$, in other words the cost that we would use if we had only the isolation problem, exactly as in Subsection 1.2.2. Consequently, isolation under each hypothesis becomes independent from the isolation of the other hypothesis and also independent from the detection process, thus matching the property observed in the GLRT. Although it is possible to offer several intriguing examples for our general problem, it seems more interesting to postpone this presentation until Section 2.3, after we consider the problem of combined detection/estimation.

² In the GLRT this property holds since $C^{1k}_{0l} = C^{0k}_{1l} = 1$.
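The following sketch (a hedged illustration, not code taken from the report) implements the general rule (1.42)-(1.43) directly from a table of costs $C^{ik}_{jl}$. With the 0-1 costs of the usage example (zero cost only when the decided subhypothesis equals the true one), the scheme coincides with the weighted GLRT of Theorem 1.1; the densities and priors are assumptions chosen for the demo.

```python
import numpy as np
from scipy.stats import norm

def combined_test(X, f, pi, C, lam):
    """Optimum detection/isolation of Theorem 1.2, eqs. (1.42)-(1.43).

    f[i][k], pi[i][k] : subhypothesis densities and priors under H_i, i = 0, 1,
    C[(i, k, j, l)]   : cost of deciding (D1, D2) = (j, l) when H_{ik} is true,
    lam               : Lagrange multiplier / threshold fixing the constraint C_0 <= alpha.
    """
    def D(i, j, l):  # D^i_{jl}(X) = sum_k C^{ik}_{jl} f_{ik}(X) pi_{ik}
        return sum(C[(i, k, j, l)] * f[i][k](X) * pi[i][k] for k in range(len(f[i])))

    # Isolation (1.43): under each tentative decision j, minimize D^1_{jl} + lam * D^0_{jl}.
    lhat = [int(np.argmin([D(1, j, l) + lam * D(0, j, l) for l in range(len(f[j]))]))
            for j in (0, 1)]
    # Detection (1.42): decide H_1 when the left-hand side exceeds the right-hand side.
    lhs = D(1, 0, lhat[0]) - D(1, 1, lhat[1])
    rhs = lam * (D(0, 1, lhat[1]) - D(0, 0, lhat[0]))
    D1 = 1 if lhs > rhs else 0          # randomize on equality in practice
    return D1, lhat[D1] + 1

# Usage with 0-1 costs; the scalar Gaussian densities and priors are illustrative assumptions.
f = [[lambda x: norm(0, 1).pdf(x), lambda x: norm(0, 2).pdf(x)],
     [lambda x: norm(2, 1).pdf(x), lambda x: norm(3, 1).pdf(x)]]
pi = [[0.5, 0.5], [0.5, 0.5]]
C = {(i, k, j, l): 0.0 if (i, k) == (j, l) else 1.0
     for i in (0, 1) for k in range(2) for j in (0, 1) for l in range(2)}
print(combined_test(1.5, f, pi, C, lam=1.0))
```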

2 Joint Hypothesis Testing and Estimation

2.1 Introduction

A vastly more interesting problem arises when we combine hypothesis testing with parameter estimation. Therefore, suppose that under $H_i$, $i = 0, 1$, the corresponding data pdfs have the form $f_i(X|\theta_i)$ where $\theta_i$ are parameters with prior pdf $\pi_i(\theta_i)$. As mentioned in the Introduction, if we simply desire to discriminate between $H_0$ and $H_1$ then we can form the mixture pdfs $f_i(X) = \int f_i(X|\theta_i)\pi_i(\theta_i)\,d\theta_i$ and apply the likelihood ratio test. When however our goal is to perform simultaneous detection and parameter estimation, then we need to develop techniques that are similar to the ones presented in the previous section and, in particular, Subsection 1.3.2. Before proceeding with this extension let us first discuss the notion of a randomized estimator by revisiting the problem of optimum Bayesian estimation.

2.1.1 Optimum Bayesian Estimation

As in hypothesis testing, let $X \in \mathbb{R}^N$ be a random data vector which is distributed according to a pdf $f(X|\theta)$. For $\theta$ we assume that it is a realization of a random parameter vector $\vartheta$ for which we have available a known prior pdf $\pi(\theta)$. Given a realization $X$ of the data vector, we would like to come up with a parameter estimate $\hat\theta$. Following a Bayesian approach, if $\theta$ is the true parameter vector and $\hat\theta$ the corresponding estimate, this generates a cost $C(\hat\theta, \theta)$. Our goal is to propose an estimation strategy which minimizes the average cost.

This problem is very similar to the Bayesian multiple hypothesis testing problem treated in Subsection 1.2.2. We recall that in hypothesis testing there was a finite number of hypotheses and an equal number of possible decisions (selections). Here, loosely speaking, each possible value of $\theta$ corresponds to a possible hypothesis; consequently our decision $\hat\theta$ and the true parameter vector $\theta$ can take a continuum of values. We also recall that in the case of finitely many possibilities a randomized decision rule was defined with the help of a corresponding finite set of complementary probabilities $\delta_l(X)$. If we want to extend this idea, we need to assign to each possible selection $\hat\theta$ a probability which is a function of $X$. Since $\hat\theta$ takes a continuum of values, to each $\hat\theta$ we can assign, in principle, a differential probability $\delta(\hat\theta|X)\,d\hat\theta$. This suggests that the equivalent of the probabilities $\delta_l(X)$ is now a probability density function $\delta(\hat\theta|X)$, that is, a function that satisfies $\delta(\hat\theta|X) \geq 0$ and $\int \delta(\hat\theta|X)\,d\hat\theta = 1$.

Clearly the notion of a randomized estimator is not new. Bayesian approaches make use of such entities, as one can verify by consulting [17, Page 65]. The posterior parameter pdf given the data $X$ constitutes the most common randomized estimator used in practice. Here however we need the general definition where any parameter pdf can play the role of an estimator. As becomes clear from the previous discussion, a randomized estimator is completely specified if we define the pdf $\delta(\hat\theta|X)$. At this point it is interesting to mention how we can produce an actual estimate $\hat\theta$. We recall that in the previous section our decision was the outcome of a random selection game. Following the same idea here, we need to generate a realization of a random variable distributed according to $\delta(\hat\theta|X)$. This realization becomes our estimate! Although randomized estimates might seem even more awkward than randomized decisions, they nevertheless constitute their natural extension.
Despite the seemingly counter-intuitive form of the proposed estimation mechanism, we must point out that randomized estimators unify the two problems of hypothesis testing and estimation in a straightforward manner.

Indeed, as we will be able to verify shortly, we obtain the corresponding optimum schemes by applying exactly the same methodology. Finally, we should also add that the class of randomized estimators is richer than the class of their deterministic counterparts. This is because any deterministic estimator of the form $\hat\theta = G(X)$, where $G(X)$ is a deterministic function of $X$, can be modeled as a randomized estimator having the pdf $\delta(\hat\theta|X) = \text{Dirac}(\hat\theta - G(X))$. In other words the pdf assigns all its probability mass to the selection $\hat\theta = G(X)$.

Let us now look for the optimum estimator, within the class of randomized estimators, that minimizes the expected cost. If we call the latter $C$ we can write

$$C = \int\!\!\int\!\!\int C(\hat\theta, \theta)\,\delta(\hat\theta|X)\,f(X|\theta)\,\pi(\theta)\,d\theta\,d\hat\theta\,dX = \int\left[\int \delta(\hat\theta|X)\left\{\int C(\hat\theta, \theta) f(X|\theta)\pi(\theta)\,d\theta\right\} d\hat\theta\right] dX = \int\left[\int \delta(\hat\theta|X)\,D(\hat\theta, X)\,d\hat\theta\right] dX$$
$$\geq \int\left[\int \delta(\hat\theta|X)\,\inf_{U} D(U, X)\,d\hat\theta\right] dX = \int\left[\inf_{U} D(U, X) \int \delta(\hat\theta|X)\,d\hat\theta\right] dX = \int \inf_{U} D(U, X)\,dX, \tag{2.1}$$

where we defined $D(U, X) = \int C(U, \theta) f(X|\theta)\pi(\theta)\,d\theta$. The last integral in (2.1) constitutes a lower bound on the performance of any randomized estimator. This lower bound is attainable if we select

$$\delta(\hat\theta|X) = \text{Dirac}\left(\hat\theta - \arg\inf_{U} D(U, X)\right), \tag{2.2}$$

provided that $\arg\inf_{U} D(U, X)$ is an ordinary function¹ of $X$. It is clear that if the infimum is attained by a single function of $X$, the resulting optimum estimator is purely deterministic. When however we have more than one choice, then we can randomize among them with arbitrary randomization probabilities and the resulting estimator will be randomized. By comparing the previous derivations with Eqs. (1.13)-(1.16) of Subsection 1.2.2 we realize that the corresponding steps are completely analogous.

2.1.2 Combined Neyman-Pearson Hypothesis Testing and Bayesian Estimation

In this part we are going to extend the result obtained in Subsection 1.3.2. Suppose again that the data vector $X$ under hypothesis $H_i$, $i = 0, 1$, satisfies $X \sim f_i(X|\theta_i)$ where $\theta_i$ is a realization of a random parameter vector $\vartheta_i$ with prior pdf $\pi_i(\theta_i)$. When a realization $X$ of the data vector is available we would like to decide between $H_0$ and $H_1$ and also estimate the corresponding parameter vector. A randomized detection/estimation structure will be comprised of the following set of functions

$$\delta_0(X),\ \delta_1(X),\ q_0(\hat\theta_0|X),\ q_1(\hat\theta_1|X), \tag{2.3}$$

which are the equivalent of (1.25). These functions are nonnegative and satisfy

$$\delta_0(X) + \delta_1(X) = \int q_0(\hat\theta_0|X)\,d\hat\theta_0 = \int q_1(\hat\theta_1|X)\,d\hat\theta_1 = 1, \tag{2.4}$$

which corresponds to (1.26). The two probabilities $\delta_j(X)$ are complementary while the two functions $q_j(\hat\theta_j|X)$ are pdfs with respect to $\hat\theta_j$. Our randomized detection/estimation strategy again involves two steps. In Step 1, with probabilities $\delta_j(X)$, $j = 0, 1$, we decide between the two main hypotheses $H_j$, while in Step 2, given that in the previous step the decision was $D_1 = j$, we provide a parameter estimate $\hat\theta_j$ using the randomized estimator $q_j(\hat\theta_j|X)$.

Let us now develop the equivalent of our results in Subsection 1.3.2. This will become our starting point for considering various special cases that will give rise to interesting novel GLR-type tests. Denote with $C^i_j(\hat\theta_j, \theta_i)$ the cost of providing the parameter estimate $\hat\theta_j$, after having decided that the main hypothesis is $H_j$, when the true main hypothesis is $H_i$ and the corresponding true parameter value is $\theta_i$.

¹ For simplicity we assume that the infimum is in fact a minimum, in other words that there exists (at least one) function $\hat\theta = G(X)$ that attains the minimal value. In the opposite case we need to become more technical and introduce the notion of $\epsilon$-optimality, with estimation strategies whose performance is $\epsilon$-close to the optimum.

If $C_i$ denotes the average cost given that hypothesis $H_i$ is true, then we have the following expression for this quantity, which is the equivalent of (1.41):

$$C_i = \int \left\{ \delta_0(X) \int q_0(\hat\theta_0|X)\,D^i_0(\hat\theta_0, X)\,d\hat\theta_0 + \delta_1(X) \int q_1(\hat\theta_1|X)\,D^i_1(\hat\theta_1, X)\,d\hat\theta_1 \right\} dX. \tag{2.5}$$

We define $D^i_j(U, X) = \int C^i_j(U, \theta_i)\,f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i$. Consider now the problem of optimizing $C_1$ among all detection/estimation schemes that satisfy the constraint that $C_0$ is no larger than a prescribed value. The next theorem defines this problem explicitly and provides the corresponding optimum solution.

Theorem 2.1: Consider the class $\mathcal{J}_\alpha$ of detection/estimation tests that satisfy $C_0 \leq \alpha$. Then the test that minimizes the cost $C_1$ within the class $\mathcal{J}_\alpha$ is given by

$$D^1_0(\hat\theta_0, X) - D^1_1(\hat\theta_1, X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(\hat\theta_1, X) - D^0_0(\hat\theta_0, X)\right], \tag{2.6}$$

with the corresponding estimates defined by

$$\hat\theta_j = \arg\inf_{U}\left[D^1_j(U, X) + \lambda D^0_j(U, X)\right], \quad j = 0, 1. \tag{2.7}$$

Proof: The proof is exactly similar to the proof of Theorem 1.2, with the sums replaced by integrals.

Remark 2.1: For the level $\alpha$ we have $\alpha_{\min} \leq \alpha \leq \alpha_{\max}$ and, as in the discrete case, we have an expression only for the lower bound

$$\alpha_{\min} = \int \min\left\{\inf_{U} D^0_0(U, X),\ \inf_{U} D^0_1(U, X)\right\} dX. \tag{2.8}$$

Remark 2.2: Following the same reasoning as for Theorem 1.2, by assuming $C^1_0(U, \theta) = C^1_0(\theta)$ and $C^0_1(U, \theta) = C^0_1(\theta)$ we obtain $D^1_0(U, X) = D^1_0(X)$ and $D^0_1(U, X) = D^0_1(X)$. Under this assumption the optimum test in (2.6) simplifies to

$$D^1_0(X) - \inf_{U} D^1_1(U, X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\left[D^0_1(X) - \inf_{U} D^0_0(U, X)\right], \tag{2.9}$$

and the optimum parameter estimate becomes

$$\hat\theta_j = \arg\inf_{U} D^j_j(U, X). \tag{2.10}$$

The important consequence of this simplification is that the estimation part, under each hypothesis, reduces to the optimum Bayes estimator, which is independent from the other hypothesis and from the detection rule.

2.2 Variations

In this section we are going to present two variations of the same idea that might turn out to be interesting for applications. In both cases the resulting estimator under each hypothesis is the optimum Bayes estimator, exactly as in the GLRT. We start with the case where the parameters are known under $H_0$, a scenario that is quite frequent in practice.

2.2.1 Known Parameters under $H_0$

Let $f(X|\theta)$ be a pdf with $\theta$ a parameter vector. Suppose that under $H_0$ we have $\theta = 0$, whereas under $H_1$ the vector $\theta$ follows a prior pdf $\pi(\theta)$. We would like to test $H_0$ against $H_1$, but whenever we decide in favor of $H_1$ we would also like to provide an estimate $\hat\theta$ for the corresponding parameter vector $\theta$. Since parameter estimation is needed only under $H_1$, a combined detection/estimation scheme will be comprised of the functions $\delta_0(X)$, $\delta_1(X)$, $q_1(\hat\theta|X)$ that satisfy $\delta_j(X) \geq 0$, $j = 0, 1$, $q_1(\hat\theta|X) \geq 0$, and $\delta_0(X) + \delta_1(X) = \int q_1(\hat\theta|X)\,d\hat\theta = 1$.

The two probabilities $\delta_0(X)$, $\delta_1(X)$ will be used in the first step to decide between the two main hypotheses, while $q_1(\hat\theta|X)$ will be employed in the second step to provide the required estimate for $\theta$ every time we decide in favor of $H_1$.

Regarding the Bayesian cost, we define $C(\hat\theta, \theta)$ to be the cost of providing an estimate $\hat\theta$ when the true value is $\theta$. Of course this cost makes sense only under $H_1$. Consequently, if the true hypothesis is $H_1$ with parameter $\theta$ and we decide in favor of $H_1$ with parameter estimate $\hat\theta$ then, as we said, the cost is $C(\hat\theta, \theta)$. If however we are under $H_1$ with parameter value $\theta$ and we decide in favor of $H_0$, then this is like selecting $\hat\theta = 0$. Hence, it makes sense to assign to this event the cost $C(0, \theta)$. Using these observations it is straightforward to compute the average cost under $H_1$, which takes the form

$$C_1 = \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX + \int \delta_0(X)\,D(0, X)\,dX \tag{2.11}$$

with $D(U, X) = \int C(U, \theta)\,f(X|\theta)\,\pi(\theta)\,d\theta$. For this special problem we propose to minimize the average cost $C_1$ under $H_1$ and at the same time control the false alarm probability under $H_0$. The next theorem presents the problem of interest explicitly and introduces the corresponding optimum solution.

Theorem 2.2: Consider the class $\mathcal{J}_\alpha$ of detection/estimation procedures with false alarm probability not exceeding the level $\alpha$. Then within the class $\mathcal{J}_\alpha$ the test that minimizes the average cost $C_1$ is given by

$$\frac{D(0, X) - \inf_{U} D(U, X)}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda. \tag{2.12}$$

Threshold $\lambda \geq 0$ and randomization probability $\gamma$ are selected so that the false alarm constraint is satisfied with equality.

Proof: The false alarm probability under $H_0$ is given by $P(D_1 = 1|H_0) = \int \delta_1(X)\,f(X|0)\,dX$. If $\lambda \geq 0$ is a Lagrange multiplier, then we are interested in minimizing the combination $C_1 + \lambda P(D_1 = 1|H_0)$. Using (2.11) we have

$$C_1 + \lambda P(D_1 = 1|H_0) = \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX + \int \delta_0(X)\,D(0, X)\,dX + \lambda \int \delta_1(X)\,f(X|0)\,dX \tag{2.13}$$

$$\geq \int \left\{ \delta_1(X)\left[\inf_{U} D(U, X) + \lambda f(X|0)\right] + \delta_0(X)\,D(0, X)\right\} dX \tag{2.14}$$

$$\geq \int \min\left\{\inf_{U} D(U, X) + \lambda f(X|0),\ D(0, X)\right\} dX. \tag{2.15}$$

We have equality in (2.14) whenever the estimator $q_1(\hat\theta|X)$ is the optimum Bayesian estimator, and equality in (2.15) whenever our decision between the two main hypotheses is according to (2.12).

2.2.2 Conditional Cost

A slightly different and in some sense more general approach is to assume that under $H_0$ we have $X \sim f_0(X)$ and under $H_1$ the data satisfy $X \sim f_1(X|\theta)$ with the parameter vector having the prior $\pi(\theta)$. Here, as before, we assume that under $H_0$ the data pdf is completely known, but it does not necessarily correspond to a specific selection of the parameter vector $\theta$ of $f_1(X|\theta)$. Regarding Bayesian costs we only define the cost function $C(\hat\theta, \theta)$ expressing the cost of providing the estimate $\hat\theta$ when the true value is $\theta$. Clearly this cost makes sense whenever the true hypothesis is $H_1$ and with our detection scheme we also decide in favor of $H_1$. Again our decision mechanism involves $\delta_0(X)$, $\delta_1(X)$, $q_1(\hat\theta|X)$, since there is no estimation under $H_0$. Here, however, we are interested in computing the average cost under $H_1$ conditioned on the event that we have correctly selected the main hypothesis, namely

$$\bar{C} = E[C(\hat\theta, \theta)|H_1, D_1 = 1] = \frac{\int\!\!\int\!\!\int C(\hat\theta, \theta)\,\delta_1(X)\,q_1(\hat\theta|X)\,f_1(X|\theta)\,\pi(\theta)\,d\hat\theta\,d\theta\,dX}{\int\!\!\int \delta_1(X)\,f_1(X|\theta)\,\pi(\theta)\,d\theta\,dX} = \frac{\int\!\!\int D(\hat\theta, X)\,\delta_1(X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX}{\int \delta_1(X)\,f_1(X)\,dX} \tag{2.16}$$

where $D(U, X) = \int C(U, \theta)\,f_1(X|\theta)\,\pi(\theta)\,d\theta$ and $f_1(X) = \int f_1(X|\theta)\,\pi(\theta)\,d\theta$ is the mixture pdf. In other words we consider the average (estimation) cost conditioned on the event that we have correctly detected the main hypothesis. We can now attempt to minimize $\bar{C}$ and at the same time control the false alarm probability. This setup makes a lot of sense since the false alarm constraint assures acceptable performance of the detection part, while the conditional cost minimization provides the best possible estimator whenever we have correctly decided in favor of $H_1$. The next theorem solves exactly this problem.

Theorem 2.3: Consider the class $\mathcal{J}_\alpha$ of detection/estimation procedures for which $P(D_1 = 1|H_0) \leq \alpha$ with $\alpha \in [0, 1]$. Then the optimum test that minimizes the cost $\bar{C}$ within the class $\mathcal{J}_\alpha$ is given by

$$\frac{\rho f_1(X) - \inf_{U} D(U, X)}{f_0(X)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda \tag{2.17}$$

while the corresponding estimator is

$$\hat\theta = \arg\inf_{U} D(U, X). \tag{2.18}$$

Parameter $\rho$, threshold $\lambda \geq 0$ and randomization probability $\gamma$ are selected so that the false alarm constraint is satisfied with equality, that is,

$$\int_{\mathcal{A}(\rho,\lambda)} f_0(X)\,dX + \gamma \int_{\mathcal{B}(\rho,\lambda)} f_0(X)\,dX = \alpha, \tag{2.19}$$

but we also need the following equation to hold

$$\int_{\mathcal{A}(\rho,\lambda)} \left[\rho f_1(X) - \inf_{U} D(U, X)\right] dX + \gamma \int_{\mathcal{B}(\rho,\lambda)} \left[\rho f_1(X) - \inf_{U} D(U, X)\right] dX = 0. \tag{2.20}$$

Here $\mathcal{A}(\rho, \lambda)$ and $\mathcal{B}(\rho, \lambda)$ are the two subsets of $\mathbb{R}^N$ on which the statistic in (2.17) exceeds and equals the threshold $\lambda$, respectively.

Proof: Assume for the moment the existence of $\rho, \lambda, \gamma$ that solve the system of equations (2.19) and (2.20) and call $\delta_0^o(X), \delta_1^o(X)$ the randomization probabilities associated with the test in (2.17). Consider now any test in the class $\mathcal{J}_\alpha$ and let us perform the following manipulations

$$\int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda\alpha \tag{2.21}$$

$$\geq \int\!\!\int \delta_1(X)\,D(\hat\theta, X)\,q_1(\hat\theta|X)\,d\hat\theta\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda \int \delta_1(X)\,f_0(X)\,dX \tag{2.22}$$

$$\geq \int \delta_1(X)\,\inf_{U} D(U, X)\,dX - \rho \int \delta_1(X)\,f_1(X)\,dX + \lambda \int \delta_1(X)\,f_0(X)\,dX \tag{2.23}$$

$$= \int \delta_1(X)\left[\inf_{U} D(U, X) - \rho f_1(X) + \lambda f_0(X)\right] dX \tag{2.24}$$

$$\geq \int \min\left\{\inf_{U} D(U, X) - \rho f_1(X) + \lambda f_0(X),\ 0\right\} dX \tag{2.25}$$

$$= \int \delta_1^o(X)\left[\inf_{U} D(U, X) - \rho f_1(X)\right] dX + \lambda \int \delta_1^o(X)\,f_0(X)\,dX \tag{2.26}$$

$$= \lambda\alpha. \tag{2.27}$$

Until (2.25) the results are straightforward. Eq. (2.26) expresses the fact that the lower bound is attainable by the proposed detection/estimation scheme of (2.17), (2.18). Finally, (2.27) is a consequence of $\rho, \lambda, \gamma$ solving the two equations (2.19), (2.20). Comparing (2.21) with (2.27) we conclude that for any test in the class $\mathcal{J}_\alpha$ we have $\bar{C} \geq \rho$, with equality whenever the test coincides with the one proposed by the theorem. For our proof to be complete we need to show that there exists a combination of $\rho, \lambda, \gamma$ that solves the two equations. For simplicity we are going to assume that the set $\mathcal{B}(\rho, \lambda)$ has zero probability with respect to $f_0(X)$, which allows us to select $\gamma = 0$.
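As a concrete illustration of the variation with known parameters under $H_0$, the sketch below evaluates the statistic of (2.12) for the quadratic cost $C(U, \theta) = (U - \theta)^2$ by simple numerical quadrature: for this cost, $D(0, X) - \inf_U D(U, X)$ equals the mixture density times the squared posterior mean, and the reported estimate is the Bayes (MMSE) estimate. The scalar Gaussian likelihood, prior, grid, and threshold are assumptions for the demo only.

```python
import numpy as np

def known_theta0_test(X, f_cond, prior, theta_grid, lam):
    """Test (2.12) with squared-error cost; parameters known (theta = 0) under H_0.

    f_cond(X, theta) : likelihood f(X|theta), evaluated on an array of theta values,
    prior(theta)     : prior pdf pi(theta) under H_1,
    theta_grid       : quadrature grid for theta (an implementation choice),
    lam              : threshold, calibrated in practice to meet the false alarm level.
    Returns (decision, estimate); the estimate is meaningful only when H_1 is declared.
    """
    dt = theta_grid[1] - theta_grid[0]
    w = f_cond(X, theta_grid) * prior(theta_grid)   # f(X|theta) * pi(theta) on the grid
    Z = np.sum(w) * dt                              # mixture density under H_1
    post_mean = np.sum(theta_grid * w) * dt / Z     # Bayes (MMSE) estimate
    stat = Z * post_mean ** 2 / f_cond(X, 0.0)      # [D(0,X) - inf_U D(U,X)] / f(X|0)
    D1 = 1 if stat > lam else 0
    return D1, (post_mean if D1 else 0.0)

# Illustrative model (assumed): X | theta ~ N(theta, 1), theta ~ N(0, 4) under H_1.
f_cond = lambda X, t: np.exp(-0.5 * (X - t) ** 2) / np.sqrt(2 * np.pi)
prior = lambda t: np.exp(-0.5 * t ** 2 / 4.0) / np.sqrt(2 * np.pi * 4.0)
grid = np.linspace(-10.0, 10.0, 2001)
print(known_theta0_test(2.3, f_cond, prior, grid, lam=0.5))
```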

2.3 Examples

In this section we present a number of interesting examples obtained by selecting various forms for the cost functions. We concentrate on the most well known costs encountered in classical Bayesian estimation theory. We start with the MAP estimate, which demonstrates the optimality of the GLRT.

2.3.1 MAP Detection/Estimation

Consider the following combination of cost functions

$$C^1_0(U, \theta) = C^0_1(U, \theta) = 1; \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = \begin{cases} 0 & \|U - \theta\| \leq \epsilon \\ 1 & \text{otherwise.} \end{cases} \tag{2.28}$$

We recall from classical Bayesian estimation theory (see [15, Page 145]) that, as $\epsilon \to 0$ and assuming sufficient smoothness of the pdfs, this specific selection of costs leads to MAP parameter estimation under each main hypothesis. Indeed, since²

$$D^i_i(U, X) \approx f_i(X) - V^i_\epsilon\, f_i(X|U)\,\pi_i(U), \tag{2.29}$$

where $V^i_\epsilon$ is the volume of a hypersphere of radius $\epsilon$ (which can be different for each hypothesis if the two parameter vectors are not of the same length), substituting in (2.9) and (2.10) yields

$$\frac{\sup_{U} f_1(X|U)\,\pi_1(U)}{\sup_{U} f_0(X|U)\,\pi_0(U)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda\,\frac{V^0_\epsilon}{V^1_\epsilon} = \lambda', \tag{2.30}$$

and the optimum estimator under each hypothesis is the MAP estimator

$$\hat\theta_i = \arg\sup_{U} f_i(X|U)\,\pi_i(U). \tag{2.31}$$

Similarly, for the special case of Subsection 2.2.1, if we define

$$C(U, \theta) = \begin{cases} 0 & \|U - \theta\| \leq \epsilon \\ 1 & \text{otherwise,} \end{cases} \tag{2.32}$$

then $D(U, X) \approx \int f(X|\theta)\pi(\theta)\,d\theta - V_\epsilon\, f(X|U)\pi(U)$ and, since the ball $\|\theta\| \leq \epsilon$ carries negligible prior mass so that $D(0, X) \approx \int f(X|\theta)\pi(\theta)\,d\theta$, the optimum test in (2.12) takes the form

$$\frac{\sup_{U} f(X|U)\,\pi(U)}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \frac{\lambda}{V_\epsilon} = \lambda', \tag{2.33}$$

with the optimum estimator being $\hat\theta = \arg\sup_{U} f(X|U)\pi(U)$. In both tests (2.30) and (2.33) the threshold $\lambda'$ (and the corresponding randomization probabilities) are selected to satisfy the false alarm constraint with equality. If the prior probabilities $\pi_i(\theta_i)$, $\pi(\theta)$ are unknown and are replaced by the uniform, we obtain the classical form of the GLRT. If we now consider the discrete version of the problem and assume that $\theta_i$ can only take values in a finite set $\mathcal{V}_i = \{\theta_{i1},\ldots,\theta_{iK_i}\}$ with corresponding prior probabilities $\pi_{i1},\ldots,\pi_{iK_i}$, then we recover the GLRT.

2.3.2 MMSE Detection/Estimation

Let us now develop the first test that can be used as an alternative to the GLRT. Consider the following costs

$$C^1_0(U, \theta) = C^1_0(\theta); \qquad C^0_1(U, \theta) = C^0_1(\theta); \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = \|U - \theta\|^2, \tag{2.34}$$

where $C^1_0(\theta)$, $C^0_1(\theta)$ are functions to be specified in the sequel. Due to the previous selection, the estimation part is independent from the detection. Under each main hypothesis the optimum estimator is selected by minimizing the corresponding mean squared error.

² The approximate equality becomes exact as $\epsilon \to 0$.

Consequently, the optimum estimator is the conditional mean of the parameter vector given the data vector $X$ (see [15, Page 143]). Specifically, we have

$$\hat\theta_i = E[\vartheta_i|X, H_i] = \frac{\int \theta_i\, f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}, \quad i = 0, 1. \tag{2.35}$$

The corresponding optimum test, after substituting in (2.9), takes the form

$$A_1(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda A_0(X) \tag{2.36}$$

where

$$A_0(X) = \|\hat\theta_0\|^2 f_0(X) + \int \left[C^0_1(\theta_0) - \|\theta_0\|^2\right] f_0(X|\theta_0)\,\pi_0(\theta_0)\,d\theta_0$$
$$A_1(X) = \|\hat\theta_1\|^2 f_1(X) + \int \left[C^1_0(\theta_1) - \|\theta_1\|^2\right] f_1(X|\theta_1)\,\pi_1(\theta_1)\,d\theta_1$$
$$f_i(X) = \int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i. \tag{2.37}$$

Selecting $C^1_0(\theta_1) = \|\theta_1\|^2$ and $C^0_1(\theta_0) = \|\theta_0\|^2$ simplifies the test considerably, yielding

$$\frac{\|\hat\theta_1\|^2}{\|\hat\theta_0\|^2}\cdot\frac{\int f_1(X|\theta_1)\,\pi_1(\theta_1)\,d\theta_1}{\int f_0(X|\theta_0)\,\pi_0(\theta_0)\,d\theta_0} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda. \tag{2.38}$$

We recognize in the second ratio the statistic that is used to decide optimally between the two main hypotheses. By including the first ratio of the two squared-norm estimates, the test performs simultaneous optimum detection and estimation. For the special case of Subsection 2.2.1, it is easy to verify that the corresponding test takes the form

$$\frac{\|\hat\theta\|^2 \int f(X|\theta)\,\pi(\theta)\,d\theta}{f(X|0)} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda, \tag{2.39}$$

where $\hat\theta = E[\vartheta|X, H_1] = \int \theta\, f(X|\theta)\,\pi(\theta)\,d\theta \big/ \int f(X|\theta)\,\pi(\theta)\,d\theta$. In both tests (2.36) and (2.39), if the priors are not known and are replaced by uniforms, we obtain tests that are the equivalent of the GLRT for the MMSE criterion.

2.3.3 Median Detection/Estimation

As our final example we present the case of median estimation, where $\theta_i, \hat\theta_i, \theta$ are scalars and we select the cost functions as follows

$$C^1_0(U, \theta) = C^1_0(\theta); \qquad C^0_1(U, \theta) = C^0_1(\theta); \qquad C^0_0(U, \theta) = C^1_1(U, \theta) = |U - \theta|. \tag{2.40}$$

As in the previous examples, the estimation part is independent from detection and under each hypothesis it is the optimum Bayes estimator. Consequently, for this cost function the optimum estimator is the conditional median [15, Page 143]

$$\hat\theta_i = \arg\left\{ y : P(\vartheta_i \leq y|X, H_i) = \frac{\int_{-\infty}^{y} f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int f_i(X|\theta_i)\,\pi_i(\theta_i)\,d\theta_i} = \frac{1}{2}\right\}, \quad i = 0, 1. \tag{2.41}$$

The optimum test, as before, becomes

$$A_1(X) \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda A_0(X) \tag{2.42}$$
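To illustrate how the MMSE variant could be used in practice, here is a hedged numerical sketch of the simplified test (2.38): it computes, under each hypothesis, the mixture density and the conditional-mean estimate (2.35) by quadrature over a scalar parameter, then forms the product of the two ratios. The Gaussian likelihood, the priors, and the threshold are illustrative assumptions only.

```python
import numpy as np

def mmse_detect_estimate(X, f, pi0, pi1, grid, lam):
    """MMSE joint detection/estimation test (2.38) for a scalar parameter.

    f(X, t)        : likelihood f_i(X|theta) (same parametric family under both hypotheses here),
    pi0(t), pi1(t) : parameter priors under H_0 and H_1,
    grid           : quadrature grid for theta (an implementation choice),
    lam            : threshold set by the constraint on the cost under H_0.
    Returns (decision, estimate), the estimate being the conditional mean (2.35)
    under the selected hypothesis.
    """
    dt = grid[1] - grid[0]
    est, mix = [], []
    for pi in (pi0, pi1):
        w = f(X, grid) * pi(grid)
        Z = np.sum(w) * dt                      # mixture density f_i(X)
        est.append(np.sum(grid * w) * dt / Z)   # conditional-mean estimate (2.35)
        mix.append(Z)
    stat = (est[1] ** 2 / est[0] ** 2) * (mix[1] / mix[0])
    D1 = 1 if stat > lam else 0
    return D1, est[D1]

# Illustrative scalar Gaussian model (assumed): X | theta ~ N(theta, 1),
# theta ~ N(-1, 1) under H_0 and theta ~ N(+2, 1) under H_1.
gauss = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
f = lambda X, t: gauss(X, t, 1.0)
pi0 = lambda t: gauss(t, -1.0, 1.0)
pi1 = lambda t: gauss(t, 2.0, 1.0)
grid = np.linspace(-12.0, 12.0, 4001)
print(mmse_detect_estimate(1.4, f, pi0, pi1, grid, lam=1.0))
```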


More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Detection Theory. Composite tests

Detection Theory. Composite tests Composite tests Chapter 5: Correction Thu I claimed that the above, which is the most general case, was captured by the below Thu Chapter 5: Correction Thu I claimed that the above, which is the most general

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing Robert Vanderbei Fall 2014 Slides last edited on November 24, 2014 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

More information

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell,

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell, Recall The Neyman-Pearson Lemma Neyman-Pearson Lemma: Let Θ = {θ 0, θ }, and let F θ0 (x) be the cdf of the random vector X under hypothesis and F θ (x) be its cdf under hypothesis. Assume that the cdfs

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Richard Lockhart Simon Fraser University STAT 830 Fall 2018 Richard Lockhart (Simon Fraser University) STAT 830 Hypothesis Testing STAT 830 Fall 2018 1 / 30 Purposes of These

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise.

Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise. Two results in statistical decision theory for detecting signals with unknown distributions and priors in white Gaussian noise. Dominique Pastor GET - ENST Bretagne, CNRS UMR 2872 TAMCIC, Technopôle de

More information

Detection and Estimation Chapter 1. Hypothesis Testing

Detection and Estimation Chapter 1. Hypothesis Testing Detection and Estimation Chapter 1. Hypothesis Testing Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2015 1/20 Syllabus Homework:

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

ECE531 Lecture 8: Non-Random Parameter Estimation

ECE531 Lecture 8: Non-Random Parameter Estimation ECE531 Lecture 8: Non-Random Parameter Estimation D. Richard Brown III Worcester Polytechnic Institute 19-March-2009 Worcester Polytechnic Institute D. Richard Brown III 19-March-2009 1 / 25 Introduction

More information

Decentralized Detection In Wireless Sensor Networks

Decentralized Detection In Wireless Sensor Networks Decentralized Detection In Wireless Sensor Networks Milad Kharratzadeh Department of Electrical & Computer Engineering McGill University Montreal, Canada April 2011 Statistical Detection and Estimation

More information

Change Detection Algorithms

Change Detection Algorithms 5 Change Detection Algorithms In this chapter, we describe the simplest change detection algorithms. We consider a sequence of independent random variables (y k ) k with a probability density p (y) depending

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Lecture 22: Error exponents in hypothesis testing, GLRT

Lecture 22: Error exponents in hypothesis testing, GLRT 10-704: Information Processing and Learning Spring 2012 Lecture 22: Error exponents in hypothesis testing, GLRT Lecturer: Aarti Singh Scribe: Aarti Singh Disclaimer: These notes have not been subjected

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals D. Richard Brown III Worcester Polytechnic Institute 30-Apr-2009 Worcester Polytechnic Institute D. Richard Brown III 30-Apr-2009 1 / 32

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

ST 740: Model Selection

ST 740: Model Selection ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model

More information

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters D. Richard Brown III Worcester Polytechnic Institute 26-February-2009 Worcester Polytechnic Institute D. Richard Brown III 26-February-2009

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Sequential Detection. Changes: an overview. George V. Moustakides

Sequential Detection. Changes: an overview. George V. Moustakides Sequential Detection of Changes: an overview George V. Moustakides Outline Sequential hypothesis testing and Sequential detection of changes The Sequential Probability Ratio Test (SPRT) for optimum hypothesis

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

STOCHASTIC PROCESSES, DETECTION AND ESTIMATION Course Notes

STOCHASTIC PROCESSES, DETECTION AND ESTIMATION Course Notes STOCHASTIC PROCESSES, DETECTION AND ESTIMATION 6.432 Course Notes Alan S. Willsky, Gregory W. Wornell, and Jeffrey H. Shapiro Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

STAT 801: Mathematical Statistics. Hypothesis Testing

STAT 801: Mathematical Statistics. Hypothesis Testing STAT 801: Mathematical Statistics Hypothesis Testing Hypothesis testing: a statistical problem where you must choose, on the basis o data X, between two alternatives. We ormalize this as the problem o

More information

Gaussian Estimation under Attack Uncertainty

Gaussian Estimation under Attack Uncertainty Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary

More information

STAT 830 Decision Theory and Bayesian Methods

STAT 830 Decision Theory and Bayesian Methods STAT 830 Decision Theory and Bayesian Methods Example: Decide between 4 modes of transportation to work: B = Ride my bike. C = Take the car. T = Use public transit. H = Stay home. Costs depend on weather:

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

F2E5216/TS1002 Adaptive Filtering and Change Detection. Course Organization. Lecture plan. The Books. Lecture 1

F2E5216/TS1002 Adaptive Filtering and Change Detection. Course Organization. Lecture plan. The Books. Lecture 1 Adaptive Filtering and Change Detection Bo Wahlberg (KTH and Fredrik Gustafsson (LiTH Course Organization Lectures and compendium: Theory, Algorithms, Applications, Evaluation Toolbox and manual: Algorithms,

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Interval Estimation. Chapter 9

Interval Estimation. Chapter 9 Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

10-704: Information Processing and Learning Fall Lecture 24: Dec 7 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Distributed Detection of Binary Decisions with Collisions in a Large, Random Network

Distributed Detection of Binary Decisions with Collisions in a Large, Random Network Distributed Detection of Binary Decisions with Collisions in a Large, Random Network Gene T Whipps, Emre Ertin, and Randolph L Moses US Army Research Laboratory, Adelphi, MD 2783 Department of Electrical

More information

INTRODUCTION TO BAYESIAN METHODS II

INTRODUCTION TO BAYESIAN METHODS II INTRODUCTION TO BAYESIAN METHODS II Abstract. We will revisit point estimation and hypothesis testing from the Bayesian perspective.. Bayes estimators Let X = (X,..., X n ) be a random sample from the

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Decision Criteria 23

Decision Criteria 23 Decision Criteria 23 test will work. In Section 2.7 we develop bounds and approximate expressions for the performance that will be necessary for some of the later chapters. Finally, in Section 2.8 we summarize

More information

Continuum Probability and Sets of Measure Zero

Continuum Probability and Sets of Measure Zero Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to

More information

On the Bayesianity of Pereira-Stern tests

On the Bayesianity of Pereira-Stern tests Sociedad de Estadística e Investigación Operativa Test (2001) Vol. 10, No. 2, pp. 000 000 On the Bayesianity of Pereira-Stern tests M. Regina Madruga Departamento de Estatística, Universidade Federal do

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

Econ 2148, spring 2019 Statistical decision theory

Econ 2148, spring 2019 Statistical decision theory Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a

More information

P Values and Nuisance Parameters

P Values and Nuisance Parameters P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information