Chapter 2. Binary and M-ary Hypothesis Testing

2.1 Introduction (Levy 2.1)


Detection problems can usually be cast as binary or M-ary hypothesis testing problems.

Applications:

This chapter: the simple hypothesis testing problem, in which the probability distribution of the observations under each hypothesis is assumed to be known exactly. Example:

Composite hypothesis testing: problems involving unknown parameters (Chapter 4). Example:

Objectives:

1. Design testing rules that are optimal in some appropriate sense, based on the amount of information available.
2. Analyze the performance of the test.

Structure of this chapter:

2.2 Binary Hypothesis Testing Problem Formulation: problem modeling, notation, performance measures, etc.
2.3 Bayesian Test: a-priori probabilities known; cost structure known.
2.4 Minimax Test: a-priori probabilities unknown; cost structure known.
2.5 Neyman-Pearson Test: a-priori probabilities unknown; cost structure unknown. Receiver operating characteristic (ROC).
2.6 Gaussian Detection
2.7 M-ary Hypothesis Testing

2.2 Binary Hypothesis Testing Problem Formulation (Levy 2.2 and 2.4)

Binary hypothesis testing is the problem of deciding between two hypotheses based on a (random) observation. The model contains:

1. Hypotheses and a-priori probabilities
2. Observation
3. Connection between hypotheses and observation
4. Decision function
5. Performance measure

1. Hypotheses and a-priori probabilities: hypotheses H_0 and H_1, with a-priori probabilities π_0 = P(H_0) and π_1 = P(H_1) = 1 − π_0.

2. Observation: a random vector Y with sample space Y. An observation is a sample vector y of Y.

3. Connection between hypotheses and observation: the distributions of Y under H_0 and H_1, assumed known in this chapter.

For continuous Y, the PDF under each hypothesis:
H_0: Y ~ f(y|H_0)
H_1: Y ~ f(y|H_1)

For discrete Y, the PMF under each hypothesis:
H_0: P(Y = y|H_0) = p(y|H_0)
H_1: P(Y = y|H_1) = p(y|H_1)

4. Decision function: decide whether H_0 or H_1 is true given an observation. A map δ from Y to {0, 1}:

δ(y) = 1 if we decide H_1; 0 if we decide H_0.

Decision regions:
Y_0 ≜ {y : δ(y) = 0}
Y_1 ≜ {y : δ(y) = 1}

We have Y_0 ∩ Y_1 = ∅ and Y_0 ∪ Y_1 = Y, so a decision function induces a partition of the sample space of Y.

Examples of decision rules:

Goal: obtain the decision rule that is optimal in some sense. This requires:

5. An optimality/performance measure.

Bayesian formulation: all uncertainties are quantifiable, and the costs and benefits of outcomes can be measured.

Cost function: C_ij, for i = 0, 1 and j = 0, 1, is the cost of deciding H_i when H_j holds. The value of C_ij depends on the application/nature of the problem. Examples of cost functions:

Assumption: making a correct decision is always less costly than making a mistake, i.e., C_00 < C_10 and C_11 < C_01.

Bayes risk of a decision function δ:

Risk under H_0:
R(δ|H_0) = C_00 P(δ(Y) = 0|H_0) + C_10 P(δ(Y) = 1|H_0)
         = C_00 P(Y_0|H_0) + C_10 P(Y_1|H_0)

Risk under H_1:
R(δ|H_1) = C_01 P(δ(Y) = 0|H_1) + C_11 P(δ(Y) = 1|H_1)
         = C_01 P(Y_0|H_1) + C_11 P(Y_1|H_1)

Continuous Y: P(Y_i|H_j) = ∫_{Y_i} f(y|H_j) dy.
Discrete Y: P(Y_i|H_j) = Σ_{y ∈ Y_i} P(y|H_j).
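The two conditional risks are easy to evaluate numerically. A minimal sketch (the Gaussian pair, the threshold rule, and the cost values below are illustrative assumptions, not taken from the notes):

```python
from math import erf, sqrt

# Illustrative assumptions (not from the notes): H0: Y ~ N(0,1), H1: Y ~ N(2,1),
# decision rule delta(y) = 1 iff y >= eta, and uniform error costs.
C00, C10, C01, C11 = 0.0, 1.0, 1.0, 0.0
mu0, mu1, eta = 0.0, 2.0, 1.0

def gauss_tail(x, mu):
    """P(Y >= x) for Y ~ N(mu, 1)."""
    return 0.5 * (1.0 - erf((x - mu) / sqrt(2.0)))

p1_h0 = gauss_tail(eta, mu0)   # P(Y_1 | H0): probability of deciding 1 under H0
p1_h1 = gauss_tail(eta, mu1)   # P(Y_1 | H1): probability of deciding 1 under H1

R_h0 = C00 * (1.0 - p1_h0) + C10 * p1_h0   # conditional risk under H0
R_h1 = C01 * (1.0 - p1_h1) + C11 * p1_h1   # conditional risk under H1
print(R_h0, R_h1)
```

Because eta sits at the midpoint of the two means and the costs are symmetric, the two conditional risks coincide here; moving eta trades one against the other.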

Bayes risk:

R(δ) = R(δ|H_0) P(H_0) + R(δ|H_1) P(H_1)
     = π_0 C_00 P(Y_0|H_0) + π_0 C_10 P(Y_1|H_0) + π_1 C_01 P(Y_0|H_1) + π_1 C_11 P(Y_1|H_1)
     = Σ_{i=0}^{1} Σ_{j=0}^{1} π_j C_ij P(Y_i|H_j).

Since P(Y_0|H_0) + P(Y_1|H_0) = P(Y_0|H_1) + P(Y_1|H_1) = 1,

R(δ) = π_0 C_00 + π_0 (C_10 − C_00) P(Y_1|H_0) + π_1 C_01 + π_1 (C_11 − C_01) P(Y_1|H_1).

Optimal δ: the δ that minimizes the Bayes risk R(δ).

False alarm: H_0 is true but H_1 is decided (Type I error).
Detection: H_1 is true and H_1 is decided.
Miss: H_1 is true but H_0 is decided (Type II error).

Probability of detection: P_D(δ) = P(Y_1|H_1).
Probability of false alarm: P_F(δ) = P(Y_1|H_0).
Probability of miss: P_M(δ) = P(Y_0|H_1) = 1 − P_D(δ).

R(δ) = π_0 C_00 + π_0 (C_10 − C_00) P_F(δ) + π_1 C_01 + π_1 (C_11 − C_01) P_D(δ).

Ideally, we want P_D(δ) → 1 and P_F(δ) → 0 simultaneously.

Receiver operating characteristic (ROC): the upper boundary between the achievable and unachievable regions in the (P_F, P_D) unit square.

2.3 Bayesian Testing (Levy 2.2)

Assume:
1. The a-priori probabilities (π_0, π_1) are known.
2. The cost structure (C_ij) is known.

Find the optimal decision rule δ that minimizes the Bayes risk R(δ). Note: the distribution of Y under each hypothesis is also known.

2.3.1 Likelihood-Ratio Test (LRT)

From the previous derivation,

R(δ) = π_0 C_00 + π_0 (C_10 − C_00) P(Y_1|H_0) + π_1 C_01 + π_1 (C_11 − C_01) P(Y_1|H_1)
     = π_0 C_00 + π_1 C_01 + ∫_{Y_1} [π_0 (C_10 − C_00) f(y|H_0) + π_1 (C_11 − C_01) f(y|H_1)] dy.

R(δ) is minimized if and only if Y_1 contains exactly the points y where the integrand is negative (points where it vanishes may be assigned to either region):

π_0 (C_10 − C_00) f(y|H_0) + π_1 (C_11 − C_01) f(y|H_1) ≤ 0
⟺ π_1 (C_01 − C_11) f(y|H_1) ≥ π_0 (C_10 − C_00) f(y|H_0)
⟺ f(y|H_1) / f(y|H_0) ≥ (C_10 − C_00) π_0 / [(C_01 − C_11) π_1], since C_01 > C_11.

Define the likelihood ratio

L(y) ≜ f(y|H_1) / f(y|H_0)

and the threshold

τ ≜ (C_10 − C_00) π_0 / [(C_01 − C_11) π_1].

The optimal decision rule is the LRT:

δ_B(y) = 1 if L(y) ≥ τ; 0 if L(y) < τ.

For discrete Y, similarly, the optimal decision rule is the LRT with L(y) ≜ P(y|H_1) / P(y|H_0) compared against the same threshold τ.
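The Bayes threshold and the LRT can be coded directly. A small sketch; the Gaussian densities, priors, and costs are illustrative assumptions, not from the notes:

```python
from math import exp, log

# Assumed example: H0: Y ~ N(0,1), H1: Y ~ N(2,1), with priors and costs below.
pi0, pi1 = 0.7, 0.3
C00, C10, C01, C11 = 0.0, 1.0, 2.0, 0.0

tau = (C10 - C00) * pi0 / ((C01 - C11) * pi1)   # Bayes threshold

def likelihood_ratio(y, mu1=2.0):
    """L(y) = f(y|H1)/f(y|H0) for unit-variance Gaussians with means 0 and mu1."""
    return exp(mu1 * y - mu1 * mu1 / 2.0)

def delta_B(y):
    """Bayes-optimal decision: 1 iff L(y) >= tau."""
    return 1 if likelihood_ratio(y) >= tau else 0

# For this Gaussian pair the LRT reduces to a threshold on y itself:
# L(y) >= tau  <=>  y >= ln(tau)/mu1 + mu1/2
y_threshold = log(tau) / 2.0 + 1.0
print(tau, y_threshold, delta_B(0.5), delta_B(2.5))
```

The reduction of the LRT to a threshold on y is a recurring simplification for exponential-family densities: the monotone map y ↦ L(y) lets the comparison be done in the observation domain.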

Maximum A-Posteriori (MAP) Rule: consider the cost structure

C_00 = C_11 = 0, C_01 = C_10 = 1, i.e., C_ij = 1 − δ_ij,

where δ_ij is the Kronecker delta (δ_ij = 1 if i = j, 0 if i ≠ j). Every error incurs unit cost, so minimizing the Bayes risk becomes minimizing the probability of error. The LRT becomes

L(y) = f(y|H_1) / f(y|H_0) ≷ (C_10 − C_00) π_0 / [(C_01 − C_11) π_1] = π_0 / π_1,

equivalently π_1 f(y|H_1) ≷ π_0 f(y|H_0), i.e., P(H_1|Y = y) ≷ P(H_0|Y = y): choose the hypothesis with the larger a-posteriori probability.

Maximum Likelihood (ML) Rule: if, furthermore, π_0 = π_1 = 1/2 (equiprobable hypotheses), the LRT becomes f(y|H_1) ≷ f(y|H_0): choose the hypothesis with the larger likelihood function value.

2.3.2 Examples
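The difference between the two rules shows up whenever the priors are unequal. A hedged sketch (the Gaussian pair, the priors, and the evaluation point y = 0.8 are all assumed for illustration):

```python
from math import exp

# Assumed example: H0: Y ~ N(0,1), H1: Y ~ N(1,1), priors pi0 = 0.8, pi1 = 0.2.
pi0, pi1, mu = 0.8, 0.2, 1.0

def f(y, m):
    """Unnormalized N(m,1) density; the common constant cancels in comparisons."""
    return exp(-0.5 * (y - m) ** 2)

def map_rule(y):
    """MAP: choose the hypothesis with larger posterior, pi1*f(y|H1) vs pi0*f(y|H0)."""
    return 1 if pi1 * f(y, mu) >= pi0 * f(y, 0.0) else 0

def ml_rule(y):
    """ML: choose the larger likelihood (equivalent to MAP with equal priors)."""
    return 1 if f(y, mu) >= f(y, 0.0) else 0

# At y = 0.8 the likelihood favors H1 (0.8 > mu/2 = 0.5), but the heavy prior
# pi0 = 0.8 outweighs it, so MAP still decides H0.
print(ml_rule(0.8), map_rule(0.8))
```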

2.3.3 Asymptotic Performance of the LRT (Levy 3.2)

Consider a binary hypothesis testing problem with a sequence of i.i.d. random observations Y_1, Y_2, …, Y_N, where Y_k ∈ R^n, stacked as Y = [Y_1, Y_2, …, Y_N]. Assume Y is continuous. The likelihood ratio factorizes:

L(y) = f(y|H_1) / f(y|H_0) = Π_{k=1}^{N} f(y_k|H_1) / f(y_k|H_0) = Π_{k=1}^{N} L(y_k),

so the LRT L(y) ≷ τ(N) is equivalent to

(1/N) Σ_{k=1}^{N} ln L(y_k) ≷ (1/N) ln τ(N) ≜ γ(N).

Let Z_k ≜ ln L(Y_k) = ln [f(Y_k|H_1) / f(Y_k|H_0)] and S_N ≜ (1/N) Σ_{k=1}^{N} Z_k. The LRT becomes S_N ≷ γ(N). Notice that the Z_k are i.i.d. and S_N is the sample mean of the Z_k. As N → ∞, by the strong law of large numbers,

under H_1: S_N → E[Z_k|H_1] = ∫ ln [f(y|H_1)/f(y|H_0)] f(y|H_1) dy a.s.
under H_0: S_N → E[Z_k|H_0] = ∫ ln [f(y|H_1)/f(y|H_0)] f(y|H_0) dy a.s.

Def. For two PDFs f and g, the Kullback-Leibler (KL) divergence is

D(f‖g) = ∫ f(x) ln [f(x)/g(x)] dx.

It is a natural notion of distance between distributions, but not a true distance metric.

Properties:
1. D(f‖g) ≥ 0, with equality if and only if f = g.
2. Non-symmetric: in general D(f‖g) ≠ D(g‖f).
3. Does not satisfy the triangle inequality.

Let f_0(y) ≜ f(y|H_0) and f_1(y) ≜ f(y|H_1). As N → ∞,

under H_1: S_N → ∫ ln [f_1(y)/f_0(y)] f_1(y) dy = D(f_1‖f_0) > 0 a.s.
under H_0: S_N → ∫ ln [f_1(y)/f_0(y)] f_0(y) dy = −D(f_0‖f_1) < 0 a.s.

Thus P_D(N) → 1 and P_F(N) → 0.

* As long as we are willing to collect an arbitrarily large number of independent observations, we can separate H_0 and H_1 perfectly, regardless of π_0 and C_ij.

How fast do P_D(N) → 1 and P_F(N) → 0? Exponentially in N.
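The two almost-sure limits can be checked by Monte Carlo. A sketch under assumed densities (H_0: N(0,1) vs H_1: N(1,1), for which both KL divergences equal mu^2/2 = 0.5):

```python
import random

# Assumed example: H0: N(0,1) vs H1: N(1,1); D(f1||f0) = D(f0||f1) = mu^2/2.
random.seed(0)
mu, N = 1.0, 200_000

def log_lr(y):
    """Z = ln[f(y|H1)/f(y|H0)] = mu*y - mu^2/2 for unit-variance Gaussians."""
    return mu * y - mu * mu / 2.0

# Sample mean S_N of the log-likelihood ratios under each hypothesis.
S_under_h1 = sum(log_lr(random.gauss(mu, 1.0)) for _ in range(N)) / N
S_under_h0 = sum(log_lr(random.gauss(0.0, 1.0)) for _ in range(N)) / N

print(S_under_h1)   # close to  D(f1||f0) = +0.5
print(S_under_h0)   # close to -D(f0||f1) = -0.5
```

The two empirical means land on opposite sides of zero, which is exactly why any fixed threshold γ between −D(f_0‖f_1) and D(f_1‖f_0) eventually separates the hypotheses.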

2.4 Minimax Hypothesis Testing (Levy 2.5)

Assume:
1. The a-priori probabilities (π_0, π_1) are unknown.
2. The cost structure (C_ij) is known.

Possible solutions:
1. Guess the a-priori probabilities. May lead to bad performance.
2. Design the test conservatively: assume the least-favorable choice of a-priori probabilities and select the test that minimizes the Bayes risk for this choice. This minimizes the maximum risk and guarantees a minimum level of performance independent of the a-priori probabilities.

Problem statement: find the test δ_M and a-priori value π_0M that solve the minimax problem

(δ_M, π_0M) = arg min_δ max_{π_0 ∈ [0,1]} R(δ, π_0).

Approach: saddle-point method.

Def. A saddle point is a point in the domain of a function that is a stationary point but not a local extremum. If a point (δ_M, π_0M) satisfies

R(δ_M, π_0) ≤ R(δ_M, π_0M) ≤ R(δ, π_0M) for all δ and all π_0 ∈ [0,1],   (1)

then it is a saddle point of the function R, and it is the solution of the minimax problem.

Proof outline:
Step 1: A saddle point of the form (1) exists.
Step 2: The saddle point is the solution (saddle-point property).
Step 3: Construct the saddle point.

Minimax equation:

(C_01 − C_00) + (C_11 − C_01) P_D(δ_M) − (C_10 − C_00) P_F(δ_M) = 0.

Comments:
1. If C_00 = C_11, the minimax equation becomes P_D = 1 − [(C_10 − C_00)/(C_01 − C_11)] P_F, a line through (0, 1) of the (P_F, P_D) square. If C_ij = 1 − δ_ij, the minimax equation becomes P_D = 1 − P_F.
2. The minimax test corresponds to the intersection of the ROC with the line of the minimax equation.
3. The LRT threshold τ_M of the minimax test equals the slope of the ROC at this intersection point. The corresponding a-priori probability is

π_0M = [1 + (C_10 − C_00)/((C_01 − C_11) τ_M)]^{−1}.

4. Another way of finding π_0M: π_0M = arg max_{π_0} min_δ R(δ, π_0). Define V(π_0) ≜ min_δ R(δ, π_0), the minimum Bayes risk with a-priori probability π_0, achieved by the corresponding LRT. Then π_0M = arg max_{π_0} V(π_0).
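Comment 4 suggests a direct numerical route: evaluate V(π_0) on a grid and maximize. A sketch under assumed symmetric Gaussian hypotheses with 0-1 costs (everything below the header comment is an illustrative choice, not from the notes):

```python
from math import erf, sqrt, log

# Assumed example: H0: N(0,1) vs H1: N(1,1), costs C_ij = 1 - delta_ij.
# V(pi0) = pi0*P_F + (1-pi0)*P_M, achieved by the LRT with threshold
# tau = pi0/(1-pi0), i.e. y-threshold eta = ln(tau)/mu + mu/2.
mu = 1.0

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def V(pi0):
    eta = log(pi0 / (1.0 - pi0)) / mu + mu / 2.0
    P_F = 1.0 - Phi(eta)        # P(Y >= eta | H0)
    P_M = Phi(eta - mu)         # P(Y <  eta | H1)
    return pi0 * P_F + (1.0 - pi0) * P_M

# Least-favorable prior: maximize the concave function V on a grid.
grid = [i / 1000.0 for i in range(1, 1000)]
pi0M = max(grid, key=V)
print(pi0M, V(pi0M))
```

For this symmetric problem V(π_0) = V(1 − π_0), so the maximizer is π_0M = 1/2, consistent with the minimax equation P_D = 1 − P_F.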

Examples:

2.5 Neyman-Pearson (NP) Testing (Levy 2.4.1)

Assume:
1. The a-priori probabilities (π_0, π_1) are unknown.
2. The cost structure (C_ij) is unknown.

NP testing problem: select the test δ that maximizes P_D(δ) while ensuring that the probability of false alarm P_F(δ) is no more than α:

D_α ≜ {δ : P_F(δ) ≤ α}, δ_NP = arg max_{δ ∈ D_α} P_D(δ).

Approach: the Lagrangian method for constrained optimization.

δ_NP = arg max_δ P_D(δ) subject to P_F(δ) ≤ α.

Consider the Lagrangian

L(δ, λ) ≜ P_D(δ) − λ (P_F(δ) − α).

A test δ is optimal if it maximizes L(δ, λ) for some λ ≥ 0 with P_F(δ) ≤ α and λ(α − P_F(δ)) = 0 (complementary slackness).

L(δ, λ) = ∫_{Y_1} f(y|H_1) dy + λα − λ ∫_{Y_1} f(y|H_0) dy
        = ∫_{Y_1} [f(y|H_1) − λ f(y|H_0)] dy + λα.

L(δ, λ) is maximized when

δ(y) = 1 if f(y|H_1) > λ f(y|H_0); 0 if f(y|H_1) < λ f(y|H_0); 0 or 1 if f(y|H_1) = λ f(y|H_0),

equivalently,

δ(y) = 1 if L(y) > λ; 0 if L(y) < λ; 0 or 1 if L(y) = λ.

Thus δ_NP has to be an LRT, and λ must satisfy the KKT conditions.

Let F_L(l|H_0) ≜ P(L ≤ l|H_0) be the CDF of the likelihood ratio L = L(Y) under H_0, and let f_0 ≜ F_L(0|H_0) = P(L = 0|H_0). Define two tests:

δ_{L,λ}(y) = 1 if L(y) > λ; 0 if L(y) ≤ λ
δ_{U,λ}(y) = 1 if L(y) ≥ λ; 0 if L(y) < λ

Case 1: If 1 − α < f_0, let λ = 0 and δ_NP = δ_{L,0}.

Case 2: If 1 − α ≥ f_0 and there exists a λ such that F_L(λ|H_0) = 1 − α, i.e., 1 − α is in the range of F_L(·|H_0), choose this λ as the LRT threshold and let δ_NP = δ_{L,λ}.

Case 3: If 1 − α ≥ f_0 and 1 − α is not in the range of F_L(·|H_0), i.e., there is a discontinuity point λ > 0 of F_L(·|H_0) such that F_L(λ⁻|H_0) < 1 − α < F_L(λ|H_0), choose this λ as the LRT threshold; the NP test is then the randomized

test: choose δ_{U,λ} with probability p and δ_{L,λ} with probability 1 − p; equivalently,

δ_NP(y) = 1 if L(y) > λ; 0 if L(y) < λ; 1 w.p. p and 0 w.p. 1 − p if L(y) = λ,

where p is chosen so that P_F(δ_NP) = α exactly.

Comments:
1. When Y is discrete, F_L(·|H_0) is discontinuous, so a randomized test is usually needed.
2. Similarly, we could consider minimizing P_F under the constraint P_M(δ) ≤ β; a similar solution is obtained. That problem is called an NP test of Type II; the one discussed above is an NP test of Type I.

Example:
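Case 3 can be made concrete with a discrete observation. A sketch under assumed binomial hypotheses (the parameters n, p0, p1, and α are illustrative); since L(y) is increasing in y here, thresholding L is equivalent to thresholding y itself:

```python
from math import comb

# Assumed discrete example: H0: Y ~ Binomial(10, 0.3), H1: Y ~ Binomial(10, 0.6).
n, p0, p1, alpha = 10, 0.3, 0.6, 0.1

def pmf(y, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Find the smallest k with P(Y > k | H0) <= alpha, then randomize at Y = k
# so that P_F hits alpha exactly.
k = 0
while sum(pmf(y, p0) for y in range(k + 1, n + 1)) > alpha:
    k += 1
tail = sum(pmf(y, p0) for y in range(k + 1, n + 1))   # P(Y > k | H0)
p_rand = (alpha - tail) / pmf(k, p0)                  # randomization prob. at Y = k

P_F = tail + p_rand * pmf(k, p0)                      # equals alpha by construction
P_D = sum(pmf(y, p1) for y in range(k + 1, n + 1)) + p_rand * pmf(k, p1)
print(k, round(p_rand, 4), round(P_F, 4), round(P_D, 4))
```

Without randomization the test would have to settle for P_F ≈ 0.047 (deciding 1 only for Y > k) or overshoot to P_F ≈ 0.150 (for Y ≥ k); the coin flip at Y = k interpolates to the exact size α.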

2.5.1 ROC Properties

Finding the ROC is naturally an NP test problem, so each point on it is achieved by an LRT L(y) ≷ τ. Let f_L(l|H_j) denote the PDF of L under H_j. Then

P_D(τ) = ∫_τ^∞ f_L(l|H_1) dl, P_F(τ) = ∫_τ^∞ f_L(l|H_0) dl.   (2)

As τ varies from 0 to ∞, (P_F(τ), P_D(τ)) moves continuously along the ROC curve.

1. Let τ = 0. Then δ(y) = 1 always, so P_D(δ) = P_F(δ) = 1: the point (1, 1) belongs to the ROC.
2. Let τ = ∞. Then δ(y) = 0 always, so P_D(δ) = P_F(δ) = 0: the point (0, 0) belongs to the ROC.
3. The slope of the ROC at the point (P_F(τ), P_D(τ)) equals τ.

4. The ROC curve is concave, i.e., the domain of achievable pairs (P_F, P_D) is convex.
5. All points on the ROC curve satisfy P_D ≥ P_F.
6. The region of feasible tests is symmetric about the point (1/2, 1/2), i.e., if (P_F, P_D) is feasible, so is (1 − P_F, 1 − P_D).

Example:
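Several of these properties can be spot-checked numerically by tracing the ROC of an assumed Gaussian problem (H_0: N(0,1) vs H_1: N(1,1), where the LRT reduces to a threshold η on y); this sketch verifies the endpoints, P_D ≥ P_F, and concavity:

```python
from math import erf, sqrt

# Assumed example: H0: Y ~ N(0,1) vs H1: Y ~ N(1,1).
mu = 1.0

def Q(x):
    """Standard normal tail probability P(N(0,1) >= x)."""
    return 0.5 * (1.0 - erf(x / sqrt(2.0)))

# Sweep the y-threshold eta; each eta gives one ROC point
# (P_F, P_D) = (Q(eta), Q(eta - mu)).
points = [(Q(i / 10.0), Q(i / 10.0 - mu)) for i in range(-30, 31)]

assert all(pd >= pf for pf, pd in points)   # property 5: P_D >= P_F

# Property 4 (concavity): every interior point lies on or above the chord
# joining its neighbors.
for (pf1, pd1), (pf2, pd2), (pf3, pd3) in zip(points, points[1:], points[2:]):
    t = (pf2 - pf1) / (pf3 - pf1)
    assert pd2 >= pd1 + t * (pd3 - pd1) - 1e-12

print(points[0])    # eta = -3: near (1, 1)
print(points[-1])   # eta = +3: near (0, 0)
```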