ECE531 Lecture 4b: Composite Hypothesis Testing

D. Richard Brown III
Worcester Polytechnic Institute
16-February-2011

Introduction

Suppose we have the following discrete-time signal detection problem:

- state $x_0$: transmitter sends $s_{0,i} = 0$
- state $x_1$: transmitter sends $s_{1,i} = a\sin(2\pi\omega i + \phi)$

for sample index $i = 0,\dots,n-1$. We observe these signals in additive noise,
$$Y_i = s_{j,i} + W_i$$
for sample index $i = 0,\dots,n-1$ and $j \in \{0,1\}$, and we wish to optimally decide $H_0 \leftrightarrow x_0$ versus $H_1 \leftrightarrow x_1$.

We know how to do this if $a$, $\omega$, and $\phi$ are known: it is a simple binary hypothesis testing problem (Bayes, minimax, N-P). What if one or more of these parameters is unknown? Then we have a composite binary hypothesis testing problem.

Introduction (continued)

Recall that simple hypothesis testing problems have one state per hypothesis. Composite hypothesis testing problems have at least one hypothesis containing more than one state.

[Figure: states are mapped through the conditional densities $p_x(y)$ to observations, and the decision rule maps observations to the hypotheses $H_0$ and $H_1$.]

Example: $Y$ is Gaussian distributed with known variance $\sigma^2$ but unknown mean $\mu \in \mathbb{R}$. Given an observation $Y = y$, we want to decide if $\mu > \mu_0$. There are only two hypotheses here: $H_0: \mu \le \mu_0$ and $H_1: \mu > \mu_0$. What is the state? This is a binary composite hypothesis testing problem with an uncountably infinite number of states.

Mathematical Model and Notation

Let $\mathcal{X}$ be the set of states. $\mathcal{X}$ can now be finite, e.g. $\{0,1\}$; countably infinite, e.g. $\{0,1,\dots\}$; or uncountably infinite, e.g. $\mathbb{R}^2$.

We still assume a finite number of hypotheses $H_0,\dots,H_{M-1}$ with $M < \infty$. As before, the hypotheses are defined as a partition of $\mathcal{X}$.

Composite HT: at least one hypothesis contains more than one state. If $\mathcal{X}$ is infinite (countably or uncountably), at least one hypothesis must contain an infinite number of states.

Let $C(x) = [C_0(x),\dots,C_{M-1}(x)]$ be the vector of costs associated with making decision $i \in \mathcal{Z} = \{0,\dots,M-1\}$ when the state is $x$.

Let $\mathcal{Y}$ be the set of observations. Each state $x \in \mathcal{X}$ has an associated conditional pmf or pdf $p_x(y)$ that specifies how states are mapped to observations. These conditional pmfs/pdfs are, as always, assumed to be known.

The Uniform Cost Assignment in Composite HT Problems

In general, for $i = 0,\dots,M-1$ and $x \in \mathcal{X}$, the UCA is given as
$$C_i(x) = \begin{cases} 0 & \text{if } x \in H_i \\ 1 & \text{otherwise.} \end{cases}$$

Example: a random variable $Y$ is Gaussian distributed with known variance $\sigma^2$ but unknown mean $\mu \in \mathbb{R}$. Given an observation $Y = y$, we want to decide if $\mu > \mu_0$. Hypotheses $H_0: \mu \le \mu_0$ and $H_1: \mu > \mu_0$. Uniform cost assignment ($x = \mu$):
$$C(\mu) = [C_0(\mu), C_1(\mu)] = \begin{cases} [0,1] & \text{if } \mu \le \mu_0 \\ [1,0] & \text{if } \mu > \mu_0 \end{cases}$$

Other cost structures, e.g. squared error, are of course possible in composite HT problems.
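To make the UCA concrete, here is a minimal Python sketch for this Gaussian-mean example; the function name and the default value of $\mu_0$ are illustrative choices, not from the lecture.

```python
import numpy as np

# Minimal sketch of the UCA for the Gaussian-mean example: state x = mu,
# hypotheses H0: mu <= mu0 and H1: mu > mu0. Names and mu0 are illustrative.
def uca_cost_vector(mu, mu0=0.0):
    """Return C(mu) = [C_0(mu), C_1(mu)] under the uniform cost assignment."""
    if mu <= mu0:                # state lies in H0: deciding H0 is free, H1 costs 1
        return np.array([0.0, 1.0])
    return np.array([1.0, 0.0])  # state lies in H1: deciding H1 is free, H0 costs 1

print(uca_cost_vector(-0.5))    # [0. 1.]
print(uca_cost_vector(2.0))     # [1. 0.]
```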

Conditional Risk

Given decision rule $\rho$, the conditional risk for state $x \in \mathcal{X}$ can be written as
$$R_x(\rho) = \int_{\mathcal{Y}} \rho(y)\, C(x)\, p_x(y)\, dy = E[\rho(Y)\, C(x) \mid \text{state is } x] = E_x[\rho(Y)\, C(x)]$$
where the last expression is a common shorthand notation used in many textbooks.

Recall that the decision rule $\rho(y) = [\rho_0(y),\dots,\rho_{M-1}(y)]$, where $0 \le \rho_i(y) \le 1$ specifies the probability of deciding $H_i$ when you observe $Y = y$. Also recall that $\sum_i \rho_i(y) = 1$ for all $y \in \mathcal{Y}$.

Bayes Risk: Countable States

When $\mathcal{X}$ is countable, we denote the prior probability of state $x$ as $\pi_x$. The Bayes risk of decision rule $\rho$ in this case can be written as
$$r(\rho,\pi) = \sum_{x\in\mathcal{X}} \pi_x R_x(\rho) = \sum_{x\in\mathcal{X}} \pi_x \int_{\mathcal{Y}} \rho(y)\, C(x)\, p_x(y)\, dy = \int_{\mathcal{Y}} \rho(y) \underbrace{\left(\sum_{x\in\mathcal{X}} C(x)\,\pi_x\, p_x(y)\right)}_{g(y,\pi) = [g_0(y,\pi),\dots,g_{M-1}(y,\pi)]} dy$$

How should we specify the Bayes decision rule $\rho(y) = \delta_{B\pi}(y)$ in this case?

Bayes Risk: Uncountable States

When $\mathcal{X}$ is uncountable, we denote the prior density of the states as $\pi(x)$. The Bayes risk of decision rule $\rho$ in this case can be written as
$$r(\rho,\pi) = \int_{\mathcal{X}} \pi(x) R_x(\rho)\, dx = \int_{\mathcal{X}} \pi(x) \int_{\mathcal{Y}} \rho(y)\, C(x)\, p_x(y)\, dy\, dx = \int_{\mathcal{Y}} \rho(y) \underbrace{\left(\int_{\mathcal{X}} C(x)\,\pi(x)\, p_x(y)\, dx\right)}_{g(y,\pi) = [g_0(y,\pi),\dots,g_{M-1}(y,\pi)]} dy$$

How should we specify the Bayes decision rule $\rho(y) = \delta_{B\pi}(y)$ in this case?

Bayes Decision Rules for Composite HT Problems

It doesn't matter whether we have a finite, countably infinite, or uncountably infinite number of states; we can always find a deterministic Bayes decision rule as
$$\delta_{B\pi}(y) = \arg\min_{i\in\{0,\dots,M-1\}} g_i(y,\pi)$$
where
$$g_i(y,\pi) = \begin{cases} \sum_{j=0}^{N-1} C_{i,j}\,\pi_j\, p_j(y) & \text{when } \mathcal{X} = \{x_0,\dots,x_{N-1}\} \text{ is finite} \\ \sum_{x\in\mathcal{X}} C_i(x)\,\pi_x\, p_x(y) & \text{when } \mathcal{X} \text{ is countably infinite} \\ \int_{\mathcal{X}} C_i(x)\,\pi(x)\, p_x(y)\, dx & \text{when } \mathcal{X} \text{ is uncountably infinite.} \end{cases}$$

When we have only two hypotheses, we just have to compare $g_0(y,\pi)$ with $g_1(y,\pi)$ at each value of $y$: we decide $H_1$ when $g_1(y,\pi) < g_0(y,\pi)$; otherwise we decide $H_0$.
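As a sketch of how the finite-state case might be evaluated numerically (the helper names, costs, and densities below are illustrative, not from the lecture):

```python
import numpy as np
from scipy.stats import norm

# Sketch of the deterministic Bayes rule delta_Bpi(y) = argmin_i g_i(y, pi) for a
# finite state set X = {x_0, ..., x_{N-1}}. C[i, j] is the cost of deciding H_i
# when the state is x_j, pi[j] is the prior of x_j, and densities[j] evaluates
# p_{x_j}(y). All names here are illustrative.
def bayes_decision(y, C, pi, densities):
    likes = np.array([p(y) for p in densities])  # [p_{x_0}(y), ..., p_{x_{N-1}}(y)]
    g = C @ (pi * likes)                         # discriminant functions g_i(y, pi)
    return int(np.argmin(g))

# Toy check: two Gaussian states (one per hypothesis), UCA costs, equal priors.
densities = [lambda y: norm.pdf(y, loc=0.0), lambda y: norm.pdf(y, loc=2.0)]
C = np.array([[0.0, 1.0],   # deciding H_0: free in state x_0, costs 1 in state x_1
              [1.0, 0.0]])  # deciding H_1: costs 1 in state x_0, free in state x_1
pi = np.array([0.5, 0.5])
print(bayes_decision(0.3, C, pi, densities))  # 0
print(bayes_decision(1.8, C, pi, densities))  # 1
```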

Composite Bayes Hypothesis Testing Example

Suppose $\mathcal{X} = [0,3)$ and we want to decide between three hypotheses:
$$H_0: 0 \le x < 1 \qquad H_1: 1 \le x < 2 \qquad H_2: 2 \le x < 3$$

We get two observations $Y_0 = x + \eta_0$ and $Y_1 = x + \eta_1$, where $\eta_0$ and $\eta_1$ are i.i.d. and uniformly distributed on $[-1,1]$.

What is the observation space $\mathcal{Y}$? What is the conditional distribution of each observation?
$$p_x(y_k) = \begin{cases} \frac{1}{2} & x-1 \le y_k \le x+1 \\ 0 & \text{otherwise} \end{cases}$$

What is the conditional distribution of the vector observation?

Composite Bayes Hypothesis Testing Example

[Figure: the support of the conditional densities, $x-1 \le y_k \le x+1$, plotted versus the state $x$ for the observations $y_0$ and $y_1$.]

Composite Bayes Hypothesis Testing Example

Assume the UCA. What does our cost vector $C(x)$ look like? Let's also assume a uniform prior on the states, i.e.
$$\pi(x) = \begin{cases} \frac{1}{3} & 0 \le x < 3 \\ 0 & \text{otherwise} \end{cases}$$

Do we have everything we need to find the Bayes decision rule? Yes!

Composite Bayes Hypothesis Testing Example

We want to find the index corresponding to the minimum of three discriminant functions:
$$g_0(y,\pi) = \frac{1}{3}\int_1^3 p_x(y)\,dx \qquad g_1(y,\pi) = \frac{1}{3}\int_0^1 p_x(y)\,dx + \frac{1}{3}\int_2^3 p_x(y)\,dx \qquad g_2(y,\pi) = \frac{1}{3}\int_0^2 p_x(y)\,dx$$

Note that
$$\arg\min_{i\in\{0,1,2\}} g_i(y,\pi) = \arg\max_{i\in\{0,1,2\}} h_i(y,\pi)$$
where
$$h_0(y,\pi) = \int_0^1 p_x(y)\,dx \qquad h_1(y,\pi) = \int_1^2 p_x(y)\,dx \qquad h_2(y,\pi) = \int_2^3 p_x(y)\,dx,$$
since $g_i(y,\pi) = \frac{1}{3}\left(\int_0^3 p_x(y)\,dx - h_i(y,\pi)\right)$.

Composite Bayes Hypothesis Testing Example

Note that $y \in \mathbb{R}^2$ is fixed and we are integrating with respect to $x \in \mathbb{R}$ here. Suppose $y = [0.2, 2.1]$ (the red dot in the three plots).

[Three plots: the integration regions for $h_0$, $h_1$, and $h_2$ in the $(y_0,y_1)$ plane, with the observation marked as a red dot.]

Which hypothesis must be true?

Composite Bayes Hypothesis Testing Example

Suppose $y = [0.8, 1.1]$ (the red dot in the three plots).

[Three plots: the integration regions for $h_0$, $h_1$, and $h_2$ in the $(y_0,y_1)$ plane, with the observation marked as a red dot.]

Which hypothesis would you choose? Hopefully not $H_2$!

Composite Bayes Hypothesis Testing Example

[Figure: the resulting decision regions in the $(y_0,y_1)$ plane, with $H_0$ below the line $y_0 + y_1 = 2$, $H_1$ between the lines $y_0 + y_1 = 2$ and $y_0 + y_1 = 4$, and $H_2$ above.]

$$\delta_{B\pi}(y) = \begin{cases} 0 & \text{if } y_0 + y_1 \le 2 \\ 1 & \text{if } 2 < y_0 + y_1 \le 4 \\ 2 & \text{otherwise.} \end{cases}$$
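The discriminant functions $h_i(y)$ have a closed form in this example: the joint conditional density is $1/4$ on the square $|y_0 - x| \le 1$, $|y_1 - x| \le 1$, so $h_i(y)$ is just $1/4$ times the length of the overlap between the interval of states consistent with $y$ and $[i, i+1)$. A short sketch (function names are illustrative) that reproduces the two cases above:

```python
import numpy as np

# h_i(y) = (1/4) * length([max(y0, y1) - 1, min(y0, y1) + 1] intersect [i, i+1)),
# since p_x(y0, y1) = 1/4 exactly when |y0 - x| <= 1 and |y1 - x| <= 1.
def h(y0, y1, i):
    lo = max(max(y0, y1) - 1.0, float(i))
    hi = min(min(y0, y1) + 1.0, float(i) + 1.0)
    return 0.25 * max(hi - lo, 0.0)

for y in [(0.2, 2.1), (0.8, 1.1)]:
    hs = [h(*y, i) for i in range(3)]
    print(y, hs, "-> decide H%d" % int(np.argmax(hs)))
# (0.2, 2.1): only h_1 > 0, so H_1 must be true.
# (0.8, 1.1): h_0 > h_1 > h_2 = 0, matching delta_Bpi (y0 + y1 = 1.9 <= 2).
```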

Composite Neyman-Pearson Hypothesis Testing: Basics

When the problem is simple and binary, recall that
$$P_{fp}(\rho) := \text{Prob}(\rho \text{ decides } 1 \mid \text{state is } x_0)$$
$$P_{fn}(\rho) := \text{Prob}(\rho \text{ decides } 0 \mid \text{state is } x_1) = 1 - P_D.$$
The N-P decision rule is then given as
$$\rho^{NP} = \arg\max_\rho P_D(\rho) \quad \text{subject to} \quad P_{fp}(\rho) \le \alpha$$
where $P_D := 1 - P_{fn} = \text{Prob}(\rho \text{ decides } 1 \mid \text{state is } x_1)$. The N-P criterion seeks a decision rule that maximizes the probability of detection subject to the constraint that the false positive probability is at most $\alpha$.

Now suppose the problem is binary but not simple. The states can be finite (more than two), countably infinite, or uncountably infinite. How can we extend our ideas of N-P HT to composite hypotheses?

Composite Neyman-Pearson Hypothesis Testing

Hypotheses $H_0$ and $H_1$ are associated with the subsets of states $\mathcal{X}_0$ and $\mathcal{X}_1$, where $\mathcal{X}_0$ and $\mathcal{X}_1$ form a partition of $\mathcal{X}$.

Given a state $x \in \mathcal{X}_0$, we can write the conditional false-positive probability as
$$P_{fp,x}(\rho) := \text{Prob}(\rho \text{ decides } 1 \mid \text{state is } x).$$
Given a state $x \in \mathcal{X}_1$, we can write the conditional false-negative probability as
$$P_{fn,x}(\rho) := \text{Prob}(\rho \text{ decides } 0 \mid \text{state is } x)$$
or the conditional probability of detection as
$$P_{D,x}(\rho) := \text{Prob}(\rho \text{ decides } 1 \mid \text{state is } x) = 1 - P_{fn,x}(\rho).$$

The binary composite N-P HT problem is then
$$\rho^{NP} = \arg\max_\rho P_{D,x}(\rho) \text{ for all } x \in \mathcal{X}_1$$
subject to the constraint $P_{fp,x}(\rho) \le \alpha$ for all $x \in \mathcal{X}_0$.

Composite Neyman-Pearson Hypothesis Testing

Remarks:

- Recall that the decision rule does not know the state; it only knows the observation: $\rho: \mathcal{Y} \to \mathcal{P}_M$.
- We are trying to find one decision rule $\rho$ that maximizes the probability of detection for every state $x \in \mathcal{X}_1$ subject to the constraints $P_{fp,x}(\rho) \le \alpha$ for every $x \in \mathcal{X}_0$. The composite N-P HT problem is a multi-objective optimization problem.
- Uniformly most powerful (UMP) decision rule: when a solution exists for the binary composite N-P HT problem, the optimum decision rule $\rho^{NP}$ is UMP (one decision rule gives the best $P_{D,x}$ for all $x \in \mathcal{X}_1$ subject to the constraints).
- Although UMP decision rules are very desirable, they only exist in some special cases.

Some Intuition about UMP Decision Rules

Suppose we have a binary composite hypothesis testing problem where the null hypothesis $H_0$ is simple and the alternative hypothesis is composite, i.e. $\mathcal{X}_0 = \{x_0\} \subset \mathcal{X}$ and $\mathcal{X}_1 = \mathcal{X} \setminus \{x_0\}$.

Now pick a state $x_1 \in \mathcal{X}_1$ with conditional density $p_1(y)$ and associate a new hypothesis $H_1'$ with just this one state. The problem of deciding between $H_0$ and $H_1'$ is a simple binary HT problem. The N-P lemma tells us that we can find a decision rule
$$\rho'(y) = \begin{cases} 1 & p_1(y) > v'\, p_0(y) \\ \gamma' & p_1(y) = v'\, p_0(y) \\ 0 & p_1(y) < v'\, p_0(y) \end{cases}$$
where $v'$ and $\gamma'$ are selected so that $P_{fp} = \alpha$. No other $\alpha$-level decision rule can be more powerful for deciding between $H_0$ and $H_1'$.

Some Intuition about UMP Decision Rules (continued)

Now pick another state $x_2 \in \mathcal{X}_1$ with conditional density $p_2(y)$ and associate a new hypothesis $H_1''$ with just this one state. The problem of deciding between $H_0$ and $H_1''$ is also a simple binary HT problem. The N-P lemma tells us that we can find a decision rule
$$\rho''(y) = \begin{cases} 1 & p_2(y) > v''\, p_0(y) \\ \gamma'' & p_2(y) = v''\, p_0(y) \\ 0 & p_2(y) < v''\, p_0(y) \end{cases}$$
where $v''$ and $\gamma''$ are selected so that $P_{fp} = \alpha$. No other $\alpha$-level decision rule can be more powerful for deciding between $H_0$ and $H_1''$.

The decision rule $\rho'$ cannot be more powerful than $\rho''$ for deciding between $H_0$ and $H_1''$. Likewise, the decision rule $\rho''$ cannot be more powerful than $\rho'$ for deciding between $H_0$ and $H_1'$.

Some Intuition about UMP Decision Rules (the payoff)

When might $\rho'$ and $\rho''$ have the same power for deciding between $H_0/H_1'$ and $H_0/H_1''$? Only when the critical region $\mathcal{Y}_1 = \{y \in \mathcal{Y} : \rho(y) \text{ decides } H_1\}$ is the same (except possibly on a set of probability zero). In our example, we need
$$\{y \in \mathcal{Y} : p_1(y) > v'\, p_0(y)\} = \{y \in \mathcal{Y} : p_2(y) > v''\, p_0(y)\}.$$
Actually, we need this to be true for all of the states $x \in \mathcal{X}_1$.

Theorem. A UMP decision rule exists for the $\alpha$-level binary composite HT problem $H_0: x = x_0$ versus $H_1: x \in \mathcal{X}_1 = \mathcal{X}\setminus\{x_0\}$ if and only if the critical region $\{y \in \mathcal{Y} : \rho(y) \text{ decides } H_1\}$ of the simple binary HT problem $H_0: x = x_0$ versus $H_1': x = x_1$ is the same for all $x_1 \in \mathcal{X}_1$.

Example

Suppose we have the composite binary HT problem
$$H_0: x = \mu_0 \qquad H_1: x > \mu_0$$
for $\mathcal{X} = [\mu_0, \infty)$ and $Y \sim \mathcal{N}(x, \sigma^2)$. Pick $\mu_1 > \mu_0$. We already know how to find the $\alpha$-level N-P decision rule for the simple binary problem
$$H_0: x = \mu_0 \qquad H_1: x = \mu_1.$$
The N-P decision rule for $H_0$ versus $H_1$ is simply
$$\rho^{NP}(y) = \begin{cases} 1 & y > \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 \\ \gamma & y = \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 \\ 0 & y < \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 \end{cases} \qquad (1)$$
where $\gamma$ is arbitrary since $P_{fp} = \text{Prob}\left(y > \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 \mid x = \mu_0\right) = \alpha$.

Example

Note that the critical region $\{y \in \mathcal{Y} : y > \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0\}$ does not depend on $\mu_1$. Thus, by the theorem, (1) is a UMP decision rule for this binary composite HT problem.

Also note that the probability of detection for this problem,
$$P_D(\rho^{NP}) = \text{Prob}\left(y > \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 \,\middle|\, x = \mu_1\right) = Q\left(\frac{\sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha) + \mu_0 - \mu_1}{\sigma}\right),$$
does depend on $\mu_1$, as you might expect. But there is no $\alpha$-level decision rule that gives a better probability of detection (power) than $\rho^{NP}$ for any state $x \in \mathcal{X}_1$.
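A small numerical sketch of (1) using SciPy (parameter values are illustrative): the threshold is computed once from $\alpha$, $\mu_0$, and $\sigma$, and the power is then evaluated for several $\mu_1$.

```python
import numpy as np
from scipy.special import erfcinv
from scipy.stats import norm

# UMP threshold for H0: x = mu0 vs H1: x > mu0 with Y ~ N(x, sigma^2).
# Parameter values are illustrative.
alpha, sigma, mu0 = 0.05, 1.0, 0.0
tau = np.sqrt(2.0) * sigma * erfcinv(2.0 * alpha) + mu0

print(norm.sf(tau, loc=mu0, scale=sigma))      # false-positive prob = alpha = 0.05

# The rule does not depend on mu1, but its power P_D = Q((tau - mu1)/sigma) does.
for mu1 in [0.5, 1.0, 2.0]:
    print(mu1, norm.sf(tau, loc=mu1, scale=sigma))
```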

Example

What if we had this composite HT problem instead?
$$H_0: x = \mu_0 \qquad H_1: x \ne \mu_0$$
for $\mathcal{X} = \mathbb{R}$ and $Y \sim \mathcal{N}(x, \sigma^2)$.

For $\mu_1 < \mu_0$, the most powerful critical region for an $\alpha$-level decision rule is
$$\{y \in \mathcal{Y} : y < \mu_0 - \sqrt{2}\sigma\,\mathrm{erfc}^{-1}(2\alpha)\}.$$
This does not depend on $\mu_1$. Excellent!

Hold on. This is not the same critical region as when $\mu_1 > \mu_0$. So, in fact, the critical region does depend on $\mu_1$. We can conclude that no UMP decision rule exists for this binary composite HT problem.

The theorem is handy both for finding UMP decision rules and for showing when UMP decision rules can't be found.

Monotone Likelihood Ratio

Despite UMP decision rules only being available in special cases, lots of real problems fall into these special cases. One such class of problems is the class of binary composite HT problems with monotone likelihood ratios.

Definition (Lehmann, Testing Statistical Hypotheses, p. 78). The real-parameter family of densities $p_x(y)$ for $x \in \mathcal{X} \subseteq \mathbb{R}$ is said to have monotone likelihood ratio if there exists a real-valued function $T(y)$ that depends only on $y$ such that, for any $x_1 \in \mathcal{X}$ and $x_0 \in \mathcal{X}$ with $x_0 < x_1$, the distributions $p_{x_0}(y)$ and $p_{x_1}(y)$ are distinct and the ratio
$$L_{x_1/x_0}(y) := \frac{p_{x_1}(y)}{p_{x_0}(y)} = f(x_0, x_1, T(y))$$
is a non-decreasing function of $T(y)$.

Monotone Likelihood Ratio: Example

Consider the Laplacian density $p_x(y) = \frac{b}{2}e^{-b|y-x|}$ with $b > 0$ and $\mathcal{X} = [0,\infty)$. Let's see if this family of densities has monotone likelihood ratio...
$$L_{x_1/x_0}(y) := \frac{p_{x_1}(y)}{p_{x_0}(y)} = e^{b(|y-x_0| - |y-x_1|)}$$
Since $b > 0$, $L_{x_1/x_0}(y)$ will be non-decreasing in $T(y) = y$ provided that $|y-x_0| - |y-x_1|$ is non-decreasing in $y$ for all $0 \le x_0 < x_1$. We can write
$$|y-x_0| - |y-x_1| = \begin{cases} x_0 - x_1 & y < x_0 \\ 2y - x_1 - x_0 & x_0 \le y \le x_1 \\ x_1 - x_0 & y > x_1. \end{cases}$$
Note that $x_0 - x_1 < 0$ is a constant and $x_1 - x_0 > 0$ is also a constant with respect to $y$. Hence we can see that the Laplacian family of densities $p_x(y) = \frac{b}{2}e^{-b|y-x|}$ with $b > 0$ and $\mathcal{X} = [0,\infty)$ has monotone likelihood ratio in $T(y) = y$.
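A quick numerical sanity check of this conclusion (illustrative, not from the slides):

```python
import numpy as np

# Check that L_{x1/x0}(y) = exp(b * (|y - x0| - |y - x1|)) is non-decreasing in
# T(y) = y for a particular 0 <= x0 < x1. Parameter values are illustrative.
b, x0, x1 = 1.5, 0.5, 2.0
y = np.linspace(-5.0, 5.0, 1001)
L = np.exp(b * (np.abs(y - x0) - np.abs(y - x1)))
print(bool(np.all(np.diff(L) >= -1e-12)))  # True: constant for y < x0 and y > x1,
                                           # strictly increasing on [x0, x1]
```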

Existence of a UMP Decision Rule for Monotone LR

Theorem (Lehmann, Testing Statistical Hypotheses, p. 78). Let $\mathcal{X}$ be a subinterval of the real line. Fix $\lambda \in \mathcal{X}$ and define the hypotheses $H_0: x \in \mathcal{X}_0 = \{x \le \lambda\}$ versus $H_1: x \in \mathcal{X}_1 = \{x > \lambda\}$. If the densities $p_x(y)$ are distinct for all $x \in \mathcal{X}$ and the family has monotone likelihood ratio in $T(y)$, then the decision rule
$$\rho(y) = \begin{cases} 1 & T(y) > \tau \\ \gamma & T(y) = \tau \\ 0 & T(y) < \tau \end{cases}$$
where $\tau$ and $\gamma$ are selected so that $P_{fp,x=\lambda} = \alpha$, is UMP for testing $H_0: x \in \mathcal{X}_0$ versus $H_1: x \in \mathcal{X}_1$.

Existence of a UMP Decision Rule: Finite Real States

Corollary (Finite Real States). Let $\mathcal{X} = \{x_0 < x_1 < \dots < x_{N-1}\} \subset \mathbb{R}$ and partition $\mathcal{X}$ into two order-preserving sets $\mathcal{X}_0 = \{x_0,\dots,x_j\}$ and $\mathcal{X}_1 = \{x_{j+1},\dots,x_{N-1}\}$. Then the $\alpha$-level N-P decision rule for deciding between the simple hypotheses $H_0: x = x_j$ versus $H_1: x = x_{j+1}$ is a UMP $\alpha$-level test for deciding between the composite hypotheses $H_0: x \in \mathcal{X}_0$ versus $H_1: x \in \mathcal{X}_1$.

Intuition: suppose $\mathcal{X}_0 = \{x_0, x_1\}$ and $\mathcal{X}_1 = \{x_2, x_3\}$. We can build a table of $\alpha$-level N-P decision rules for the four simple binary HT problems:

                     H_1: x = x_2    H_1: x = x_3
    H_0: x = x_0     rho^NP_02       rho^NP_03
    H_0: x = x_1     rho^NP_12       rho^NP_13

The corollary says that $\rho^{NP}_{12}$ is a UMP $\alpha$-level decision rule for the composite HT problem. Why?

Coin Flipping Example

Suppose we have a coin with Prob(heads) $= x$, where $0 \le x \le 1$ is the unknown state. We flip the coin $n$ times and observe the number of heads $Y = y \in \{0,\dots,n\}$. We know that, conditioned on the state $x$, the observation $Y$ is distributed as
$$p_x(y) = \binom{n}{y} x^y (1-x)^{n-y}.$$
Fix $0 < \lambda < 1$ and define our composite binary hypotheses as
$$H_0: Y \sim p_x(y) \text{ for } x \in [0,\lambda] = \mathcal{X}_0$$
$$H_1: Y \sim p_x(y) \text{ for } x \in (\lambda, 1] = \mathcal{X}_1$$
Does there exist a UMP decision rule with significance level $\alpha$?

Are the $p_x(y)$ distinct for all $x \in \mathcal{X}$? Yes. Does this family of densities have monotone likelihood ratio in some test statistic $T(y)$? Let's check...

Coin Flipping Example

$$L_{x_1/x_0}(y) = \frac{\binom{n}{y} x_1^y (1-x_1)^{n-y}}{\binom{n}{y} x_0^y (1-x_0)^{n-y}} = \phi^y \left(\frac{1-x_1}{1-x_0}\right)^n$$
where $\phi := \frac{x_1(1-x_0)}{x_0(1-x_1)} > 1$ when $x_1 > x_0$. Hence this family of densities has monotone likelihood ratio in the test statistic $T(y) = y$.

We don't have a finite number of states. How can we find the UMP decision rule? According to the theorem, a UMP decision rule will be
$$\rho(y) = \begin{cases} 1 & y > \tau \\ \gamma & y = \tau \\ 0 & y < \tau \end{cases}$$
with $\tau$ and $\gamma$ selected so that $P_{fp,x=\lambda} = \alpha$. We just have to find $\tau$ and $\gamma$.

Coin Flipping Example

To find $\tau$, we can use our procedure from simple binary N-P hypothesis testing. We want to find the smallest value of $\tau$ such that
$$P_{fp,x=\lambda}(\delta_\tau) = \sum_{j>\tau} \underbrace{\text{Prob}[y = j \mid x = \lambda]}_{p_\lambda(y)} = \sum_{j>\tau} \binom{n}{j}\lambda^j(1-\lambda)^{n-j} \le \alpha.$$
Once we find the appropriate value of $\tau$: if $P_{fp,x=\lambda}(\delta_\tau) = \alpha$, then we are done, and we can set $\gamma$ to any arbitrary value in $[0,1]$. If $P_{fp,x=\lambda}(\delta_\tau) < \alpha$, then we have to find $\gamma$ such that $P_{fp,x=\lambda}(\rho) = \alpha$. The UMP decision rule is just the usual randomization $\rho = (1-\gamma)\delta_\tau + \gamma\delta_{\tau-\epsilon}$ and, in this case,
$$\gamma = \frac{\alpha - \sum_{j=\tau+1}^n \binom{n}{j}\lambda^j(1-\lambda)^{n-j}}{\binom{n}{\tau}\lambda^\tau(1-\lambda)^{n-\tau}}.$$

Power Function

For binary composite hypothesis testing problems with $\mathcal{X}$ a subinterval of the real line and a particular decision rule $\rho$, we can define the power function of $\rho$ as
$$\beta(x) := \text{Prob}(\rho \text{ decides } 1 \mid \text{state is } x).$$
For each $x \in \mathcal{X}_1$, $\beta(x)$ is the probability of a true positive (the probability of detection). For each $x \in \mathcal{X}_0$, $\beta(x)$ is the probability of a false positive. Hence, a plot of $\beta(x)$ versus $x$ displays both the false-positive probability and the detection probability of $\rho$ for all states $x \in \mathcal{X}$.

Our UMP decision rule for the coin flipping problem specifies that we always decide 1 when we observe $\tau + 1$ or more heads, and we decide 1 with probability $\gamma$ if we observe exactly $\tau$ heads. Hence,
$$\beta(x) = \sum_{j=\tau+1}^n \binom{n}{j} x^j (1-x)^{n-j} + \gamma \binom{n}{\tau} x^\tau (1-x)^{n-\tau}.$$
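Here is a sketch of the whole coin-flipping recipe (find $\tau$, then $\gamma$, then evaluate $\beta(x)$); the parameter values are illustrative, and scipy.stats.binom supplies the binomial tail probability and pmf.

```python
from scipy.stats import binom

# Find the smallest tau with P(Y > tau | x = lambda) <= alpha, then the
# randomization gamma, then evaluate the power function beta(x).
# Parameter values are illustrative.
n, lam, alpha = 10, 0.5, 0.1

def tail(t, x):
    return binom.sf(t, n, x)                 # P(Y > t | state x)

tau = next(t for t in range(n + 1) if tail(t, lam) <= alpha)
gamma = (alpha - tail(tau, lam)) / binom.pmf(tau, n, lam)

def beta(x):
    return tail(tau, x) + gamma * binom.pmf(tau, n, x)

print(tau, gamma)   # tau = 7, gamma ~ 0.387 for these parameters
print(beta(lam))    # = alpha = 0.1 at the boundary state x = lambda
print(beta(0.8))    # detection probability for a state in X_1
```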

Coin Flipping Example: Power Function

[Plot: power function $\beta(x)$ versus the state $x$ for $\lambda = 0.5$, $\alpha = 0.1$, and $n \in \{2, 4, 6, 8, 10\}$; every curve passes through $\beta(\lambda) = \alpha$ by construction.]

Locally Most Powerful Decision Rules

Although UMP decision rules exist in many cases of practical interest, we've already seen that they don't always exist. If a UMP decision rule does not exist, an alternative is to seek the most powerful decision rule when $x$ is very close to the threshold between $\mathcal{X}_0$ and $\mathcal{X}_1$. If such a decision rule exists, it is said to be a locally most powerful (LMP) decision rule.

Locally Most Powerful Decision Rules: Setup

Consider the situation where $\mathcal{X} = [\lambda_0, \lambda_1] \subset \mathbb{R}$ for some $\lambda_1 > \lambda_0$. We wish to test $H_0: x = \lambda_0$ versus $H_1: \lambda_0 < x \le \lambda_1$ subject to $P_{fp,x=\lambda_0} \le \alpha$.

For $x = \lambda_0$, the power function $\beta(x)$ corresponds to the probability of false positive; for $x > \lambda_0$, the power function $\beta(x)$ corresponds to the probability of detection.

We assume that $\beta(x)$ is differentiable at $x = \lambda_0$. Then
$$\beta'(\lambda_0) = \frac{d}{dx}\beta(x)\Big|_{x=\lambda_0}$$
is the slope of $\beta(x)$ at $x = \lambda_0$. The Taylor series expansion of $\beta(x)$ around $x = \lambda_0$ is
$$\beta(x) = \beta(\lambda_0) + \beta'(\lambda_0)(x - \lambda_0) + \dots$$
Our goal is to find a decision rule $\rho$ that maximizes the slope $\beta'(\lambda_0)$ subject to $\beta(\lambda_0) \le \alpha$. This decision rule is LMP of size $\alpha$. Why?

Locally Most Powerful Decision Rules: Solution

Definition. A decision rule $\rho^*$ is a locally most powerful decision rule of size $\alpha$ at $x = \lambda_0$ if, for any other size-$\alpha$ decision rule $\rho$, $\beta'_{\rho^*}(\lambda_0) \ge \beta'_\rho(\lambda_0)$.

The LMP decision rule takes the form
$$\rho(y) = \begin{cases} 1 & L'_{\lambda_0}(y) > \tau \\ \gamma & L'_{\lambda_0}(y) = \tau \\ 0 & L'_{\lambda_0}(y) < \tau \end{cases}$$
(see Kay II Section 6.7 for the details) where
$$L'_{\lambda_0}(y) := \frac{d}{dx} L_{x/\lambda_0}(y)\Big|_{x=\lambda_0} = \frac{\frac{d}{dx} p_x(y)\big|_{x=\lambda_0}}{p_{\lambda_0}(y)}$$
and where $\tau$ and $\gamma$ are selected so that $P_{fp,x=\lambda_0} = \beta(\lambda_0) = \alpha$.
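As a sketch, the statistic $L'_{\lambda_0}(y)$ can be approximated with a one-sided finite difference when no closed form is handy; this is an illustration of the definition, not a method from the slides. The Laplacian family from the example that follows is used as the test case, and the step size and names are illustrative.

```python
import numpy as np

# Approximate L'_{lambda0}(y) = (d/dx p_x(y)|_{x=lambda0}) / p_{lambda0}(y) with a
# one-sided finite difference. Tested on the Laplacian family
# p_x(y) = (b/2) exp(-b |y - x|), where the exact answer at lambda0 = 0 is b*sgn(y).
b, lam0, dx = 1.0, 0.0, 1e-6

def p(y, x):
    return 0.5 * b * np.exp(-b * np.abs(y - x))

def lmp_statistic(y):
    return (p(y, lam0 + dx) - p(y, lam0)) / (dx * p(y, lam0))

print(lmp_statistic(0.7), lmp_statistic(-0.7))  # ~ +b and -b
```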

LMP Decision Rule Example

Consider a Laplacian random variable $Y$ with unknown mean $x \in [0,\infty)$. Given one observation of $Y$, we want to decide $H_0: x = 0$ versus $H_1: x > 0$ subject to an upper bound $\alpha$ on the false positive probability. The conditional density of $Y$ is $p_x(y) = \frac{b}{2}e^{-b|y-x|}$. The likelihood ratio can be written as
$$L_{x/0}(y) = \frac{p_x(y)}{p_0(y)} = e^{b(|y| - |y-x|)}.$$
(A UMP decision rule actually does exist here.) To find an LMP decision rule, we need to compute
$$L'_0(y) = \frac{d}{dx} L_{x/0}(y)\Big|_{x=0^+}.$$

LMP Decision Rule Example

$$L'_0(y) = \frac{d}{dx} L_{x/0}(y)\Big|_{x=0^+} = \frac{d}{dx} e^{b(|y| - |y-x|)}\Big|_{x=0^+} = b\,\mathrm{sgn}(y)$$

Note that $L'_0(y)$ doesn't exist when $y = 0$, but $\text{Prob}[Y = 0] = 0$, so this doesn't really matter.

We need to find the decision threshold $\tau$ on $L'_0(y)$. Note that $L'_0(y)$ can take on only two values: $L'_0(y) \in \{-b, b\}$. Hence we only need to consider a finite number of decision thresholds, $\tau \in \{-2b, -b, b, 2b\}$.

$\tau = -2b$ corresponds to what decision rule? Always decide $H_1$.
$\tau = 2b$ corresponds to what decision rule? Always decide $H_0$.

LMP Decision Rule Example

When $x = 0$, $p_x(y) = \frac{b}{2}e^{-b|y|}$. When $x = 0$ we get a false positive with probability 1 if $b\,\mathrm{sgn}(y) > \tau$ and with probability $\gamma$ if $b\,\mathrm{sgn}(y) = \tau$. The false positive probability as a function of the threshold $\tau$ on $L'_0(y)$:

    tau      -2b       -b         b       2b
    P_fp      1     (1+gamma)/2  gamma/2   0

From this, it is easy to determine the required values of $\tau$ and $\gamma$ as a function of $\alpha$:

    alpha = 0:             tau = 2b,    gamma arbitrary
    0 < alpha <= 1/2:      tau = b,     gamma = 2*alpha
    1/2 < alpha < 1:       tau = -b,    gamma = 2*alpha - 1
    alpha = 1:             tau = -2b,   gamma arbitrary

LMP Decision Rule Example

The power function of the LMP decision rule for $0 < \alpha < \frac{1}{2}$ (i.e. $\tau = b$, $\gamma = 2\alpha$) is
$$\beta(x) = \underbrace{\text{Prob}(\mathrm{sgn}(y) > 1 \mid \text{state is } x)}_{=0} + \gamma\,\text{Prob}(\mathrm{sgn}(y) = 1 \mid \text{state is } x) = \gamma \int_0^\infty \frac{b}{2} e^{-b|y-x|}\, dy$$
$$= 2\alpha\left(\int_0^x \frac{b}{2} e^{-b(x-y)}\, dy + \int_x^\infty \frac{b}{2} e^{-b(y-x)}\, dy\right) = 2\alpha\left(\frac{1 - e^{-bx}}{2} + \frac{1}{2}\right) = 2\alpha - \alpha e^{-bx}.$$
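A Monte Carlo sketch (illustrative, not from the slides) that checks $\beta(x) = 2\alpha - \alpha e^{-bx}$: for $0 < \alpha < \frac{1}{2}$, the LMP rule decides $H_1$ with probability $\gamma = 2\alpha$ exactly when $y > 0$.

```python
import numpy as np

# Check beta(x) = 2*alpha - alpha*exp(-b*x) for the LMP rule with tau = b,
# gamma = 2*alpha: decide H1 with probability gamma whenever sgn(y) = +1.
# Parameter values are illustrative.
rng = np.random.default_rng(0)
alpha, b = 0.1, 1.0
gamma = 2.0 * alpha
for x in [0.0, 0.5, 2.0]:
    y = x + rng.laplace(scale=1.0 / b, size=200_000)  # Laplacian(b) noise about x
    beta_mc = gamma * np.mean(y > 0)                  # Monte Carlo estimate
    beta_formula = 2.0 * alpha - alpha * np.exp(-b * x)
    print(x, round(beta_mc, 4), round(beta_formula, 4))
```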

LMP Decision Rule Example: $\alpha = 0.1$, $b = 1$

[Plot: power functions $\beta(x)$ versus the state $x$ for the UMP and LMP decision rules; both start at $\beta(0) = \alpha = 0.1$, with the UMP curve above the LMP curve for $x > 0$.]

LMP Remarks

Note that Kay develops the notion of an LMP decision rule for the one-sided parameter test $H_0: x = \lambda_0$ versus $H_1: x > \lambda_0$. The usefulness of the LMP decision rule in this case should be clear:

- The slope of the power function $\beta(x)$ is maximized for states in the alternative hypothesis close to $x = \lambda_0$.
- The false-positive probability is guaranteed to be less than or equal to $\alpha$ for all states in the null hypothesis (there is only one state in the null hypothesis).

When we have the one-sided parameter test $H_0: x \le \lambda_0$ versus $H_1: x > \lambda_0$ with a composite null hypothesis, the same LMP approach can be applied, but you must check that $P_{fp,x} \le \alpha$ for all $x \le \lambda_0$:

- The slope of the power function $\beta(x)$ is still maximized for states in the alternative hypothesis close to $x = \lambda_0$.
- The false-positive probability of the LMP decision rule is not necessarily less than or equal to $\alpha$ for all states in the null hypothesis. You must check this.

Conclusions

- Simple hypothesis testing is not sufficient for detection problems with unknown parameters. These types of problems require composite hypothesis testing.
- Bayesian M-ary composite hypothesis testing: basically the same as Bayesian M-ary simple hypothesis testing; find the cheapest commodity (minimum discriminant function).
- Neyman-Pearson binary composite hypothesis testing: a multi-objective optimization problem. We would like to find uniformly most powerful (UMP) decision rules. This is possible in some cases: monotone likelihood ratio, plus other cases discussed in Lehmann, Testing Statistical Hypotheses.
- Locally most powerful (LMP) decision rules can often be found in cases where UMP decision rules do not exist.