
ECE 830 Spring 2017, Instructor: R. Willett

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics

1 Executive summary

In the last lecture we saw that the likelihood ratio statistic is optimal for testing between two simple hypotheses: the test simply compares the likelihood ratio to a threshold. The optimal threshold is a function of the prior probabilities and the costs assigned to the different errors. The choice of costs is subjective and depends on the nature of the problem, but the prior probabilities must be known. In practice, we face several questions:

1. Unfortunately, the prior probabilities are often not known precisely, and thus the correct setting for the threshold is unclear. How should we proceed?
2. What are the tradeoffs among different measures of error (e.g., probability of false alarm, probability of miss)?
3. Is the LRT still optimal for different error criteria?
4. Do we really need to store all the observed data, or can we get by with some summary statistics?

We will address these questions in these notes. To explore them, we will look at a simple example.

Example: Detecting ET

We observe a sampled radio signal from outer space. Is this just cosmic radiation/noise, or are we receiving a message from extra-terrestrial intelligence (ETI)?

$$\text{null hypothesis (no ETI)} \quad H_0: x_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n \tag{1}$$
$$\text{alternative hypothesis (ETI)} \quad H_1: x_i \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2), \quad \mu > 0, \quad i = 1, \dots, n. \tag{2}$$

Assume that $\sigma^2 > 0$ is known. The first hypothesis is simple: it involves a fixed and known distribution. The second hypothesis is simple if $\mu$ is known. However, if all we know is that $\mu > 0$, then the second hypothesis is the composite of many alternative distributions, i.e., the collection $\{\mathcal{N}(\mu, \sigma^2)\}_{\mu > 0}$. In this case, $H_1$ is called a composite hypothesis. The likelihood ratio test takes the form

$$\frac{\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2}}{\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2} x_i^2}} = \frac{(2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2}}{(2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2}} \overset{H_1}{\underset{H_0}{\gtrless}} \gamma$$
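To make the test concrete, here is a minimal numerical sketch of this likelihood ratio test, computed in the log domain for numerical stability. The parameter values, and the use of NumPy/SciPy, are illustrative choices, not part of the original notes.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of the likelihood ratio test above; mu, sigma, n, and
# gamma are illustrative values, not from the notes.
rng = np.random.default_rng(0)
mu, sigma, n, gamma = 1.0, 1.0, 50, 1.0

x = rng.normal(mu, sigma, n)  # simulate data under H1

# Likelihood ratio = product of N(mu, sigma^2) densities over product of
# N(0, sigma^2) densities, computed in the log domain to avoid underflow.
log_lr = norm.logpdf(x, mu, sigma).sum() - norm.logpdf(x, 0.0, sigma).sum()

decision = "H1" if log_lr >= np.log(gamma) else "H0"
print(f"log likelihood ratio = {log_lr:.2f}, decide {decision}")
```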

The inequalities are preserved if we apply a monotonic transformation to both sides, so we can simplify the expression by taking the logarithm, giving us the log-likelihood ratio test

$$\frac{1}{2\sigma^2}\left(2\mu \sum_{i=1}^n x_i - n\mu^2\right) \overset{H_1}{\underset{H_0}{\gtrless}} \log(\gamma).$$

Assuming $\mu > 0$, this is equivalent to

$$\sum_{i=1}^n x_i \overset{H_1}{\underset{H_0}{\gtrless}} \nu,$$

with $\nu = \frac{\sigma^2}{\mu} \ln\gamma + \frac{n\mu}{2}$.

2 Sufficient Statistics

Our test statistic $t := \sum_{i=1}^n x_i$ is called the sufficient statistic for the mean of a normal distribution. Let's rewrite our hypotheses in terms of the sufficient statistic:

$$H_0: t \sim \mathcal{N}(0, n\sigma^2), \tag{3}$$
$$H_1: t \sim \mathcal{N}(n\mu, n\sigma^2), \quad \mu > 0. \tag{4}$$

We call $t$ a sufficient statistic because $t$ is sufficient for performing our likelihood ratio test. More formally, a sufficient statistic is defined as follows:

Definition: Sufficient statistic
Let $X$ be an $n$-dimensional random vector and let $\theta$ denote a $p$-dimensional parameter of the distribution of $X$. The statistic $t := T(X)$ is a sufficient statistic for $\theta$ if and only if the conditional distribution of $X$ given $T(X)$ is independent of $\theta$.

More details are available at http://willett.ece.wisc.edu/wp-uploads/2016/02/04-SuffStats.pdf.

If our data is drawn from an exponential family probability model parameterized by $\theta$ with the form

$$p_\theta(x) = \exp[a(\theta)\, b(x) + c(x) + d(\theta)]$$

then $t(x) = \sum_{i=1}^n b(x_i)$ is a sufficient statistic. Thus if we are faced with a hypothesis test of the form

$$H_0: x_i \overset{iid}{\sim} p_{\theta_0}(x) \tag{5}$$
$$H_1: x_i \overset{iid}{\sim} p_{\theta_1}(x) \tag{6}$$

then our test statistic is a simple function of the data, and we do not need to know the functions $a$, $c$, and $d$ to compute it (though we may need these functions to choose an appropriate threshold).
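As a quick sanity check, the sketch below (with illustrative parameter values, not from the notes) verifies that thresholding the log-likelihood ratio at $\log\gamma$ and thresholding the sufficient statistic $t = \sum_i x_i$ at $\nu$ produce identical decisions.

```python
import numpy as np

# Sketch: the LRT depends on the data only through t = sum(x_i), so the
# full log-likelihood ratio test and the threshold test on t must agree.
# mu, sigma, n, and gamma are illustrative values, not from the notes.
rng = np.random.default_rng(1)
mu, sigma, n, gamma = 0.5, 1.0, 20, 2.0

x = rng.normal(0.0, sigma, n)  # data drawn under H0
t = x.sum()                    # sufficient statistic

log_lr = (2 * mu * t - n * mu**2) / (2 * sigma**2)
nu = (sigma**2 / mu) * np.log(gamma) + n * mu / 2  # equivalent threshold on t

assert (log_lr >= np.log(gamma)) == (t >= nu)  # same decision either way
```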

3 Choosing thresholds

Recall that our ideal threshold is $\nu = \frac{\sigma^2}{\mu} \ln\gamma + \frac{n\mu}{2}$. However, in our ETI setting we have no way of knowing the prior probabilities $\pi_0, \pi_1$ needed to set $\gamma$, AND we have no way of knowing a good value for $\mu$. To deal with this, consider an alternative design specification. Let's design a test that minimizes one type of error subject to a constraint on the other type of error. This constrained optimization criterion does not require knowledge of prior probabilities or cost assignments. It only requires a specification of the maximum allowable value for one type of error, which is sometimes even more natural than assigning costs to the different errors. A classic result due to Neyman and Pearson shows that the solution to this type of optimization is again a likelihood ratio test.

4 Neyman-Pearson Lemma

Assume that we observe a random variable distributed according to one of two distributions:

$$H_0: X \sim p_0$$
$$H_1: X \sim p_1$$

In many problems, $H_0$ is considered to be a sort of baseline or default model and is called the null hypothesis. $H_1$ is a different model and is called the alternative hypothesis. If a test chooses $H_1$ when in fact the data were generated by $H_0$, the error is called a false-positive or false-alarm, since we mistakenly accepted the alternative hypothesis. The error of deciding $H_0$ when $H_1$ was the correct model is called a false-negative or miss. Let $T$ denote a testing procedure based on an observation of $X$, and let $R_T$ denote the subset of the range of $X$ where the test chooses $H_1$. The probability of a false-positive is denoted by

$$P_0(R_T) := \int_{R_T} p_0(x)\, dx.$$

The probability of a false-negative is $1 - P_1(R_T)$, where

$$P_1(R_T) := \int_{R_T} p_1(x)\, dx$$

is the probability of correctly deciding $H_1$, often called the probability of detection. Consider likelihood ratio tests of the form

$$\frac{p_1(x)}{p_0(x)} \overset{H_1}{\underset{H_0}{\gtrless}} \lambda.$$

The subset of the range of $X$ where this test decides $H_1$ is denoted

$$R_{LR}(\lambda) := \{x : p_1(x) > \lambda\, p_0(x)\},$$

and therefore the probability of a false-positive decision is

$$P_0(R_{LR}(\lambda)) := \int_{R_{LR}(\lambda)} p_0(x)\, dx = \int_{\{x :\, p_1(x) > \lambda p_0(x)\}} p_0(x)\, dx.$$

This probability is a function of the threshold $\lambda$; the set $R_{LR}(\lambda)$ shrinks/grows as $\lambda$ increases/decreases. We can select $\lambda$ to achieve a desired probability of error.

Neyman-Pearson Lemma
Consider the likelihood ratio test

$$\frac{p_1(x)}{p_0(x)} \overset{H_1}{\underset{H_0}{\gtrless}} \lambda$$

with $\lambda > 0$ chosen so that $P_0(R_{LR}(\lambda)) = \alpha$. There does not exist another test $T$ with $P_0(R_T) \le \alpha$ and $P_1(R_T) > P_1(R_{LR}(\lambda))$. That is, the LRT is the most powerful test with probability of false-positive less than or equal to $\alpha$.

Proof. Let $T$ be any test with $P_0(R_T) = \alpha$ and let NP denote the LRT with $\lambda$ chosen so that $P_0(R_{LR}(\lambda)) = \alpha$. To simplify the notation we will use $R_{NP}$ to denote the region $R_{LR}(\lambda)$. For any subset $R$ of the range of $X$ define

$$P_i(R) := \int_R p_i(x)\, dx.$$

This is simply the probability of $X \in R$ under hypothesis $H_i$. Note that

$$P_i(R_{NP}) = P_i(R_{NP} \cap R_T) + P_i(R_{NP} \cap R_T^c)$$
$$P_i(R_T) = P_i(R_{NP} \cap R_T) + P_i(R_{NP}^c \cap R_T)$$

where the superscript $c$ indicates the complement of the set. By assumption $P_0(R_{NP}) = P_0(R_T) = \alpha$, therefore

$$P_0(R_{NP} \cap R_T^c) = P_0(R_{NP}^c \cap R_T).$$

Now, we want to show

$$P_1(R_{NP}) \ge P_1(R_T),$$

which holds if

$$P_1(R_{NP} \cap R_T^c) \ge P_1(R_{NP}^c \cap R_T).$$

To see that this is indeed the case,

$$P_1(R_{NP} \cap R_T^c) = \int_{R_{NP} \cap R_T^c} p_1(x)\, dx \ge \lambda \int_{R_{NP} \cap R_T^c} p_0(x)\, dx = \lambda\, P_0(R_{NP} \cap R_T^c)$$
$$= \lambda\, P_0(R_{NP}^c \cap R_T) = \lambda \int_{R_{NP}^c \cap R_T} p_0(x)\, dx \ge \int_{R_{NP}^c \cap R_T} p_1(x)\, dx = P_1(R_{NP}^c \cap R_T).$$
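The sketch below illustrates both the constrained design of Section 3 (calibrating the threshold by Monte Carlo so that $P_{FA} \approx \alpha$) and the lemma itself, by checking that the NP test beats a competitor of the same size. The competitor (which uses only half the samples) and all parameter values are made up for illustration.

```python
import numpy as np

# Sketch: calibrate two tests to the same false-alarm rate alpha, then compare
# their detection probabilities. The NP test thresholds t = sum(x); the made-up
# competitor thresholds the sum of only the first half of the samples.
# All parameter values are illustrative, not from the notes.
rng = np.random.default_rng(2)
mu, sigma, n, alpha, n_trials = 0.3, 1.0, 40, 0.1, 200_000

x0 = rng.normal(0.0, sigma, (n_trials, n))  # data under H0
x1 = rng.normal(mu, sigma, (n_trials, n))   # data under H1

t_np, t_alt = x0.sum(axis=1), x0[:, : n // 2].sum(axis=1)
nu_np = np.quantile(t_np, 1 - alpha)    # empirical thresholds with P_FA ~ alpha
nu_alt = np.quantile(t_alt, 1 - alpha)

p_d_np = np.mean(x1.sum(axis=1) >= nu_np)
p_d_alt = np.mean(x1[:, : n // 2].sum(axis=1) >= nu_alt)
print(f"P_D(NP) = {p_d_np:.3f} >= P_D(half-data test) = {p_d_alt:.3f}")
```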

The probability of a false-positive is also called the probability of false-alarm, which we will denote by $P_{FA}$ in the following examples. We will also denote the probability of detection ($1 -$ probability of a false-negative) by $P_D$. The NP test maximizes $P_D$ subject to a constraint on $P_{FA}$.

Example: Detecting a DC Signal in Additive White Gaussian Noise

Return to our ETI example from earlier. Assuming $\mu > 0$, our LRT amounts to

$$\sum_{i=1}^n x_i \overset{H_1}{\underset{H_0}{\gtrless}} \nu,$$

with $\nu = \frac{\sigma^2}{\mu} \ln\gamma + \frac{n\mu}{2}$, and since $\gamma$ was ours to choose, we can equivalently choose $\nu$ to trade off between the two types of error. Let's now determine $P_{FA}$ and $P_D$ for the log-likelihood ratio test.

$$P_{FA} = \int_\nu^\infty \frac{1}{\sqrt{2\pi n\sigma^2}}\, e^{-\frac{t^2}{2n\sigma^2}}\, dt = Q\left(\frac{\nu}{\sqrt{n\sigma^2}}\right),$$

where $Q(z) = \int_z^\infty \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\, du$, the tail probability of the standard normal distribution. Similarly,

$$P_D = \int_\nu^\infty \frac{1}{\sqrt{2\pi n\sigma^2}}\, e^{-\frac{(t - n\mu)^2}{2n\sigma^2}}\, dt = Q\left(\frac{\nu - n\mu}{\sqrt{n\sigma^2}}\right).$$

In both cases the expression in terms of the $Q$ function is the result of a simple change of variables in the integration. The $Q$ function is invertible, so we can solve for the value of $\nu$ in terms of $P_{FA}$, that is $\nu = \sqrt{n\sigma^2}\, Q^{-1}(P_{FA})$. Using this we can express $P_D$ as

$$P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{n\mu^2}{\sigma^2}}\right),$$

where $\sqrt{\frac{n\mu^2}{\sigma^2}}$ is simply the square root of the signal-to-noise ratio (SNR). Since $Q(z)$ is monotonically decreasing in $z$, it is easy to see that the probability of detection increases as $\mu$ and/or $n$ increase.
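This closed form is easy to evaluate numerically. A minimal sketch, with illustrative parameter values, using SciPy's standard normal tail functions ($Q(z)$ = `norm.sf(z)`, $Q^{-1}(p)$ = `norm.isf(p)`):

```python
from scipy.stats import norm

# Sketch: evaluate P_D = Q(Q^{-1}(P_FA) - sqrt(n mu^2 / sigma^2)).
# mu, sigma, n, and the target P_FA are illustrative values, not from the notes.
mu, sigma, n = 0.5, 1.0, 25
p_fa = 0.05

sqrt_snr = (n * mu**2 / sigma**2) ** 0.5
nu = (n * sigma**2) ** 0.5 * norm.isf(p_fa)  # threshold achieving the target P_FA
p_d = norm.sf(norm.isf(p_fa) - sqrt_snr)

print(f"nu = {nu:.2f}, P_D = {p_d:.3f} at P_FA = {p_fa}")
```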

5 Receiver Operating Characteristic (ROC) curves

Consider the binary hypothesis test

$$H_0: X = W$$
$$H_1: X = S + W$$

where $W \sim \mathcal{N}(0, \sigma^2 I_{n \times n})$ and $S = [s_1, s_2, \dots, s_n]^T$ is the known signal. Then

$$P_0(X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} X^T X\right)$$
$$P_1(X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} (X - S)^T (X - S)\right)$$

Apply the likelihood ratio test (LRT):

$$\log\Lambda(x) = \log\frac{P_1(X)}{P_0(X)} = -\frac{1}{2\sigma^2}\left[-2 X^T S + S^T S\right] \overset{H_1}{\underset{H_0}{\gtrless}} \gamma$$

After simplification, we have

$$X^T S \overset{H_1}{\underset{H_0}{\gtrless}} \sigma^2 \gamma + \frac{S^T S}{2} =: \gamma'$$

The test statistic $X^T S$ is usually called a matched filter. The LR detector filters the data by projecting them onto the signal subspace.
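A minimal sketch of the matched filter detector follows; the sinusoidal signal, noise level, and the choice of log-LR threshold $\gamma = 0$ are illustrative stand-ins, not from the notes.

```python
import numpy as np

# Sketch of the matched filter test X^T S >< gamma'. The signal S, noise
# level, and log-LR threshold gamma = 0 (i.e., LR threshold 1) are
# illustrative choices, not from the notes.
rng = np.random.default_rng(3)
n, sigma = 64, 1.0
S = np.sin(2 * np.pi * 5 * np.arange(n) / n)  # a made-up known signal

X = S + rng.normal(0.0, sigma, n)  # observation generated under H1

gamma = 0.0
gamma_prime = sigma**2 * gamma + S @ S / 2  # gamma' = sigma^2 gamma + S^T S / 2

decision = "H1" if X @ S >= gamma_prime else "H0"
print(f"matched filter output = {X @ S:.2f}, threshold = {gamma_prime:.2f}, decide {decision}")
```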

5.1 Quality of the classifier

Question: How can we assess the quality of a detector?

Example:

$$H_0: X \sim \mathcal{N}(0, 1)$$
$$H_1: X \sim \mathcal{N}(2, 1)$$

[Figure: the two densities $p_0(x)$ and $p_1(x)$, unit-variance Gaussians centered at 0 and 2, plotted over $x \in [-3, 5]$.]

$P_{FA}$ and $P_D$ characterize the performance of the detector. As $\gamma$ increases, $P_{FA}$ decreases (good) and $P_D$ decreases (bad).

Definition: Receiver Operating Characteristic (ROC) Curve
An ROC curve is a plot that illustrates the performance of a detector (binary classifier) by plotting its $P_D$ vs. $P_{FA}$ at various threshold settings.

First use: World War II. The ROC curve was first developed by electrical and radar engineers for detecting aircraft from radar signals after the attack on Pearl Harbor.

To compute the ROC curve, vary the threshold level $\gamma$ and compute $P_{FA}$ and $P_D$.

[Figure: the two densities $p_0(x)$ and $p_1(x)$ alongside the resulting ROC curve, $P_D$ vs. $P_{FA}$.]

Properties of the ROC curve:

1. It starts at $(0, 0)$ and ends at $(1, 1)$ (unless $p_i(\pm\infty) > 0$).
2. The diagonal line from $(0, 0)$ to $(1, 1)$ corresponds to random guesses.
3. It depends on signal strength, noise strength, noise type, etc.
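For the $\mathcal{N}(0,1)$ vs. $\mathcal{N}(2,1)$ example above, a threshold sweep traces out the ROC curve, which here is available in closed form as $P_D = Q(Q^{-1}(P_{FA}) - 2)$. A minimal sketch (plotting calls left commented):

```python
import numpy as np
from scipy.stats import norm

# Sketch: ROC curve for H0: X ~ N(0, 1) vs H1: X ~ N(2, 1). For a threshold
# gamma applied to x itself, P_FA = Q(gamma) and P_D = Q(gamma - 2), so
# sweeping gamma traces out P_D = Q(Q^{-1}(P_FA) - 2).
gammas = np.linspace(-4.0, 6.0, 200)  # threshold sweep
p_fa = norm.sf(gammas)                # Q(gamma)
p_d = norm.sf(gammas - 2.0)           # Q(gamma - 2)

# import matplotlib.pyplot as plt
# plt.plot(p_fa, p_d); plt.plot([0, 1], [0, 1], "--")  # diagonal = random guessing
# plt.xlabel("P_FA"); plt.ylabel("P_D"); plt.show()
```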

5.2 The AWGN Assumption

AWGN is Gaussian distributed as $W \sim \mathcal{N}(0, \sigma^2 I)$. Is real-world noise really additive, white, and Gaussian? Noise in many applications (e.g., communications and radar) arises from several independent sources, all adding together at the sensor and contributing additively to the measurement.

Central Limit Theorem
If $x_1, \dots, x_n$ are independent random variables with means $\mu_i$ and variances $\sigma_i^2 < \infty$, then

$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{x_i - \mu_i}{\sigma_i} \to \mathcal{N}(0, 1)$$

in distribution as $n \to \infty$. Thus, it is quite reasonable to model noise as additive and Gaussian in many applications. However, whiteness is not always a good assumption.

5.3 Colored Gaussian Noise

Example: Correlated noise

Suppose $W = S_1 + S_2 + \dots + S_k$, where $S_1, S_2, \dots, S_k$ are interfering signals that are not of interest, and each of them is structured/correlated in time. Then $W \sim \mathcal{N}(0, \Sigma)$ is called correlated or colored noise, where $\Sigma$ is a structured covariance matrix. Consider the binary hypothesis test in this case:

$$H_0: X = S_0 + W$$
$$H_1: X = S_1 + W$$

where $W \sim \mathcal{N}(0, \Sigma)$ and $S_0$ and $S_1$ are known signals. So we can rewrite the hypotheses as

$$H_0: X \sim \mathcal{N}(S_0, \Sigma)$$
$$H_1: X \sim \mathcal{N}(S_1, \Sigma)$$

The probability density under each hypothesis is

$$P_i(X) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(X - S_i)^T \Sigma^{-1} (X - S_i)\right], \quad i = 0, 1.$$

The log likelihood ratio is

$$\log\left(\frac{P_1(X)}{P_0(X)}\right) = -\frac{1}{2}\left[(X - S_1)^T \Sigma^{-1} (X - S_1) - (X - S_0)^T \Sigma^{-1} (X - S_0)\right]$$
$$= X^T \Sigma^{-1}(S_1 - S_0) - \frac{1}{2} S_1^T \Sigma^{-1} S_1 + \frac{1}{2} S_0^T \Sigma^{-1} S_0 \overset{H_1}{\underset{H_0}{\gtrless}} \gamma$$

Equivalently,

$$(S_1 - S_0)^T \Sigma^{-1} X \overset{H_1}{\underset{H_0}{\gtrless}} \gamma + \frac{S_1^T \Sigma^{-1} S_1 - S_0^T \Sigma^{-1} S_0}{2} =: \gamma'$$

Letting $t(X) = (S_1 - S_0)^T \Sigma^{-1} X$, we get

$$H_0: t \sim \mathcal{N}\left((S_1 - S_0)^T \Sigma^{-1} S_0,\; (S_1 - S_0)^T \Sigma^{-1} (S_1 - S_0)\right)$$
$$H_1: t \sim \mathcal{N}\left((S_1 - S_0)^T \Sigma^{-1} S_1,\; (S_1 - S_0)^T \Sigma^{-1} (S_1 - S_0)\right)$$

The probability of false alarm is

$$P_{FA} = Q\left(\frac{\gamma' - (S_1 - S_0)^T \Sigma^{-1} S_0}{\left[(S_1 - S_0)^T \Sigma^{-1} (S_1 - S_0)\right]^{1/2}}\right).$$

In this case it is natural to define

$$\text{SNR} = (S_1 - S_0)^T \Sigma^{-1} (S_1 - S_0).$$
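A minimal sketch of this colored-noise detector follows; the exponentially decaying covariance, the signals $S_0$ and $S_1$, and the log-LR threshold $\gamma = 0$ are all made-up stand-ins, not from the notes.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the colored-noise detector t(X) = (S1 - S0)^T Sigma^{-1} X.
# Sigma (an AR(1)-style covariance), S0, S1, and gamma = 0 are illustrative.
n = 32
idx = np.arange(n)
Sigma = 0.5 ** np.abs(np.subtract.outer(idx, idx))  # Sigma_ij = 0.5^{|i-j|}
S0, S1 = np.zeros(n), np.ones(n)

d = np.linalg.solve(Sigma, S1 - S0)  # Sigma^{-1} (S1 - S0)
snr = (S1 - S0) @ d                  # SNR = (S1 - S0)^T Sigma^{-1} (S1 - S0)

gamma = 0.0
gamma_prime = gamma + (S1 @ np.linalg.solve(Sigma, S1)
                       - S0 @ np.linalg.solve(Sigma, S0)) / 2

# P_FA = Q((gamma' - mean of t under H0) / std of t)
p_fa = norm.sf((gamma_prime - d @ S0) / np.sqrt(snr))
print(f"SNR = {snr:.2f}, P_FA = {p_fa:.4f}")
```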