Detection Theory

Detection theory As the last topic of the course, we will briefly consider detection theory. The methods build on estimation theory and attempt to answer questions such as: Is a signal of a specific model present in our time series? E.g., detection of a noisy sinusoid; beep or no beep? Is the transmitted pulse present in the radar signal at time t? Does the mean level of a signal change at time t? After calculating the mean change in pixel values of subsequent frames in a video, is there something moving in the scene?

Detection theory The area is closely related to hypothesis testing, which is widely used e.g. in medicine: Is the response in patients due to the new drug or due to random fluctuations? In our case, the hypotheses could be
H1: x[n] = A cos(2πf0 n + φ) + w[n]
H0: x[n] = w[n].
This example corresponds to the detection of a noisy sinusoid. The hypothesis H1 corresponds to the case that the sinusoid is present and is called the alternative hypothesis. The hypothesis H0 corresponds to the case that the measurements consist of noise only and is called the null hypothesis.

Introductory Example The Neyman-Pearson approach is the classical way of solving detection problems in an optimal manner. It relies on the so-called Neyman-Pearson theorem. Before stating the theorem, consider a simple detection problem, where we observe one sample x[0] from one of two densities: N(0, 1) or N(1, 1). The task is to choose the correct density in an optimal manner.

Introductory Example [Figure: the two likelihoods p(x; H0) and p(x; H1) plotted as functions of x[0].] Our hypotheses are now H1: µ = 1 and H0: µ = 0. An obvious approach for deciding between the densities would be to choose the one which is higher for the particular observed x[0].

Introductory Example More specifically, study the likelihoods and choose the more likely one. The likelihoods are
H1: p(x[0] | µ = 1) = (1/√(2π)) exp(−(x[0] − 1)²/2),
H0: p(x[0] | µ = 0) = (1/√(2π)) exp(−x[0]²/2).
Now, one should select H1 if p(x[0] | µ = 1) > p(x[0] | µ = 0).

Introductory Example Let's state this in terms of x[0]:
p(x[0] | µ = 1) > p(x[0] | µ = 0)
⇔ p(x[0] | µ = 1) / p(x[0] | µ = 0) > 1
⇔ [(1/√(2π)) exp(−(x[0] − 1)²/2)] / [(1/√(2π)) exp(−x[0]²/2)] > 1
⇔ exp(−((x[0] − 1)² − x[0]²)/2) > 1.

Introductory Example Taking the logarithm of both sides yields
x[0]² − (x[0] − 1)² > 0 ⇔ 2x[0] − 1 > 0 ⇔ x[0] > 1/2.
In other words, choose H1 if x[0] > 0.5 and H0 if x[0] < 0.5. Studying the ratio of likelihoods on the second row of the derivation is the key. This ratio is called the likelihood ratio, and its comparison to a threshold γ (here γ = 1) is called the likelihood ratio test (LRT).
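The decision rule is easy to check numerically. Below is a minimal Matlab sketch (not part of the original lecture) that evaluates the likelihood ratio for one sample; it assumes the Statistics Toolbox function normpdf:

x0 = 0.7;                                    % one observed sample (made up)
L = normpdf(x0, 1, 1) / normpdf(x0, 0, 1);   % likelihood ratio L(x[0])
decideH1 = (L > 1);                          % equivalent to x0 > 0.5

For x0 = 0.7 the ratio exceeds 1, so H1 is chosen, in agreement with the rule x[0] > 1/2.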

Introductory Example Note that it is also possible to study the posterior probability ratio P(H1 | x)/P(H0 | x) instead of the above likelihood ratio p(x | H1)/p(x | H0). However, using the Bayes rule, this MAP test turns out to be
p(x | H1) / p(x | H0) > P(H0) / P(H1),
i.e., the only effect of using the posterior probabilities is on the threshold of the LRT.
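As a worked example (with priors made up for illustration): if P(H1) = 0.2 and P(H0) = 0.8, the MAP threshold becomes P(H0)/P(H1) = 4. Since the likelihood ratio in our example simplifies to L(x[0]) = exp(x[0] − 1/2), the MAP rule reads x[0] − 1/2 > ln 4, i.e., x[0] > 0.5 + ln 4 ≈ 1.89. The rarer we believe H1 to be a priori, the stronger the evidence required to choose it.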

Error Types It might be that the detection problem is not symmetric and some errors are more costly than others. For example, when detecting a disease, a missed detection is more costly than a false alarm. The tradeoff between misses and false alarms can be adjusted using the threshold of the LRT.

Error Types The below figure illustrates the probabilities of the two kinds of errors. The red area on the left corresponds to the probability of choosing H1 while H0 holds (a false alarm). The blue area is the probability of choosing H0 while H1 holds (a missed detection). [Figure: the two likelihoods over x[0], with the false-alarm and missed-detection probabilities shaded.]

Error Types It can be seen that either probability can be made arbitrarily small by adjusting the detection threshold. [Figure: two threshold choices. Left: a large threshold gives a small probability of false alarm (red), but a lot of misses (blue). Right: a small threshold gives only a few missed detections (blue), but a huge number of false alarms (red).]

Error Types The probability of false alarm for the threshold γ = 1.5 is
P_FA = P(x[0] > γ | µ = 0) = ∫_{1.5}^{∞} (1/√(2π)) exp(−x²/2) dx ≈ 0.0668.
The probability of missed detection is
P_M = P(x[0] < γ | µ = 1) = ∫_{−∞}^{1.5} (1/√(2π)) exp(−(x − 1)²/2) dx ≈ 0.6915.
An equivalent, but more useful, term is the complement of P_M, the probability of detection:
P_D = 1 − P_M = ∫_{1.5}^{∞} (1/√(2π)) exp(−(x − 1)²/2) dx ≈ 0.3085.
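These probabilities are straightforward to evaluate with the Gaussian CDF. A minimal sketch, assuming the Statistics Toolbox function normcdf:

gamma = 1.5;
PFA = 1 - normcdf(gamma, 0, 1);   % P(x > gamma | mu = 0), approx. 0.0668
PM  = normcdf(gamma, 1, 1);       % P(x < gamma | mu = 1), approx. 0.6915
PD  = 1 - PM;                     % approx. 0.3085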

Neyman-Pearson Theorem Since P_FA and P_D depend on each other, we would like to maximize P_D subject to a given maximum allowed P_FA. Luckily the following theorem makes this easy. Neyman-Pearson Theorem: For a fixed P_FA, the likelihood ratio test maximizes P_D with the decision rule
L(x) = p(x; H1) / p(x; H0) > γ,
where the threshold γ is the value for which
∫_{x : L(x) > γ} p(x; H0) dx = P_FA.

Neyman-Pearson Theorem As an example, suppose we want to find the best detector for our introductory example, and we can tolerate 10% false alarms (P_FA = 0.1). According to the theorem, the detection rule is: Select H1 if
p(x | µ = 1) / p(x | µ = 0) > γ.
The only thing to find out now is the threshold γ such that
∫_{γ}^{∞} p(x | µ = 0) dx = 0.1.
This can be done with the Matlab function icdf, which evaluates the inverse cumulative distribution function.

Neyman-Pearson Theorem Unfortunately icdf solves the γ for which
∫_{−∞}^{γ} p(x | µ = 0) dx = 0.1
instead of
∫_{γ}^{∞} p(x | µ = 0) dx = 0.1.
Thus, we have to use the function like this: icdf('norm', 1-0.1, 0, 1), which gives γ ≈ 1.2816. Similarly, we can also calculate the P_D with this threshold:
P_D = ∫_{1.2816}^{∞} p(x | µ = 1) dx ≈ 0.3891.
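Putting the pieces together, a minimal Matlab sketch of the whole design (threshold for P_FA = 0.1 and the resulting P_D), assuming the Statistics Toolbox:

PFA = 0.1;
gamma = icdf('norm', 1 - PFA, 0, 1);   % approx. 1.2816
PD = 1 - normcdf(gamma, 1, 1);         % approx. 0.3891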

Detector for a known waveform The NP approach applies to all cases where the likelihoods are available. An important special case is that of a known waveform s[n] embedded in a WGN sequence w[n]:
H1: x[n] = s[n] + w[n]
H0: x[n] = w[n].
An example of a case where the waveform is known is the detection of radar signals, where a pulse s[n] transmitted by us is reflected back after some propagation time.

Detector for a known waveform For this case the likelihoods are
p(x; H1) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−(x[n] − s[n])²/(2σ²)),
p(x; H0) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−x[n]²/(2σ²)).
The likelihood ratio test is easily obtained as
p(x; H1) / p(x; H0) = exp[ −(1/(2σ²)) ( Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} x[n]² ) ] > γ.

Detector for a known waveform This simplifies by taking the logarithm of both sides:
−(1/(2σ²)) ( Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} x[n]² ) > ln γ.
This further simplifies into
(1/σ²) Σ_{n=0}^{N−1} x[n] s[n] − (1/(2σ²)) Σ_{n=0}^{N−1} s[n]² > ln γ.

Detector for a known waveform Since s[n] is a known waveform (i.e., a constant), we can simplify the procedure by moving it to the right-hand side and combining it with the threshold:
Σ_{n=0}^{N−1} x[n] s[n] > σ² ln γ + (1/2) Σ_{n=0}^{N−1} s[n]².
We can equivalently call the right-hand side our threshold (say γ') to get the final decision rule:
Σ_{n=0}^{N−1} x[n] s[n] > γ'.
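The resulting detector simply correlates the data against the known waveform. A minimal Matlab sketch, where the waveform, noise level, and threshold are made-up assumptions for illustration:

N = 100; sigma = 0.5;
s = sin(2*pi*0.05*(0:N-1));    % assumed known waveform
x = s + sigma*randn(1, N);     % observation under H1
T = sum(x .* s);               % test statistic: correlation with s
gammap = 10;                   % threshold, chosen for a target P_FA
decideH1 = (T > gammap);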

Examples This leads to some rather intuitive results. The detector for a known DC level A in WGN is
Σ_{n=0}^{N−1} x[n] A > γ'  ⇔  A Σ_{n=0}^{N−1} x[n] > γ'.
Equally well we can set a new threshold γ'' = γ'/(AN), so that the detection rule becomes: x̄ > γ''. Note that a negative A would invert the inequality.

Examples The detector for a sinusoid in WGN is
Σ_{n=0}^{N−1} x[n] A cos(2πf0 n + φ) > γ'  ⇔  A Σ_{n=0}^{N−1} x[n] cos(2πf0 n + φ) > γ'.
Again we can divide by A to get
Σ_{n=0}^{N−1} x[n] cos(2πf0 n + φ) > γ''.
In other words, we check the correlation with the sinusoid. Note that the amplitude A does not affect the test statistic, only the threshold, which is anyway selected according to the fixed P_FA rate.

Examples As an example, the below picture shows the detection process with σ = 0.5. [Figure: three panels; the noiseless signal, the noisy signal, and the detection result (the correlation statistic over time).]
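A figure like this can be produced with a running correlation, i.e., by sliding the known waveform over the data. A hedged Matlab sketch, with all signal parameters being illustrative assumptions rather than the lecture's actual values:

n = 0:99; sigma = 0.5;
s = sin(2*pi*0.05*n);          % assumed known waveform
x = sigma*randn(1, 1000);      % noise-only background
x(401:500) = x(401:500) + s;   % embed the waveform at n = 400
y = conv(x, fliplr(s));        % running correlation with s
plot(y);                       % peaks near the true location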

Detection of random signals The problem with the previous approach was that the model was too restrictive; the results depend on how well the phases match. The model can be relaxed by considering random signals, whose exact form is unknown, but the correlation structure is known. Since the correlation captures the frequency (but not the phase), this is exactly what we want. In general, the detection of a random signal can be formulated as follows.

Detection of random signals Suppose s ∼ N(0, C_s) and w ∼ N(0, σ²I). Then the detection problem is the hypothesis test
H0: x ∼ N(0, σ²I)
H1: x ∼ N(0, C_s + σ²I).
It can be shown (see Kay-2, p. 145) that the decision rule becomes: Decide H1 if x^T ŝ > γ', where
ŝ = C_s (C_s + σ²I)^{−1} x.
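In code, this estimator-correlator statistic is one linear solve and one inner product. A minimal sketch with made-up dimensions and covariance (assumptions, not from the lecture):

N = 8; sigma2 = 0.25;
Cs = toeplitz(0.9.^(0:N-1));              % assumed signal covariance
x  = randn(N, 1);                         % observed data vector
shat = Cs * ((Cs + sigma2*eye(N)) \ x);   % MMSE-type signal estimate
T = x' * shat;                            % test statistic x' * shat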

Detection of random signals The term ŝ is in fact an estimate of the signal; more specifically, the linear Bayesian MMSE estimator, which assumes a linear form for the estimator (similar to the BLUE). A particular special case of a random signal is the Bayesian linear model. The Bayesian linear model assumes linearity, x = Hθ + w, together with a prior for the parameters, such as θ ∼ N(0, C_θ). Consider the following detection problem:
H0: x = w
H1: x = Hθ + w.

Detection of random signals Within the earlier random-signal framework, this is written as
H0: x ∼ N(0, σ²I)
H1: x ∼ N(0, C_s + σ²I),
with C_s = H C_θ H^T. The assumption s ∼ N(0, H C_θ H^T) states that the exact form of the signal is unknown, and we only know its covariance structure.

Detection of random signals This is helpful in the sinusoidal detection problem: we are not interested in the phase (included in the exact formulation x[n] = A cos(2πf0 n + φ)), but only in the frequency (as described by the covariance matrix H C_θ H^T). Thus, the decision rule becomes: Decide H1 if x^T ŝ > γ', where
ŝ = C_s (C_s + σ²I)^{−1} x = H C_θ H^T (H C_θ H^T + σ²I)^{−1} x.

Detection of random signals Luckily the decision rule simplifies quite a lot by noticing that the last part is the MMSE estimate of θ:
x^T ŝ = x^T H C_θ H^T (H C_θ H^T + σ²I)^{−1} x = x^T H θ̂.
An example of applying the linear model is in Kay: Statistical Signal Processing, vol. 2: Detection Theory, pages 155-158. In the example, a Rayleigh fading sinusoid is studied, which has an unknown amplitude A and phase φ. Only the frequency f0 is assumed to be known.
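The identity x^T ŝ = x^T H θ̂ is easy to verify numerically. A small Matlab sketch with made-up dimensions (all names are illustrative assumptions):

N = 8; p = 2; sigma2 = 0.25;
H = randn(N, p); Ctheta = eye(p); x = randn(N, 1);
Cs = H * Ctheta * H';                                 % induced signal covariance
shat = Cs * ((Cs + sigma2*eye(N)) \ x);               % signal estimate
thetahat = Ctheta * H' * ((Cs + sigma2*eye(N)) \ x);  % MMSE estimate of theta
abs(x'*shat - x'*(H*thetahat))                        % approx. 0, up to rounding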

Detection of random signals This can be manipulated into a linear model form with two unknowns corresponding to A and φ. The final result is the decision rule: decide H1 if
| Σ_{n=0}^{N−1} x[n] exp(−2πi f0 n) | > γ'.
As an example, the below picture shows the detection process with σ = 0.5. Note the simplicity of the Matlab implementation:
n = 0:N-1;                        % time indices (added here for completeness)
h = exp(-2*pi*sqrt(-1)*f0*n);
y = abs(conv(h,x));

Detection of random signals [Figure: three panels; the noiseless signal, the noisy signal, and the detection result |y| over time.]

Receiver Operating Characteristics A usual way of illustrating detector performance is the Receiver Operating Characteristics (ROC) curve. This describes the relationship between P_FA and P_D for all possible values of the threshold γ. The functional relationship between P_FA and P_D depends on the problem (and on the selected detector, although we have seen that the LRT is optimal).

Receiver Operating Characteristics For example, in the DC level example,
P_D(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−(x − 1)²/2) dx,
P_FA(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−x²/2) dx.
It is easy to see the relationship:
P_D(γ) = ∫_{γ−1}^{∞} (1/√(2π)) exp(−x²/2) dx = P_FA(γ − 1).
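The ROC curve can thus be traced by sweeping the threshold. A minimal Matlab sketch, assuming the Statistics Toolbox:

gammas = linspace(-4, 6, 500);
PFA = 1 - normcdf(gammas, 0, 1);   % false-alarm rate at each threshold
PD  = 1 - normcdf(gammas, 1, 1);   % detection rate at each threshold
plot(PFA, PD);
xlabel('Probability of False Alarm, P_{FA}');
ylabel('Probability of Detection, P_D');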

Receiver Operating Characteristics Plotting the ROC curve for all γ results in the following curve. [Figure: ROC curve, P_D versus P_FA; a large threshold maps to the lower-left end of the curve and a small threshold to the upper-right end.]

Receiver Operating Characteristics The higher the ROC curve, the better the performance. A random guess has a diagonal ROC curve. In the DC level case, the performance improves as the noise variance σ² decreases. Below are the ROC plots for various values of σ².

Receiver Operating Characteristics [Figure: ROC curves for variance values 0.2, 0.4, 0.6, 0.8, and 1, together with the diagonal random-guess line; smaller variance gives a higher curve.] This gives rise to a widely used measure of detector performance: the Area Under (ROC) Curve, or AUC criterion.
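The AUC can be approximated numerically by integrating the sampled ROC curve. A hedged sketch, continuing the PFA and PD vectors from the ROC snippet above (trapz requires the P_FA grid to be increasing, hence the flip):

AUC = trapz(fliplr(PFA), fliplr(PD));   % approx. 0.76 for the unit-variance case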

Composite hypothesis testing In the previous examples the parameter values specified the distribution completely; e.g., either A = 1 or A = 0. Such cases are called simple hypothesis tests. Often we can't specify the parameters exactly, but only a range of values for each case. An example could be our DC model x[n] = A + w[n] with
H1: A ≠ 0
H0: A = 0.

Composite hypothesis testing The question can be posed in a probabilistic manner as follows: What is the probability of observing x[n] if H0 were to hold? If the probability is small (e.g., all x[n] ∈ [0.5, 1.5], and let's say the probability of observing such x[n] under H0 is 1%), then we can conclude that the null hypothesis can be rejected with 99% confidence.

An example As an example, consider detecting a biased coin in a coin-tossing experiment. If we get 19 heads out of 20 tosses, it seems rather likely that the coin is biased. How to pose the question mathematically? Now the hypotheses are
H1: the coin is biased: p ≠ 0.5
H0: the coin is unbiased: p = 0.5,
where p denotes the probability of a head for our coin.

An example Additionally, let's say we want 99% confidence for the test. Thus, we can state the hypothesis test as: "What is the probability of observing at least 19 heads assuming p = 0.5?" This is given by the binomial distribution:
C(20, 19) · 0.5¹⁹ · 0.5¹ + 0.5²⁰ ≈ 0.00002,
where the first term is the probability of exactly 19 heads and the second is the probability of 20 heads. Since 0.00002 < 1%, we can reject the null hypothesis and conclude that the coin is biased.
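The same tail probability can be computed directly in Matlab; a minimal sketch assuming the Statistics Toolbox function binocdf:

p = 1 - binocdf(18, 20, 0.5)   % P(at least 19 heads out of 20), approx. 2e-5

Equivalently, nchoosek(20,19)*0.5^20 + 0.5^20 gives the same value.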

An example Actually, the 99% confidence requirement was a bit loose in this case. We could have required 99.998% confidence and still rejected the null hypothesis. The probability defining this upper limit for the confidence (here 0.00002) is widely used and is called the p-value. More specifically: the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.