ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals


ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals
D. Richard Brown III
Worcester Polytechnic Institute
30-Apr-2009

Introduction

All of the detection problems we have considered in this class have involved a fixed sample size n, with the decision rule only being applied after all n samples are available.

[Block diagram: the signal $s_k$ is scaled by $a$ and added to noise $W_k$ to produce the observations $Y_k$, which feed an n-sample detector that outputs decisions.]

We now consider the sequential detection problem, where we collect samples one at a time until we have enough observations to generate a final/terminal decision. There are two approaches to this problem:

- Bayes: Assign a relative cost to taking an observation and try to minimize the overall average cost.
- N-P: Keep taking observations until a desired tradeoff can be obtained between $P_{fp}$ and $P_D$. Good decision rules achieve the desired tradeoff with fewer observations.

Mathematical Model for Sequential Detection

We consider only simple binary hypotheses here: $H_0: X = x_0$ and $H_1: X = x_1$. We assume that the scalar observations are i.i.d. random variables with density $p_0(z)$ under $H_0$ and $p_1(z)$ under $H_1$. The set of possible observation sequences can be written as
$$\mathcal{Y} = \{y : y = \{y_k\}_{k=1}^{\infty},\ y_k \in \mathbb{R}\}.$$
A sequential decision rule (SDR) is a pair of sequences
$$\phi = \{\phi_k\}_{k=0}^{\infty} \quad \text{(stopping rule)}$$
$$\delta = \{\delta_k\}_{k=0}^{\infty} \quad \text{(terminal decision rule)}$$
where $\phi_k : \mathbb{R}^k \to \{0,1\}$ and $\delta_k : \mathbb{R}^k \to \{0,1\}$. Note that the SDR begins at index $k = 0$, whereas the first sample occurs at index $k = 1$.

Sequential Decision Rules

SDR procedure:
- If $\phi_n(y_1,\ldots,y_n) = 0$, we take another sample.
- If $\phi_n(y_1,\ldots,y_n) = 1$, we stop sampling and make the terminal decision $\delta_n(y_1,\ldots,y_n)$.

The stopping time is the random variable
$$N(\phi) = \min\{n : \phi_n(Y_1,\ldots,Y_n) = 1\}.$$
Under our assumption of simple binary hypotheses, the performance metrics of interest are the conditional risks $R_0(\delta)$ and $R_1(\delta)$ as well as the expected stopping times under each state, $E_0[N(\phi)] := E[N(\phi) \mid X = x_0]$ and $E_1[N(\phi)] := E[N(\phi) \mid X = x_1]$. We would like to minimize all of these quantities, but each can be traded off against the others.
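To make the stopping-rule/terminal-rule interface concrete, here is a minimal sketch (not from the lecture) of how a generic SDR could be driven by a stream of observations. The names `run_sdr`, `sample_stream`, `stopping_rule`, and `terminal_rule` are illustrative, and `max_samples` is only a practical safety cap.

```python
def run_sdr(sample_stream, stopping_rule, terminal_rule, max_samples=10_000):
    """Run a generic sequential decision rule (phi, delta).

    sample_stream : callable returning the next scalar observation y_k
    stopping_rule : callable phi_n(y_1..y_n) -> 0 (take another sample) or 1 (stop)
    terminal_rule : callable delta_n(y_1..y_n) -> 0 or 1 (decide H0 or H1)
    Returns (decision, stopping_time).
    """
    y = []
    if stopping_rule(y) == 1:           # phi_0: decide before any sample is taken
        return terminal_rule(y), 0
    for n in range(1, max_samples + 1):
        y.append(sample_stream())       # take one more sample
        if stopping_rule(y) == 1:       # phi_n = 1: stop sampling
            return terminal_rule(y), n  # terminal decision delta_n, stopping time N = n
    raise RuntimeError("no terminal decision reached within max_samples")
```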

Bayes Formulation for Sequential Detection

We assume the UCA. If we assign a cost of c to each observation, the conditional risks can be written as
$$R_0(\phi,\delta) = \mathrm{Prob}(\text{decide } H_1 \mid X = x_0) + c E_0[N(\phi)] = E_0[\delta(Y_1,\ldots,Y_N)] + c E_0[N(\phi)]$$
and
$$R_1(\phi,\delta) = \mathrm{Prob}(\text{decide } H_0 \mid X = x_1) + c E_1[N(\phi)] = 1 - E_1[\delta(Y_1,\ldots,Y_N)] + c E_1[N(\phi)].$$
Given a prior $\pi_0 = \mathrm{Prob}(X = x_0)$, $\pi_1 = \mathrm{Prob}(X = x_1)$, we can combine the conditional risks into a single Bayes risk
$$r(\phi,\delta,\pi_1) = (1-\pi_1)R_0(\phi,\delta) + \pi_1 R_1(\phi,\delta).$$
We seek an SDR $(\phi,\delta)$ that minimizes $r(\phi,\delta,\pi_1)$.

Step 0: No Samples Taken Yet

We have three choices: decide 0, decide 1, or take a sample. If $\phi_0 = 1$, we are choosing to make a decision without any observations. Is this a good idea? Actually, it might be if the cost of an observation is very high, the prior $\pi_1$ is very close to zero or one, or the observations are very noisy. If we stop, the risks of each possible decision rule are
$$r(\phi_0 = 1, \delta_0 = 0, \pi_1) = \pi_1$$
$$r(\phi_0 = 1, \delta_0 = 1, \pi_1) = 1 - \pi_1.$$
If we choose to take a sample, the minimum Bayes risk will then be
$$V(\pi_1) = \min_{\phi:\,\phi_0 = 0,\ \delta} \left\{(1-\pi_1)R_0(\phi,\delta) + \pi_1 R_1(\phi,\delta)\right\} \geq c.$$
$V(\pi_1)$ is the minimum expected risk over all SDRs that take at least one sample.
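The minimum risk function can be approximated numerically. Below is a minimal sketch (not from the lecture) that does this by truncated-horizon dynamic programming over a grid of priors, assuming the binary Gaussian example used later in the lecture ($H_0: \mathcal{N}(-1, 0.6)$, $H_1: \mathcal{N}(1, 0.6)$, $c = 0.05$); the grid sizes, the 10-sample horizon, and the names `W` and `V_cont` are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

sigma2, c = 0.6, 0.05
sigma = np.sqrt(sigma2)

pi_grid = np.linspace(0.0, 1.0, 501)      # grid of priors pi_1
y_grid = np.linspace(-5.0, 5.0, 2001)     # integration grid for one observation
dy = y_grid[1] - y_grid[0]
p0 = norm.pdf(y_grid, loc=-1.0, scale=sigma)
p1 = norm.pdf(y_grid, loc=+1.0, scale=sigma)

# W[i]      : minimum Bayes risk at prior pi_grid[i] when at most k more samples may be taken
# V_cont[i] : minimum risk among rules that take at least one more sample (the lecture's V)
W = np.minimum(pi_grid, 1.0 - pi_grid)    # k = 0: must decide now
for k in range(10):
    V_cont = np.empty_like(pi_grid)
    for i, pi in enumerate(pi_grid):
        m = (1 - pi) * p0 + pi * p1                 # marginal density of the next sample
        post = pi * p1 / np.maximum(m, 1e-300)      # posterior pi_1(y) on the y grid
        V_cont[i] = c + np.sum(np.interp(post, pi_grid, W) * m) * dy
    W = np.minimum(np.minimum(pi_grid, 1.0 - pi_grid), V_cont)

# Where continuing beats stopping, we are inside the interval [pi_L, pi_U].
take_sample = V_cont < np.minimum(pi_grid, 1.0 - pi_grid)
print("approx pi_L =", pi_grid[take_sample].min(), " approx pi_U =", pi_grid[take_sample].max())
```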

Graphical Intuition: Minimum Risk Function

[Figure: two plots of Bayes risk versus the prior $\pi_1$. Each shows the stopping risks $\pi_1$ and $1-\pi_1$ together with the minimum continuation risk $V(\pi_1)$, which satisfies $V(0) = V(1) = c$. Where $V(\pi_1)$ dips below $\min\{\pi_1, 1-\pi_1\}$, it defines the thresholds $\pi_L$ and $\pi_U$.]

Step 0: No Samples Taken Yet

It should be clear that our decision to take a sample at step 0 is determined by whether the quantity $V(\pi_1)$ is smaller than both $\pi_1$ and $1-\pi_1$.

Lemma: When the costs $C_{00} = C_{11} = 0$, then $V(0) = V(1) = c$ and $V(x)$ is a continuous concave function on $x \in [0,1]$.

Let $\mathcal{P} = \{x \in [0,1] : V(x) \leq \min\{x, 1-x\}\}$. If $\mathcal{P}$ is empty, then we definitely should not take a sample: we should just decide 1 when $\pi_1 \geq 0.5$ and otherwise decide 0. If $\mathcal{P}$ is not empty, then $\mathcal{P} = [\pi_L, \pi_U] \subseteq [0,1]$ and the best action at step 0 is now clear:

- $\pi_1 \geq \pi_U$: stop and decide 1 ($\phi_0 = 1, \delta_0 = 1$)
- $\pi_1 \leq \pi_L$: stop and decide 0 ($\phi_0 = 1, \delta_0 = 0$)
- $\pi_L < \pi_1 < \pi_U$: take a sample ($\phi_0 = 0, \delta_0 = ?$)

Minimum Risk Function: Binary Signals in Gaussian Noise

[Figure: Bayes risk versus the prior $\pi_1$, showing the no-observation risk along with the minimum risk when one, two, three, and four observations are allowed, and the limiting curve $V^*$.]

Example: Binary Signaling

The previous slide plots the risk for the HT problem
$$H_0: Y_k \sim \mathcal{N}(-1, \sigma^2) \qquad H_1: Y_k \sim \mathcal{N}(1, \sigma^2)$$
where $\sigma^2 = 0.6$ is known and $c = 0.05$. According to the minimum risk curves on the previous slide, if our prior was $\pi_1 = 0.5$, we can minimize the risk by taking two samples. So the optimum SDR should take two samples, right?

What if our first sample turned out to be $y_1 = 10$? Should we take another sample? The sample $y_1 = 10$ is very strong evidence in favor of $H_1$. Is another sample likely to lower our risk after observing $y_1 = 10$?

A sequential decision rule does not decide in advance how many samples it should take. The decision to take another sample is made dynamically and depends on the outcomes of the previous samples.

Step 1: One Sample Taken: What Should We Do?

Given no samples, we had a prior probability $\pi_1 = \mathrm{Prob}(X = x_1)$. If we decided to take one sample and observed $Y_1 = y_1$, we can compute a posterior probability
$$\pi_1(y_1) := \mathrm{Prob}(X = x_1 \mid Y_1 = y_1) = \frac{\pi_1 p_1(y_1)}{\pi_0 p_0(y_1) + \pi_1 p_1(y_1)} = \frac{L(y_1)}{\frac{1-\pi_1}{\pi_1} + L(y_1)}$$
We have three choices at this point: decide 0, decide 1, or take another sample. Since we've assumed i.i.d. samples and the cost of taking another sample is the same, the problem at step $n > 0$ is the same problem that we faced at step 0, the only difference being the updated prior.
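As an illustration, a one-line posterior update in terms of the single-sample likelihood ratio $L(y) = p_1(y)/p_0(y)$ is sketched below; the Gaussian helper `L_gauss` assumes the binary-signaling example from the previous slide (means $\pm 1$, $\sigma^2 = 0.6$) and is only illustrative.

```python
import math

def posterior(pi1, L):
    """Posterior Prob(X = x1 | y) from the prior pi1 and likelihood ratio L = p1(y)/p0(y)."""
    return L / ((1.0 - pi1) / pi1 + L)

def L_gauss(y, sigma2=0.6):
    # For N(+1, sigma2) vs N(-1, sigma2), p1(y)/p0(y) simplifies to exp(2*y/sigma2).
    return math.exp(2.0 * y / sigma2)

# y1 = 10 from the previous slide: the posterior is pushed essentially to 1,
# so another sample is very unlikely to reduce the risk further.
print(posterior(0.5, L_gauss(10.0)))
```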

Example: Coin Flipping

Suppose we have the following simple binary HT problem:
$$H_0: \text{coin is fair} \qquad H_1: \text{coin is unfair with } \mathrm{Prob}(H) = q > 0.5$$
Let $\pi_1 = \mathrm{Prob}(\text{state is } H_1)$ and assume the UCA. If we take no samples, the risks will be
$$r(\delta_1, \pi) = \pi_1 \quad \text{(always decide fair)}$$
$$r(\delta_2, \pi) = 1 - \pi_1 \quad \text{(always decide unfair)}$$
If we take one sample, the risks will be
$$r(\delta_3, \pi) = \frac{1-\pi_1}{2} + \pi_1(1-q) + c \quad (y_1 = T\text{: decide fair; } y_1 = H\text{: decide unfair})$$
$$r(\delta_4, \pi) = \frac{1-\pi_1}{2} + \pi_1 q + c \quad (y_1 = T\text{: decide unfair; } y_1 = H\text{: decide fair})$$
To find $V$, we have to keep doing this for 2, 3, 4, ... samples.
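A quick numerical check of these four risk expressions, using the $c = 0.05$ and $q = 0.8$ values that appear on the next slides, might look like the following sketch; the rule numbering follows the slide.

```python
import numpy as np

q, c = 0.8, 0.05                 # unfair-coin head probability and per-sample cost
pi1 = np.linspace(0, 1, 101)

r1 = pi1                                   # rule 1: always decide fair
r2 = 1 - pi1                               # rule 2: always decide unfair
r3 = (1 - pi1) / 2 + pi1 * (1 - q) + c     # rule 3: one flip, T -> fair, H -> unfair
r4 = (1 - pi1) / 2 + pi1 * q + c           # rule 4: one flip, T -> unfair, H -> fair

# At pi1 = 0.5, rule 3 has the smallest risk, so taking one sample is worthwhile.
i = 50
print({"r1": r1[i], "r2": r2[i], "r3": r3[i], "r4": r4[i]})
```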

Example: Coin Flipping: c = 0.05 and q = 0.8

[Figure: Bayes risk versus the prior $\pi_1$ for rule 1, rule 2, rule 3 + c, and rule 4 + c.]

Example: Coin Flipping: c = 0.05 and q = 0.8

[Figure: Bayes risk versus the prior $\pi_1$.]

Example: Coin Flipping: c = 0.05

Suppose we have a prior $\pi_1 = 0.5$. What should we do? Take a sample, since taking one sample and using rule 3 has a lower Bayes risk at this prior (including the cost of the sample).

The coin is flipped and you observe a head. What should you do?
$$\pi_1(y_1 = H) = \frac{\pi_1 p_1(y_1 = H)}{\pi_0 p_0(y_1 = H) + \pi_1 p_1(y_1 = H)} = \frac{0.5 \cdot 0.8}{0.5 \cdot 0.5 + 0.5 \cdot 0.8} \approx 0.6154$$
Now what should we do? It looks like we need to take another sample.

The coin is flipped and you observe a tail. What should you do?
$$\pi_1(\{y_1, y_2\} = \{H, T\}) = \frac{\pi_1(y_1 = H)\, p_1(y_2 = T)}{\pi_0(y_1 = H)\, p_0(y_2 = T) + \pi_1(y_1 = H)\, p_1(y_2 = T)} \approx 0.3902$$
Now what should we do? It is close. Let's stop and decide $H_0$.
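The two posterior values above can be reproduced with a small sequential update, sketched below (the function name `update` is illustrative):

```python
def update(pi1, flip, q=0.8):
    """One Bayes update of Prob(unfair) after observing flip 'H' or 'T'.
    Fair coin: P(H) = 0.5; unfair coin: P(H) = q."""
    p1 = q if flip == "H" else 1 - q     # likelihood of the flip under H1 (unfair)
    p0 = 0.5                             # likelihood of the flip under H0 (fair)
    return pi1 * p1 / ((1 - pi1) * p0 + pi1 * p1)

pi1 = 0.5
for flip in ["H", "T"]:
    pi1 = update(pi1, flip)
    print(flip, round(pi1, 4))   # 0.6154 after the head, then 0.3902 after the tail
```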

Example: Binary Signals in Gaussian Noise

[Figure, left panel: Bayes risk versus the prior $\pi_1$, showing the no-observation risk, the minimum risk with one, two, three, and four observations, and $V^*$. Right panel: the posterior probability $\pi_1(y)$ versus the number of samples taken for one realization. Parameters: $a_1 = -a_0 = 1$, $\sigma^2 = 0.6$, $c = 0.05$.]

Bayesian Formulation: Remarks

When $c > 0$, we stop taking samples when the prior has been updated to a point where the expected minimum risk of taking one or more samples exceeds the risk of just making a decision now.

If each sample were free, we could keep taking samples and make the minimum risk $V(\pi_1)$ go to zero over all priors, e.g., coin flipping:

[Figure: Bayes risk versus the prior $\pi_1$ for the coin-flipping example, with the minimum risk curves driven toward zero as more free samples are allowed.]

Neyman-Pearson Formulation

Samples no longer have an explicit cost. We have already investigated the case when the number of samples is fixed: we can design decision rules to achieve a two-way tradeoff between $P_{fp}$ and $P_D$. Now we would like to design a sequential decision rule that achieves a desired three-way tradeoff between $P_{fp}$, $P_D$, and the expected number of samples.

Sequential Probability Ratio Test

Given an n-sample observation $\{y_k\}_{k=1}^n$ and a prior $\pi_1$, the posterior can be written as
$$\pi_1(\{y_k\}_{k=1}^n) = \frac{\pi_1 p_1(\{y_k\}_{k=1}^n)}{(1-\pi_1)p_0(\{y_k\}_{k=1}^n) + \pi_1 p_1(\{y_k\}_{k=1}^n)} = \frac{L(\{y_k\}_{k=1}^n)}{\frac{1-\pi_1}{\pi_1} + L(\{y_k\}_{k=1}^n)} = \frac{\lambda_n}{\frac{1-\pi_1}{\pi_1} + \lambda_n}$$
where
$$\lambda_n := L(\{y_k\}_{k=1}^n) = \frac{p_1(\{y_k\}_{k=1}^n)}{p_0(\{y_k\}_{k=1}^n)} = \frac{\prod_{k=1}^n p_1(y_k)}{\prod_{k=1}^n p_0(y_k)} = \prod_{k=1}^n L(y_k).$$
$\pi_1(\{y_k\}_{k=1}^n)$ is a strictly monotone increasing function of $\lambda_n$ when $0 < \pi_1 < 1$. Hence testing $\pi_1(\{y_k\}_{k=1}^n) \geq \pi_U$ and $\pi_1(\{y_k\}_{k=1}^n) \leq \pi_L$ is equivalent to testing $\lambda_n \geq B$ and $\lambda_n \leq A$.

Sequential Probability Ratio Test (SPRT)

A Sequential Probability Ratio Test, denoted as SPRT(A,B), is a sequential decision rule of the form
$$(\phi_n(\{y_k\}_{k=1}^n), \delta_n(\{y_k\}_{k=1}^n)) = \begin{cases} (1,1) & \lambda_n \geq B \quad \text{(stop and decide 1)} \\ (1,0) & \lambda_n \leq A \quad \text{(stop and decide 0)} \\ (0,?) & A < \lambda_n < B \quad \text{(take a sample)} \end{cases}$$
How are A and B related to $\pi_L$ and $\pi_U$? Set $\pi_1(\{y_k\}_{k=1}^n)$ equal to $\pi_L$ or $\pi_U$ and solve for $\lambda_n$:
$$A = \frac{(1-\pi_1)\pi_L}{\pi_1(1-\pi_L)} \qquad B = \frac{(1-\pi_1)\pi_U}{\pi_1(1-\pi_U)}$$
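Here is a minimal SPRT(A, B) sketch for i.i.d. scalar observations; the names `sprt`, `sample_stream`, and `L_single` are illustrative, and the `max_samples` cap is only a practical safeguard (the SPRT itself places no bound on the number of samples). The running product $\lambda_n$ is accumulated in the log domain for numerical stability.

```python
import math

def sprt(sample_stream, L_single, A, B, max_samples=1_000_000):
    """Run SPRT(A, B) on a stream of i.i.d. observations.

    sample_stream : callable returning the next observation y_k
    L_single      : callable y -> p1(y) / p0(y), the single-sample likelihood ratio
    Returns (decision, number_of_samples).
    """
    log_lam, log_A, log_B = 0.0, math.log(A), math.log(B)
    for n in range(1, max_samples + 1):
        log_lam += math.log(L_single(sample_stream()))   # lambda_n = prod_k L(y_k)
        if log_lam >= log_B:
            return 1, n          # lambda_n >= B: stop and decide H1
        if log_lam <= log_A:
            return 0, n          # lambda_n <= A: stop and decide H0
    raise RuntimeError("SPRT did not terminate within max_samples")
```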

Wald-Wolfowitz Theorem

Theorem: Let $(\phi,\delta)$ be an SPRT with parameters (A,B) and let $(\phi',\delta')$ be any sequential decision rule. If
$$P_{fp}(\phi',\delta') \leq P_{fp}(\phi,\delta) \quad \text{and} \quad P_D(\phi',\delta') \geq P_D(\phi,\delta)$$
then $E_j[N(\phi')] \geq E_j[N(\phi)]$ for $j = 0$ and $1$.

The theorem says that no sequential decision rule can achieve a better false positive probability and a better detection probability than an SPRT without also requiring more samples, on average, than the SPRT.

Wald-Wolfowitz Theorem: SPRT vs. n-sample Detector

We can reformulate our standard n-sample detector as a sequential decision rule $(\phi',\delta')$ with fixed observation size n:
$$\phi' = \{0, \ldots, 0, 1\} \quad \text{where the 1 occurs in the } n\text{th position}$$
$$\delta' = \{?, \ldots, ?, \delta(\{y_k\}_{k=1}^n)\} \quad \text{where the } \delta \text{ occurs in the } n\text{th position}$$
Suppose we design an n-sample N-P decision rule $\delta$ with $P_{fp} = \alpha$ and $P_D = \beta$. What is $E_j[N(\phi')]$ for $j = 0$ and $1$? (It is exactly n, since this rule always takes n samples.) Now consider any sequential decision rule of the SPRT form that achieves $P_{fp} = \alpha$ and $P_D = \beta$. Then the Wald-Wolfowitz theorem says that $E_j[N(\phi)] \leq n$ for $j = 0$ and $1$. Hence, the expected number of samples required by an SPRT (under each hypothesis) is no larger than that of a fixed decision rule with the same performance.

How to Select A and B to Achieve the Desired Tradeoff

We define the sets
$$S_n^1 = \{y = \{y_k\}_{k=1}^{\infty} : A < \lambda_j < B \text{ for } j = 1,\ldots,n-1, \text{ and } \lambda_n \geq B\}$$
$$S_n^0 = \{y = \{y_k\}_{k=1}^{\infty} : A < \lambda_j < B \text{ for } j = 1,\ldots,n-1, \text{ and } \lambda_n \leq A\}$$
Note that $S_n^i$ is the set of all observation sequences on which SPRT(A,B) decides $H_i$ after exactly n samples.

How to Select A and B to Achieve the Desired Tradeoff

The probability of false positive of SPRT(A,B) is then
$$P_{fp} = \sum_{n=0}^{\infty} \mathrm{Prob}(\{y_k\}_{k=1}^n \in S_n^1 \mid X = x_0) = \sum_{n=0}^{\infty} \int_{S_n^1} \prod_{j=1}^n p_0(y_j)\, dy \leq B^{-1} \sum_{n=0}^{\infty} \int_{S_n^1} \prod_{j=1}^n p_1(y_j)\, dy = B^{-1} \sum_{n=0}^{\infty} \mathrm{Prob}(\{y_k\}_{k=1}^n \in S_n^1 \mid X = x_1) = B^{-1} P_D$$
where the inequality results from the fact that $\lambda_n = \frac{\prod_{j=1}^n p_1(y_j)}{\prod_{j=1}^n p_0(y_j)} \geq B$ on $S_n^1$.

How to Select A and B to Achieve the Desired Tradeoff

The probability of missed detection (false negative) of SPRT(A,B) can also be written as
$$P_{fn} = \sum_{n=0}^{\infty} \mathrm{Prob}(\{y_k\}_{k=1}^n \in S_n^0 \mid X = x_1) = \sum_{n=0}^{\infty} \int_{S_n^0} \prod_{j=1}^n p_1(y_j)\, dy \leq A \sum_{n=0}^{\infty} \int_{S_n^0} \prod_{j=1}^n p_0(y_j)\, dy = A \sum_{n=0}^{\infty} \mathrm{Prob}(\{y_k\}_{k=1}^n \in S_n^0 \mid X = x_0) = A(1 - P_{fp})$$
where the inequality results from the fact that $\lambda_n = \frac{\prod_{j=1}^n p_1(y_j)}{\prod_{j=1}^n p_0(y_j)} \leq A$ on $S_n^0$.

How to Select A and B to Achieve the Desired Tradeoff

Putting it all together, we can write two linear inequality constraints
$$B P_{fp} + P_{fn} \leq 1 \qquad A P_{fp} + P_{fn} \leq A$$
or, in terms of $P_D = 1 - P_{fn}$,
$$B P_{fp} - P_D \leq 0 \qquad A P_{fp} - P_D \leq A - 1.$$
These inequalities constrain $P_{fp}$ and $P_D$ in terms of our SPRT design parameters A and B. Note that we certainly want $B > A$, and typically $A < 1$ and $B > 1$. Inequalities are necessary here since $\lambda_n$ may not exactly hit the boundary (A or B) but may jump over it. Under the assumption that each sample provides only a small amount of information, the amount of overshoot, e.g. $\lambda_n = B + \epsilon$ or $\lambda_n = A - \epsilon$, will usually be small.

Linear Inequality Constraints

[Figure: the region of $(P_{fp}, P_D)$ pairs implied by the constraints, bounded by the lines $P_D = B P_{fp}$ and $P_D = (1 - A) + A P_{fp}$.]

Solving for A and B in terms of $P_{fp}$ and $P_D$

Our inequalities on A and B in terms of $P_{fp}$ and $P_D$:
$$A \geq \frac{1 - P_D}{1 - P_{fp}} \qquad B \leq \frac{P_D}{P_{fp}}$$
Under the small overshoot assumption, we can assume that these inequalities are approximate equalities. Hence, if we require an N-P decision rule with $P_{fp} = \alpha$ and $P_D = \beta$, we should choose the SPRT parameters
$$A \approx \frac{1 - \beta}{1 - \alpha} \quad (1) \qquad\qquad B \approx \frac{\beta}{\alpha} \quad (2)$$
Note that the lines $y = (1 - A) + Ax$ and $y = Bx$ intersect at the point $(\alpha, \beta)$ when A and B are chosen according to (1) and (2).
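A tiny helper implementing (1) and (2), together with a check of the intersection property, might look like this sketch (the name `sprt_thresholds` is illustrative):

```python
def sprt_thresholds(alpha, beta):
    """Wald's approximate SPRT thresholds for target P_fp = alpha and P_D = beta."""
    A = (1 - beta) / (1 - alpha)
    B = beta / alpha
    return A, B

A, B = sprt_thresholds(0.1, 0.8)
print(A, B)                      # 2/9 and 8, matching the coin example on the next slide
# The lines y = (1 - A) + A*x and y = B*x intersect at (alpha, beta) = (0.1, 0.8):
x = 0.1
print((1 - A) + A * x, B * x)    # both evaluate to 0.8
```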

Example: Sequential Detection of Unfair Coins

Suppose we have the following simple binary HT problem:
$$H_0: \text{coin is fair} \qquad H_1: \text{coin is unfair with } \mathrm{Prob}(H) = q > 0.5$$
Suppose we require $P_{fp} = 0.1$ and $P_D = 0.8$. We can solve for the SPRT parameters
$$A = \frac{2}{9} \qquad B = 8$$
Hence, our SPRT should take samples until $\lambda_n \geq 8$, in which case we decide $H_1$, or until $\lambda_n \leq 2/9$, in which case we decide $H_0$. Note that
$$\lambda_n = \frac{p_1(\{y_j\}_{j=1}^n)}{p_0(\{y_j\}_{j=1}^n)} = \frac{q^k (1-q)^{n-k}}{0.5^n}$$
where k is the number of heads and n is the number of samples/flips.
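A Monte Carlo sketch of this coin SPRT (not part of the lecture) can be used to estimate the achieved false-positive and detection probabilities and the expected number of flips under each hypothesis; the trial count and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
q, A, B = 0.8, 2 / 9, 8.0        # unfair-coin bias and SPRT thresholds for alpha=0.1, beta=0.8

def run_coin_sprt(p_heads):
    """Flip a coin with P(H) = p_heads until lambda_n leaves (A, B); return (decision, n)."""
    lam, n = 1.0, 0
    while A < lam < B:
        n += 1
        heads = rng.random() < p_heads
        lam *= (q / 0.5) if heads else ((1 - q) / 0.5)   # multiply by the single-flip LR
    return (1 if lam >= B else 0), n

for label, p in [("H0 (fair)", 0.5), ("H1 (unfair)", q)]:
    results = [run_coin_sprt(p) for _ in range(20_000)]
    decisions = np.array([d for d, _ in results])
    ns = np.array([n for _, n in results])
    # Under H0 this estimates P_fp; under H1 it estimates P_D.
    print(label, "P(decide H1) =", decisions.mean().round(3), " E[N] =", ns.mean().round(2))
```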

Example: Sequential Detection of Unfair Coins

[Figure: likelihood ratio $\lambda_n$ versus the number of samples taken for q = 0.8.]

Example: Sequential Detection of Unfair Coins

[Figure: likelihood ratio $\lambda_n$ versus the number of samples taken for q = 0.6; note the much longer horizontal axis (up to 300 samples) than in the q = 0.8 case.]

Final Remarks

Two approaches to sequential detection:
- Bayes: Assign a cost of c to each sample and a prior π to the states. Take samples until enough evidence is obtained such that the expected cost of additional samples is larger than the expected risk reduction.
- N-P: Three-way tradeoff between $P_D$, $P_{fp}$, and the number of samples.

Please read example III.D.2 for a comparison of SPRT and fixed-sample-size (FSS) detectors. The comments on page 110 of your textbook are also enlightening.

Even though the average number of samples for an SPRT is usually much less than for an FSS detector, the actual number of samples for a particular sample sequence is unbounded!

Sequential decision rules have limited application when the samples are not i.i.d.