Data-Efficient Quickest Change Detection
Venu Veeravalli
ECE Department & Coordinated Science Lab, University of Illinois at Urbana-Champaign
http://www.ifp.illinois.edu/~vvv
(joint work with Taposh Banerjee)
C3 Seminar, January 11, 2013
Quickest Change Detection (QCD)
Observation sequence: $X_1, \ldots, X_{\gamma-1}$ i.i.d. $f_0$; $X_\gamma, X_{\gamma+1}, \ldots$ i.i.d. $f_1$
Stopping time $\tau$ at which change is declared
Tradeoff between
o Detection delay
o Frequency of false alarms
Applications
Veeravalli C3 1/11/13
Manufacturing Systems: QCD can be used for quality control
Critical Infrastructure Monitoring: QCD can be used to detect impending failure
Environmental Monitoring: QCD can be used to detect abrupt increase in pollutants
Bio-Engineering: QCD can be used to detect onset of seizures in epilepsy, characterized by EEG signature changes
Change in Distribution [Figure: samples versus time; $f_0 = N(0,1)$, $f_1 = N(0.1,1)$]
Change in Distribution [Figure: Shiryaev's algorithm statistic versus time; change point = 500]
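Bayesian Quickest Change Detection
Proposed by Shiryaev (1963):
$X_1, \ldots, X_{\Gamma-1}$ i.i.d. $f_0$; $X_\Gamma, X_{\Gamma+1}, \ldots$ i.i.d. $f_1$
$\Gamma \sim \mathrm{Geom}(\rho)$: $P\{\Gamma = m\} = \rho(1-\rho)^{m-1}$, $m \ge 1$
Test is stopping time $\tau$ on observations
$\mathrm{ADD}(\tau) = E[(\tau - \Gamma)^+]$, $\mathrm{PFA}(\tau) = P\{\tau < \Gamma\}$
Goal: $\min_\tau \mathrm{ADD}(\tau)$ subj to $\mathrm{PFA}(\tau) \le \alpha$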
Dynamic Programming Solution
a posteriori probability of change $p_n = P(\Gamma \le n \mid X_1, \ldots, X_n)$ is sufficient statistic
$p_n$ has simple recursion:
$p_{n+1} = \frac{\tilde{p}_n L(X_{n+1})}{\tilde{p}_n L(X_{n+1}) + (1 - \tilde{p}_n)}$, $\tilde{p}_n = p_n + (1 - p_n)\rho$
$L(X_{n+1}) = \frac{f_1(X_{n+1})}{f_0(X_{n+1})}$
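
The posterior recursion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `gaussian_lr` and `shiryaev_update` are hypothetical, and the Gaussian pair $N(0,1)$ versus $N(0.75,1)$ is taken from the example plots in the talk. The variable `p_tilde` is the posterior after folding in the geometric prior, $p_n + (1 - p_n)\rho$.

```python
import math


def gaussian_lr(x, mu0=0.0, mu1=0.75, sigma=1.0):
    """Likelihood ratio L(x) = f1(x) / f0(x) for two Gaussians with a common sigma."""
    return math.exp(((mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)) / sigma**2)


def shiryaev_update(p, x, rho, lr=gaussian_lr):
    """One step of the posterior recursion (illustrative sketch).

    First fold in the geometric prior, then apply Bayes' rule
    with the likelihood ratio of the new observation x.
    """
    p_tilde = p + (1.0 - p) * rho
    num = p_tilde * lr(x)
    return num / (num + (1.0 - p_tilde))
```

An observation with likelihood ratio 1 (here $x = 0.375$) carries no information, so the update reduces to the prior step alone; observations that favor $f_1$ push the posterior up.
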
Shiryaev Test
Optimal policy (Shiryaev test) is threshold based: $\tau_S = \min\{n \ge 1 : p_n > A\}$, where $A$ is chosen so that $\mathrm{PFA} = \alpha$
[Figure: sample paths of $p_n$ versus time (PFA path and ADD path) with threshold $A = 0.99$ and change point $\Gamma$; $\rho = 0.01$, $f_0 = N(0,1)$, $f_1 = N(0.75,1)$]
Performance of Shiryaev test
Simple bound on PFA: $\mathrm{PFA}(\tau_S) = P\{\tau_S < \Gamma\} = E[1 - p_{\tau_S}] \le 1 - A$
K-L Divergence: $D(f_1 \| f_0) = \int f_1(x) \log \frac{f_1(x)}{f_0(x)} \, dx$
Asymptotic performance: as $\mathrm{PFA}(\tau_S) \to 0$, $\mathrm{ADD}(\tau_S) \sim \frac{|\log \mathrm{PFA}(\tau_S)|}{D(f_1 \| f_0) + |\log(1 - \rho)|}$ [Tartakovsky&VVV 2005]
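
To get a feel for the asymptotic expression, the sketch below evaluates it for the Gaussian example used in the plots. It relies on the closed form $D(N(\mu_a, \sigma^2) \| N(\mu_b, \sigma^2)) = (\mu_a - \mu_b)^2 / (2\sigma^2)$ for equal-variance Gaussians. The function names are hypothetical, and the result is a first-order estimate only: it ignores lower-order terms, so it need not match simulated delays closely.

```python
import math


def kl_gaussian(mu_a, mu_b, sigma=1.0):
    """K-L divergence between N(mu_a, sigma^2) and N(mu_b, sigma^2)."""
    return (mu_a - mu_b) ** 2 / (2.0 * sigma**2)


def shiryaev_add_first_order(pfa, rho, divergence):
    """First-order delay estimate |log PFA| / (D(f1||f0) + |log(1 - rho)|)."""
    return abs(math.log(pfa)) / (divergence + abs(math.log(1.0 - rho)))


# Example values from the talk: f0 = N(0,1), f1 = N(0.75,1), rho = 0.01.
d10 = kl_gaussian(0.75, 0.0)
estimate = shiryaev_add_first_order(1e-3, 0.01, d10)
```

The denominator shows why a larger $\rho$ (a change that is more likely a priori) shortens the delay for the same false alarm probability.
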
Tradeoff between ADD and PFA [Figure: ADD versus PFA tradeoff curve; $\rho = 0.01$, $f_0 = N(0,1)$, $f_1 = N(0.75,1)$; slope $= [D(f_1 \| f_0) + |\log(1 - \rho)|]^{-1}$]
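Lorden's Minimax Formulation [1971]
$X_1, \ldots, X_{\gamma-1}$ i.i.d. $f_0$; $X_\gamma, X_{\gamma+1}, \ldots$ i.i.d. $f_1$
Delay: $\mathrm{WADD}(\tau) = \sup_{n \ge 1} \operatorname{ess\,sup} E_n[(\tau - n)^+ \mid X_1, \ldots, X_{n-1}]$
False Alarm: $\mathrm{FAR}(\tau) = \frac{1}{E_\infty[\tau]}$
Goal: $\min_\tau \mathrm{WADD}(\tau)$ subj to $\mathrm{FAR}(\tau) \le \alpha$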
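Pollak's Minimax Formulation [1985]
$X_1, \ldots, X_{\gamma-1}$ i.i.d. $f_0$; $X_\gamma, X_{\gamma+1}, \ldots$ i.i.d. $f_1$
Delay: $\mathrm{CADD}(\tau) = \sup_{n \ge 1} E_n[\tau - n \mid \tau \ge n]$
False Alarm: $\mathrm{FAR}(\tau) = \frac{1}{E_\infty[\tau]}$
Goal: $\min_\tau \mathrm{CADD}(\tau)$ subj to $\mathrm{FAR}(\tau) \le \alpha$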
CuSum Test [Page 1954]
At time $n$
o log-likelihood of no change: $\log\left[\prod_{i=1}^{n} f_0(X_i)\right]$
o log-likelihood of change at time $k$: $\log\left[\prod_{i=1}^{k-1} f_0(X_i) \prod_{i=k}^{n} f_1(X_i)\right]$
CuSum statistic (Maximum Likelihood): $W_n = \max_{1 \le k \le n+1} \left\{ \log\left[\prod_{i=1}^{k-1} f_0(X_i) \prod_{i=k}^{n} f_1(X_i)\right] - \log\left[\prod_{i=1}^{n} f_0(X_i)\right] \right\}$
CuSum Test: $\tau_C = \min\{n \ge 1 : W_n \ge d\}$
Recursion: $W_{n+1} = (W_n + \log L(X_{n+1}))^+$, $W_0 = 0$
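
The equivalence between the maximum-likelihood form and the recursion can be checked numerically. The sketch below is illustrative only (all function names are hypothetical): it takes a list of log-likelihood ratios $\log L(X_i)$, computes the statistic both ways, and applies the threshold rule.

```python
def cusum_path(log_lrs):
    """CuSum statistic via the recursion W_{n+1} = max(0, W_n + log L(X_{n+1}))."""
    w, path = 0.0, []
    for z in log_lrs:
        w = max(0.0, w + z)
        path.append(w)
    return path


def cusum_ml(log_lrs):
    """Same statistic via the maximum over candidate change points k = 1..n+1.

    The difference of the two log-likelihoods reduces to the sum of
    log L(X_i) for i = k..n; k = n + 1 gives an empty sum equal to 0.
    """
    path = []
    for n in range(1, len(log_lrs) + 1):
        path.append(max(sum(log_lrs[k - 1:n]) for k in range(1, n + 2)))
    return path


def cusum_stopping_time(log_lrs, d):
    """First n with W_n >= d, or None if the threshold is never reached."""
    for n, w in enumerate(cusum_path(log_lrs), start=1):
        if w >= d:
            return n
    return None
```

The recursion needs constant memory per step, which is why it is the form used in practice; the brute-force maximum is only there to confirm the two definitions agree.
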
CuSum Test Evolution [Figure: CuSum statistic $W$ versus time, showing a false alarm path and a delay path; threshold = 8, change point $\gamma$]
Optimality of CuSum Test: optimal under Lorden's minimax criterion [Moustakides 1986]; asymptotically optimal under Pollak's criterion as FAR goes to 0 [Lai 1998]
Asymptotic Performance
CuSum Test: $W_{n+1} = \max(0, W_n + \log L(X_{n+1}))$, $W_0 = 0$, $\tau_C = \min\{n \ge 1 : W_n \ge d\}$
False Alarm Rate: $d = |\log \alpha| \Rightarrow \mathrm{FAR}(\tau_C) \le \alpha$
Detection Delay: as $\mathrm{FAR}(\tau_C) \to 0$, $\mathrm{CADD}(\tau_C) \sim \frac{|\log \mathrm{FAR}(\tau_C)|}{D(f_1 \| f_0)}$
CuSum Performance Tradeoff [Figure: CuSum CADD versus $|\log(\mathrm{FAR})|$; $f_0 = N(0,1)$, $f_1 = N(0.75,1)$; slope $= 1 / D(f_1 \| f_0)$]
Data-Efficient Quickest Change Detection
Data-Efficient QCD (DE-QCD)
Cost associated with acquiring observations
o cost of data in process control
o energy in wireless sensor networks
DE-QCD: three-fold decision-making
o stop and decide change
o continue and take next observation
o continue without taking next observation
Data-Efficient QCD (DE-QCD)
$S_k = 1$ if $X_k$ is used for decision making
Information Vector: $I_k = \left[S_1, \ldots, S_k, X_1^{(S_1)}, \ldots, X_k^{(S_k)}\right]$
DE-QCD Policy: $S_k = \mu_k(I_{k-1}) \in \{0, 1\}$
Stopping time $\tau$ on $\{I_k\}$
$\phi = \{\tau, \mu_1, \ldots, \mu_\tau\}$
At time $k$ use $I_k$ to decide to stop or continue
If decision is continue, use $X_{k+1}$ if $S_{k+1} = \mu_{k+1}(I_k) = 1$
Data-Efficient Bayesian Quickest Change Detection
Bayesian Data-Efficient QCD
Extending Shiryaev's formulation to data-efficient setting: i.i.d. $\{X_n\}$, $\Gamma \sim \mathrm{Geom}(\rho)$
Metric for data-efficiency: Average Number of Observations used before change, $\mathrm{ANO}(\phi) = E\left[\sum_{k=1}^{\min\{\tau, \Gamma - 1\}} S_k\right]$
Objective: $\min_\phi \mathrm{ADD}(\phi)$ subj to $\mathrm{PFA}(\phi) \le \alpha$ and $\mathrm{ANO}(\phi) \le \beta$
DE-Shiryaev: Two-Threshold Algorithm
[Figure: number line marking 0, $B$, and $A$]
$S_{k+1} = 1$ if $p_k \in [B, A)$
$S_{k+1} = 0$ if $p_k < B$
Stop and declare change if $p_k \ge A$
DE-Shiryaev: Two-Threshold Algorithm
If $S_{k+1} = 1$, update using Shiryaev recursion: $p_{k+1} = \frac{\tilde{p}_k f_1(X_{k+1})}{\tilde{p}_k f_1(X_{k+1}) + (1 - \tilde{p}_k) f_0(X_{k+1})}$, $\tilde{p}_k = p_k + (1 - p_k)\rho$
If $S_{k+1} = 0$, update using prior: $p_{k+1} = p_k + (1 - p_k)\rho$
Stopping time: $\tau_{DE-S} = \inf\{k \ge 1 : p_k > A\}$
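
The two-threshold rule can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `de_shiryaev` is hypothetical, the posterior starts at $p_0 = 0$, and `lr` is any function returning the likelihood ratio $f_1(x)/f_0(x)$. Skipped samples are never read.

```python
def de_shiryaev(samples, rho, A, B, lr):
    """Run the two-threshold rule on a finite sequence (illustrative sketch).

    Returns (stopping time or None, number of observations actually used).
    Assumes p_0 = 0, so with B > 0 the run starts with an initial wait.
    """
    p = 0.0
    used = 0
    for k, x in enumerate(samples, start=1):
        if p >= B:
            # S_k = 1: take the observation and apply the Shiryaev recursion.
            p_tilde = p + (1.0 - p) * rho
            num = p_tilde * lr(x)
            p = num / (num + (1.0 - p_tilde))
            used += 1
        else:
            # S_k = 0: skip the observation and update with the prior only.
            p = p + (1.0 - p) * rho
        if p > A:
            return k, used
    return None, used
```

With an uninformative likelihood ratio (always 1) the posterior follows $1 - (1 - \rho)^k$ whether or not samples are taken, which gives an easy sanity check: for $\rho = 0.1$, $B = 0.2$, $A = 0.5$, the first three samples are skipped and the rule stops at $k = 7$.
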
Evolution of DE-Shiryaev algorithm
Asymptotic Optimality of DE-Shiryaev
Justification: Dynamic Programming solution to $\min_\phi \mathrm{ADD}(\phi) + \lambda_f \mathrm{PFA}(\phi) + \lambda_e \mathrm{ANO}(\phi)$
Asymptotic optimality: for each fixed $\beta$, as $\alpha \to 0$, $\mathrm{PFA}(\phi_{DE-S}) \sim \mathrm{PFA}(\tau_S)$ and $\mathrm{ADD}(\phi_{DE-S}) \sim \mathrm{ADD}(\tau_S)$
Thresholds $A$ and $B$ can be set independent of each other for small PFA, with $A$ controlling PFA and $B$ controlling ANO
Tradeoff Curves [Figure: ADD tradeoff curves for Shiryaev and for DE-Shiryaev with ANO at 75% and 50% of the Shiryaev ANO; $f_0 \sim N(0,1)$, $f_1 \sim N(0.75,1)$, $\rho = 0.1$]
Comparison with Fractional Sampling
Significant performance gain over fractional sampling
[Figure: ADD versus $|\log(\mathrm{PFA})|$ for Fractional Sampling, DE-Shiryaev, and Shiryaev; $f_0 = N(0,1)$, $f_1 = N(0.75,1)$, $\rho = 0.01$, 50% samples dropped]
Data-Efficient Minimax Quickest Change Detection
Minimax Data-Efficient QCD
No prior on change point
Goal: Extend Pollak's and Lorden's formulations to data-efficient setting
But
o need new metric to capture cost of observations
o need substitute for $\rho$ when we skip observations
Approach
o use insights from Bayesian setting
o modify CuSum algorithm to look like DE-Shiryaev
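Bayesian Data-Efficient QCD
Information Vector: $I_k = \left[S_1, \ldots, S_k, X_1^{(S_1)}, \ldots, X_k^{(S_k)}\right]$
Policy for DE-QCD: $\phi = \{\tau, \mu_1, \ldots, \mu_\tau\}$, $S_k = \mu_k(I_{k-1})$
Stopping time $\tau$ on $\{I_k\}$
Metric for data-efficiency: $\mathrm{ANO}(\phi) = E\left[\sum_{k=1}^{\min\{\tau, \Gamma - 1\}} S_k\right]$
Objective: $\min_\phi \mathrm{ADD}(\phi)$ subj to $\mathrm{PFA}(\phi) \le \alpha$ and $\mathrm{ANO}(\phi) \le \beta$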
Data-Efficient Minimax QCD
Information Vector: $I_k = \left[S_1, \ldots, S_k, X_1^{(S_1)}, \ldots, X_k^{(S_k)}\right]$
Policy for DE-QCD: $\phi = \{\tau, \mu_1, \ldots, \mu_\tau\}$, $S_k = \mu_k(I_{k-1})$
Stopping time $\tau$ on $\{I_k\}$
Pre-change Duty Cycle (PDC) to replace ANO: $\mathrm{PDC}(\phi) = \limsup_n \frac{1}{n} E_n\left[\sum_{k=1}^{n-1} S_k \mid \tau \ge n\right]$
o lim sup cannot be replaced by sup as typically $S_1 = 1$ (an initial wait cannot be justified)
o also change is generally rare
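Data-Efficient Minimax QCD
Data-efficient extension of Pollak: $\min_\phi \mathrm{CADD}(\phi)$ subj to $\mathrm{FAR}(\phi) \le \alpha$ and $\mathrm{PDC}(\phi) \le \beta$
Data-efficient extension of Lorden: $\min_\phi \mathrm{WADD}(\phi)$ subj to $\mathrm{FAR}(\phi) \le \alpha$ and $\mathrm{PDC}(\phi) \le \beta$
where $\mathrm{WADD}(\phi) = \sup_{n \ge 1} \operatorname{ess\,sup} E_n[(\tau - n)^+ \mid I_{n-1}]$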
DE-Shiryaev Algorithm
$S_{k+1} = 1$ if $p_k \in [B, A)$
$S_{k+1} = 0$ if $p_k < B$
If $S_{k+1} = 1$, update using Shiryaev recursion: $p_{k+1} = \frac{\tilde{p}_k f_1(X_{k+1})}{\tilde{p}_k f_1(X_{k+1}) + (1 - \tilde{p}_k) f_0(X_{k+1})}$, $\tilde{p}_k = p_k + (1 - p_k)\rho$
If $S_{k+1} = 0$, update using prior: $p_{k+1} = p_k + (1 - p_k)\rho$
Stopping time: $\tau_{DE-S} = \inf\{k \ge 1 : p_k > A\}$
DE-Shiryaev Evolution
CuSum Evolution
$W_{k+1} = (W_k + \log L(X_{k+1}))^+$, $W_0 = 0$
$\tau_C = \inf\{k \ge 1 : W_k > d\}$
[Figure: $W_k$ versus time; threshold $d = 8$]
DE-CUSUM Algorithm
$S_{k+1} = 1$ if $W_k \in [0, d)$
$S_{k+1} = 0$ if $W_k < 0$
If $S_{k+1} = 1$, update using CUSUM recursion: $W_{k+1} = W_k + \log L(X_{k+1})$
If $S_{k+1} = 0$, update using parameter $\mu$: $W_{k+1} = (W_k + \mu)^{h+}$
Stopping time: $\tau_{DE-C} = \inf\{k \ge 1 : W_k > d\}$
Evolution of DE-CUSUM algorithm [Figure: $W_k$ versus time with threshold $d$ and lower level $h$; change point $\gamma = 40$, stopping time $\tau_{DE-C}$; $f_0 = N(0,1)$, $f_1 = N(0.75,1)$, $\mu = 0.1$, $h = 1.1$]
Asymptotic Optimality of DE-CUSUM
DE-CUSUM is asymptotically optimal
For each fixed $d$, $h$, and $\mu$: $\tau_{DE-C} \ge \tau_C$, $\mathrm{FAR}(\phi_{DE-C}) \le \mathrm{FAR}(\tau_C)$
For each fixed $d$, $h$, and $\mu$: $\mathrm{CADD}(\phi_{DE-C}) \le \mathrm{CADD}(\tau_C) + K_1$
For each fixed $d$, $h < \infty$, and $\mu$: $\mathrm{WADD}(\phi_{DE-C}) \le \mathrm{WADD}(\tau_C) + K_2$
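Designing DE-CuSum Parameters
FAR controlled primarily by $d$: $d = |\log \alpha| \Rightarrow \mathrm{FAR}(\phi_{DE-C}) \le \alpha$
PDC controlled by $\mu$, $h$, independent of $d$: $\mathrm{PDC}(\phi_{DE-C}) \approx \frac{\mu}{\mu + D(f_0 \| f_1)}$
$\mu \approx \frac{\beta}{1 - \beta} D(f_0 \| f_1) \Rightarrow \mathrm{PDC}(\phi_{DE-C}) \approx \beta$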
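
As a worked example of the design rule, the sketch below picks $\mu$ for a target duty cycle and $d$ for a target false alarm rate, using the Gaussian pair from the plots. It treats the PDC expression $\mu / (\mu + D(f_0 \| f_1))$ as an approximation and ignores the effect of $h$. The function names are hypothetical; for equal-variance Gaussians, $D(f_0 \| f_1) = D(f_1 \| f_0) = (\mu_1 - \mu_0)^2 / (2\sigma^2)$.

```python
import math


def mu_for_pdc(beta, d01):
    """Choose mu so that mu / (mu + D(f0||f1)) equals the target duty cycle beta."""
    return beta / (1.0 - beta) * d01


def pdc_approx(mu, d01):
    """Approximate pre-change duty cycle for a given mu."""
    return mu / (mu + d01)


def threshold_for_far(alpha):
    """Threshold d = |log alpha| for a target false alarm rate alpha."""
    return abs(math.log(alpha))


# f0 = N(0,1), f1 = N(0.75,1): D(f0||f1) = 0.75**2 / 2.
d01 = 0.75**2 / 2.0
mu_half = mu_for_pdc(0.5, d01)      # observe about half the time before change
mu_quarter = mu_for_pdc(0.25, d01)  # observe about a quarter of the time
d = threshold_for_far(1e-3)
```

A smaller $\mu$ means the statistic climbs back to zero more slowly after each undershoot, so fewer observations are taken before the change.
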
Trade-off Curves [Figure: $f_0 \sim N(0,1)$, $f_1 \sim N(0.75,1)$]
Trade-off Curves
Data-Efficient Quickest Change Detection in Sensor Networks
Data-Efficient QCD in Sensor Networks
Seek data-efficient schemes that reduce average observation cost per sensor
Fusion center queries sensors for observations
Special case of controlled sensing for inference
Data-Efficient QCD in Sensor Networks
Bayesian (DP) analysis gives complex multi-threshold structure [Premkumar and Kumar 08]
DE-QCD for single sensor provides insights:
o Use likelihood ratio for stopping as well as observation control
o Propose algorithms that retain structure of classical QCD counterparts
o Have good performance and are asymptotically equivalent to their classical counterparts
Can we obtain similar results for distributed QCD in sensor networks?
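Data-Efficient QCD in Sensor Networks
Problem formulation: recall $\mathrm{PDC}(\phi) = \limsup_n \frac{1}{n} E_n\left[\sum_{k=1}^{n-1} S_k \mid \tau \ge n\right]$
Let $\mathrm{PDC}_\ell$ be PDC of sensor $\ell$
$\min_\phi \mathrm{CADD}(\phi)$ subj to $\mathrm{FAR}(\phi) \le \alpha$ and $\mathrm{PDC}_\ell(\phi) \le \beta_\ell$ for $\ell = 1, \ldots, L$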
DE-QCD Algorithms for Sensor Networks
Two data-efficient schemes for sensor networks
o Centralized: Serialized DE-CUSUM
o Distributed: DE-DIST
Centralized: Serialized DE-CUSUM
o Serialize observations and use DE-CUSUM
o Equal PDC constraint per sensor is achieved
o Asymptotically optimal: performance approaches that of centralized CUSUM scheme
o Good trade-off curves
Evolution of Serialized DE-CUSUM [Figure: $W_k$ versus time with threshold $d$, lower level $h$, and change point $\gamma$; $L = 10$, $f_0 \sim N(0,1)$, $f_1 \sim N(0.75,1)$, $d = 4.0$]
Distributed: DE-DIST Algorithm
Use DE-CUSUM at sensors to meet individual PDC constraint, but only to decide whether or not to take observations
If observation taken, transmit to fusion center
Fusion center uses standard CUSUM to fuse information transmitted from sensors and makes the final decision to stop and decide change
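
The division of labor in DE-DIST can be sketched as below. Several details are assumptions made only for illustration and are not stated on the slide: each sensor transmits the log-likelihood ratio of the observation it takes, the fusion center adds the received values within a time slot before applying the CuSum recursion, the local statistic follows the same update as the single-sensor DE-CUSUM sketch with $(x)^{h+}$ read as $\max\{x, -h\}$, and all sensors share the same $\mu$ and $h$. The function name `de_dist` is hypothetical.

```python
def de_dist(sensor_samples, log_lr, d, mu, h):
    """DE-DIST style scheme on a finite horizon (illustrative sketch).

    sensor_samples[k][l] is the observation available to sensor l in slot k + 1.
    Returns the slot at which the fusion center declares a change, or None.
    """
    num_sensors = len(sensor_samples[0])
    local = [0.0] * num_sensors  # per-sensor statistics, used only for on-off control
    fusion = 0.0                 # standard CuSum statistic at the fusion center
    for k, row in enumerate(sensor_samples, start=1):
        received = 0.0
        for l, x in enumerate(row):
            if local[l] >= 0.0:
                # Sensor l takes the observation and transmits it.
                z = log_lr(x)
                local[l] += z
                received += z
            else:
                # Sensor l sleeps; its observation is never read.
                local[l] = max(local[l] + mu, -h)
        fusion = max(0.0, fusion + received)
        if fusion > d:
            return k
    return None
```

The local statistics never trigger a stop here; only the fusion-center statistic is compared with $d$, matching the split between observation control at the sensors and the final decision at the fusion center.
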
Performance of Algorithms [Figure: CADD versus $|\log(\mathrm{FAR})|$ for Fractional Sampling, DE-DIST, Serialized DE-CuSum, and CuSum; $L = 10$, $f_0 \sim N(0,1)$, $f_1 \sim N(0.2,1)$, PDC = 0.5]
Conclusions
Introduced problem of data-efficient quickest change detection
Useful in many applications
Bayesian and minimax formulations
Data-efficient procedures are simple extensions of classical counterparts
First-order asymptotic optimality
Substantial improvement over fractional sampling
Extension to distributed sensor networks
Work in progress: extensions to non-i.i.d. case, unknown/partially known distributions
References
T. Banerjee and V. V. Veeravalli, "Data-efficient quickest change detection with on-off observation control," Sequential Analysis, vol. 31, no. 1, pp. 40-77, 2012.
T. Banerjee and V. V. Veeravalli, "Data-efficient quickest change detection in minimax settings," submitted to IEEE Transactions on Information Theory, Nov. 2012. Available on arXiv.
T. Banerjee and V. V. Veeravalli, "Data-efficient minimax quickest change detection," in IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3937-3940, Mar. 2012.
T. Banerjee and V. V. Veeravalli, "Energy-efficient quickest change detection in sensor networks," in IEEE Statistical Signal Processing (SSP) Workshop, Ann Arbor, Michigan, Aug. 2012.
Insights from Bayesian Setting
DE-Shiryaev has an initial wait based on prior information
This is followed by a sequence of two-sided tests intercepted by sleep times
Each two-sided test is for sequential hypothesis testing between pre-change and post-change
Samples are skipped if two-sided test stops with decision pre-change
Change is declared first time a two-sided test stops with decision post-change
Insights from Bayesian Setting
Two-sided test has upper threshold $A$ and lower threshold $B$
If the two-sided test stops below $B$, then samples are skipped based on the likelihood ratio of all the observations taken (undershoot):
$\frac{p_n}{1 - p_n} = \frac{\sum_{k=1}^{n} \rho (1 - \rho)^{k-1} \prod_{i=k}^{n} L^*(X_i)}{P(\Gamma > n)}$, where $L^*(X_i) = 1$ if $X_i$ is not used
The number of samples skipped is also a function of the geometric parameter $\rho$
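
The dependence of the sleep time on the undershoot and on $\rho$ can be made explicit. While samples are skipped, the prior-only update $p_{k+1} = p_k + (1 - p_k)\rho$ gives $1 - p_{k+m} = (1 - p_k)(1 - \rho)^m$, so the number of skipped samples needed to bring the posterior from $p < B$ back to at least $B$ is the smallest $m$ with $(1 - \rho)^m \le (1 - B)/(1 - p)$. The sketch below (the function name `sleep_time` is hypothetical) computes that $m$ in closed form.

```python
import math


def sleep_time(p, B, rho):
    """Number of prior-only updates needed to raise the posterior from p to at least B.

    Solves (1 - rho)^m <= (1 - B) / (1 - p) for the smallest integer m >= 0.
    """
    if p >= B:
        return 0
    return math.ceil(math.log((1.0 - B) / (1.0 - p)) / math.log(1.0 - rho))


def sleep_time_by_iteration(p, B, rho):
    """Same quantity obtained by stepping the recursion directly."""
    m = 0
    while p < B:
        p = p + (1.0 - p) * rho
        m += 1
    return m
```

A deeper undershoot (smaller $p$) or a smaller $\rho$ both lengthen the sleep period, which is the sense in which skipping depends on the likelihood ratio of the observations taken and on the geometric parameter.
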
Insights from Bayesian Setting
When the statistic $p_n$ crosses $B$ from below: $p_{n+1} = p_n + (1 - p_n)\rho$
For small $\rho$ the two-sided tests are roughly statistically independent
Constraint on the observation cost is met using an initial wait and the fraction of time samples are taken before change
Insights from DE-Shiryaev Evolution
DE-Shiryaev has an initial wait based on prior information
change is declared first time two-sided test stops with decision post-change
samples skipped if two-sided test stops with decision pre-change
sequence of two-sided tests intercepted by sleep times
DE-CUSUM Algorithm
$\mu$ is substitute for $\rho$
$\rho$ is treated as a design parameter
$h$ controls the maximum gap between any two sampling times
DE-CuSum can be seen as a sequence of SPRTs (two-sided tests) intercepted with sleep times
Sleep times are a function of $h$, $\mu$, and the likelihood of observations used in the SPRTs
A change is declared the first time an SPRT stops with a decision of post-change
Centralized: DE-CENT Algorithm
Based on the interpretation of DE-CUSUM as a sequence of two-sided (SPRT type) tests
Use SPRT$(d, h)$ over space of sensors
o $d$ is upper threshold and $h$ is lower threshold
If SPRT$(d, h)$ stops below $h$ at $m \le L$, use only $m$ sensors in that slot
Carry forward statistic if positive, else reset to zero
Stop the first time SPRT$(d, h)$ ends above $d$
PDC constraint/sensor achieved by randomization
Can show asymptotic optimality
Evolution of DE-CENT [Figure: $L = 10$, $f_0 \sim N(0,1)$, $f_1 \sim N(0.75,1)$, $d = 4.0$, $h = 2$]