CHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD

CHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD

Vijay Singamasetty, Navneeth Nair, Srikrishna Bhashyam and Arun Pachai Kannu
Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai - 600036, India.
ee13s075@ee.iitm.ac.in, navneethnair@gmail.com, skrishna@ee.iitm.ac.in, arunpachai@ee.iitm.ac.in

ABSTRACT

We consider a change detection problem with an unknown post-change parameter. The optimal algorithm for minimizing worst-case detection delay subject to a constraint on average run length, referred to as parallel CUSUM, is computationally expensive. We propose a low-complexity algorithm based on parameter estimation using the Kiefer-Wolfowitz (KW) method combined with CUSUM-based change detection. We also consider a variant of the KW method in which the tuning sequences are reset periodically. We study the performance under the Gaussian mean change model. Our results show that reset KW-CUSUM performs close to parallel CUSUM in terms of worst-case delay versus average run length. The non-reset KW-CUSUM algorithm has a smaller probability of false alarm than the existing algorithms when run over a finite duration.

Index Terms: Kiefer-Wolfowitz algorithm, CUSUM, false alarm probability, average run length, detection delay

1. INTRODUCTION AND SYSTEM MODEL

The problem of change detection using statistical tests has been studied over several decades [1-5]. In this paper, we consider the problem of change detection when the post-change distribution has an unknown parameter. Let {f_θ, θ ∈ R} denote a family of probability density functions parameterized by θ. The observation at time n is denoted Y_n. Let ν ∈ N_+ denote the change point, such that the observations before and after the change follow different statistics. We assume ν to be an unknown non-random value. Specifically, the observations {Y_n}, n ≥ 1, are independent and follow the statistics

Y_n ~ f_{θ_0} for n < ν, and Y_n ~ f_{θ*} for n ≥ ν. (1)

Here, the value of θ_0 is known but θ* is unknown a priori.
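The observation model (1) can be sketched in code. This is a minimal illustration, not part of the paper: the helper name `observations` and the Gaussian mean-shift instance (anticipating the model used later in Section 4) are assumptions for demonstration.

```python
import random

def observations(nu, n_total, sample_pre, sample_post):
    """Draw Y_1..Y_{n_total} per model (1): pre-change law before nu, post-change law from nu on."""
    return [sample_pre() if n < nu else sample_post()
            for n in range(1, n_total + 1)]

# illustrative instance: Gaussian mean shift with change point nu = 5
random.seed(0)
ys = observations(nu=5, n_total=8,
                  sample_pre=lambda: random.gauss(0.0, 1.0),
                  sample_post=lambda: random.gauss(2.0, 1.0))
```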
However, it is known that θ* ∈ Θ_1, where Θ_1 ⊂ R is a known set. For ease of presentation, we first consider the case of Θ_1 being a discrete set. Later, in Section 3.3, we address the case of Θ_1 being a continuous set. We are interested in developing algorithms to detect the change quickly and reliably. Specifically, we are interested in the following quantities: the probability of false alarm P_f, the average run length τ_r, the average detection delay τ_d, and the worst-case detection delay τ_w. We use P_ν to denote the probability measure and E_ν the expectation when the change point is ν, with ν = ∞ for the no-change case. With T being the (random) time at which the algorithm declares the change, we have

P_f = P_∞{T < ∞}, (2)
τ_r = E_∞{T}, (3)
τ_d = E_ν{T − ν | T ≥ ν}. (4)

P_f is a well-studied criterion when the change point is random with a prior distribution [6]. In the non-random setting for the change point, several algorithms, including the conventional CUSUM algorithm, will always raise a false alarm (P_f = 1) when run over an infinite time duration [3]. In such cases, τ_r, the average time the algorithm takes to raise a false alarm, is a quantity of interest [1]. Alternative minimax formulations of the false alarm criterion have been proposed in [7]. The detection delay τ_d defined in (4) may depend on the change point. Hence, another quantity of significance is the worst-case detection delay τ_w, defined as

τ_w = sup_{ν≥1} E_ν{T − ν | T ≥ ν} = sup_{ν≥1} τ_d. (5)

The above definition (5) of worst-case detection delay was originally studied in [2].

2. PRELIMINARIES AND RELATED WORK

2.1. Regression Function

For convenience, we define Θ = {θ_0} ∪ Θ_1, which is a discrete set containing all the possible parameters for the observation statistics. Also, let I ⊆ R be the smallest interval such that Θ ⊆ I; I may be finite or not. For θ ∈ I, we define the log-likelihood ratio (LLR) of the observations parameterized by θ as

L_θ(Y_n) = log [ f_θ(Y_n) / f_{θ_0}(Y_n) ].
(6)

We define the expected value of the LLR as

G(θ) = E_ν{L_θ(Y_n)}, θ ∈ I, (7)

where the expectation is taken over the true distribution of Y_n. Since the true distribution differs before and after the change, we sometimes use the explicit notation G_b(θ) and G_a(θ) to denote the regression function G(θ) before and after the change, respectively. We denote the Kullback-Leibler (KL) distance between pdfs f_1 and f_2 as D(f_1 ‖ f_2) = ∫_R f_1(x) log [f_1(x)/f_2(x)] dx. We know that D(f_1 ‖ f_2) ≥ 0, with equality if and only if f_1 = f_2. The properties of the function G(θ) are summarized in the following lemma; they can be easily verified.

Lemma 1. The regression functions before and after the change are

G_b(θ) = −D(f_{θ_0} ‖ f_θ) ≤ 0, (8)
G_a(θ) = D(f_{θ*} ‖ f_{θ_0}) − D(f_{θ*} ‖ f_θ) = G_a(θ*) − D(f_{θ*} ‖ f_θ). (9)

978-1-5090-4117-6/17/$31.00 ©2017 IEEE (ICASSP 2017)
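For a concrete instance of Lemma 1, consider the Gaussian mean-shift family f_θ = N(θ, σ²), for which the KL distance has the closed form (θ_1 − θ_2)²/(2σ²). The sketch below checks identities (8) and (9) term by term; the numeric values of σ², θ_0 and θ* are illustrative assumptions.

```python
def gauss_kl(m1, m2, var):
    """D( N(m1,var) || N(m2,var) ) = (m1 - m2)^2 / (2 var) for equal variances."""
    return (m1 - m2) ** 2 / (2.0 * var)

def expected_llr(true_mean, theta, theta0, var):
    """G(theta) = E[L_theta(Y)] with Y ~ N(true_mean, var), where
    L_theta(y) = log f_theta(y) - log f_theta0(y)."""
    return ((theta - theta0) * true_mean - (theta ** 2 - theta0 ** 2) / 2.0) / var

var, theta0, theta_star = 4.0, 0.0, 2.0  # illustrative values
for theta in (-1.0, 0.5, 1.0, 2.0, 3.0):
    g_b = expected_llr(theta0, theta, theta0, var)      # pre-change regression
    g_a = expected_llr(theta_star, theta, theta0, var)  # post-change regression
    assert abs(g_b + gauss_kl(theta0, theta, var)) < 1e-12                  # (8)
    assert abs(g_a - (gauss_kl(theta_star, theta0, var)
                      - gauss_kl(theta_star, theta, var))) < 1e-12          # (9)
```

The loop confirms that G_b peaks (at 0) when θ = θ_0 and G_a peaks when θ = θ*, which is the property exploited by the detection algorithms below.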

Hence, it follows that G_b(θ) attains its maximum at θ_0 and G_a(θ) attains its maximum at θ*. The change detection algorithms use this property to simultaneously identify the unknown parameter and detect the change. For subsequent use, let θ_max = arg max_θ G(θ).

2.2. Existing Algorithms

We discuss two existing change detection algorithms, parallel [8] and adaptive [9] CUSUM, for the case of an unknown post-change parameter. The parallel CUSUM algorithm proceeds as follows: for all θ ∈ Θ_1 and n ≥ 1, with the initialization W_0^θ = 0, we compute W_n^θ = max{W_{n−1}^θ + L_θ(Y_n), 0} and declare a change when max_{θ∈Θ_1} W_n^θ > A; otherwise, we continue. Here A is a fixed threshold. In [1], asymptotic optimality of parallel CUSUM in minimizing the worst-case detection delay (with a definition slightly different from (5)) subject to a constraint on the average run length has been established. With T_pc being the time at which parallel CUSUM declares a change, it easily follows that parallel CUSUM will always raise a false alarm if run indefinitely, i.e., P{T_pc < ∞} = 1. Parallel CUSUM computes the CUSUM metric for all θ ∈ Θ_1. In order to reduce this complexity, an adaptive CUSUM algorithm is proposed in [9], in which the unknown parameter is tracked adaptively. If G_a is a strictly concave function, then for any ε we can always find p* such that G_a(p*) = G_a(p* + ε) and θ* lies in the interval [p*, p* + ε]. Choosing a small ε and a small step size µ, the algorithm proceeds adaptively as follows. Initialize p_0 ∈ I. Iterate as p_n = p_{n−1} + µ [L_{p_{n−1}+ε}(Y_n) − L_{p_{n−1}}(Y_n)] and truncate/restrict p_n to lie within I. Setting θ̂_n = p_n + ε/2, we estimate the unknown parameter as θ̃_n = close(θ̂_n, Θ). Here, close(x, A) chooses the element of the set A which has the smallest Euclidean distance to the input x. With the initialization W_0 = 0, the CUSUM metric at time n ≥ 1 is computed as

W_n = max{W_{n−1} + L_{θ̃_n}(Y_n), 0} (10)

and the decision rule R with threshold A ≥ 0 is given by: declare a change at time n if W_n > A; continue otherwise.
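The parallel CUSUM recursion above can be sketched directly for the Gaussian mean-shift family. This is an illustrative implementation, not the paper's code; the candidate set, threshold, noise variance, and change point below are assumptions chosen for the demo.

```python
import random

def llr(y, theta, theta0=0.0, var=1.0):
    """L_theta(y) = log f_theta(y)/f_theta0(y) for N(theta, var) densities."""
    return ((theta - theta0) * y - (theta ** 2 - theta0 ** 2) / 2.0) / var

def parallel_cusum(ys, thetas, threshold, var=1.0):
    """Run one CUSUM statistic W_n^theta per candidate theta in Theta_1;
    alarm when the maximum over candidates crosses the threshold A."""
    w = {t: 0.0 for t in thetas}
    for n, y in enumerate(ys, start=1):
        for t in thetas:
            w[t] = max(w[t] + llr(y, t, var=var), 0.0)
        if max(w.values()) > threshold:
            return n  # declared change time T_pc
    return None  # no alarm within the sample

# illustrative run: change at nu = 50, true post-change mean 2, variance 4
random.seed(1)
ys = [random.gauss(0.0, 2.0) for _ in range(50)] + \
     [random.gauss(2.0, 2.0) for _ in range(200)]
T = parallel_cusum(ys, thetas=(1, 2, 3), threshold=8.0, var=4.0)
```

Note the per-step cost grows with |Θ_1|, which is exactly the complexity that the adaptive and KW-based schemes below avoid by tracking a single parameter estimate.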
(11)

Adaptive CUSUM computes the LLR for only one parameter value θ̃_n at each time instant. In [9], it has been argued that, if G(θ) is strictly concave and symmetric about its maximum, then E{θ̂_n} → θ_max as n → ∞, with a suitably chosen step size µ.

3. KIEFER-WOLFOWITZ BASED CUSUM

3.1. Kiefer-Wolfowitz Method

In this method, we employ the stochastic approximation [10] based Kiefer-Wolfowitz (KW) method [11] to estimate the unknown post-change parameter. We then perform CUSUM using the estimated parameter, in a manner similar to adaptive CUSUM. Towards that, we define the tuning sequences {a_n} and {c_n}, which satisfy the following requirements:

C1: c_n → 0 as n → ∞,  C2: Σ_{n=1}^∞ a_n = ∞, (12)
C3: Σ_{n=1}^∞ a_n c_n < ∞,  C4: Σ_{n=1}^∞ a_n²/c_n² < ∞. (13)

Polynomial-like sequences of the form a_n = n^{−a} and c_n = n^{−c}, with suitably chosen a, c > 0, can satisfy the requirements (12)-(13). The standard KW CUSUM method proceeds as follows:

1. Initialize the parameter estimate θ̂_0 such that θ̂_0 ± c_1 belongs to I.
2. Initialize the CUSUM metric as W_0 = 0.
3. Update the parameter at each time instant as

θ̂_n = θ̂_{n−1} + a_n [L_{θ̂_{n−1}+c_n}(Y_n) − L_{θ̂_{n−1}−c_n}(Y_n)] / (2 c_n). (14)

4. Truncate/restrict θ̂_n such that θ̂_n ± c_{n+1} belongs to I.
5. Round off the estimate as θ̃_n = close(θ̂_n, Θ).
6. At each time instant n, compute the CUSUM metric as in (10) and declare a change with threshold A in the same manner as (11).

The update equation (14) is a stochastic gradient method, with a_n as the step-size parameter and with the gradient of the regression function at θ̂_{n−1}, namely [G(θ̂_{n−1}+c_n) − G(θ̂_{n−1}−c_n)] / (2 c_n), replaced by its instantaneous approximation. In order to proceed further, we make the following assumptions. We assume that the variance of L_θ(Y_n) is bounded for all θ; specifically,

sup_{θ∈I} Var{L_θ(Y_n)} < ∞. (15)

Further, we assume the regularity conditions: for all θ ∈ I, there exist K_0 > 0, K_1 > 0 such that

K_0 |θ − θ_max| ≤ |G′(θ)| ≤ K_1 |θ − θ_max| (16)

and

G′(θ) (θ − θ_max) < 0, θ ≠ θ_max. (17)

Theorem 1 (Broadie et al. [12]).
If the tuning sequences satisfy the conditions (12)-(13) and the regularity conditions (15)-(17) hold, then for the KW estimates θ̂_n in (14) we have E{(θ̂_n − θ_max)²} → 0 as n → ∞. We also have P{|θ̂_n − θ_max| > ε} → 0 as n → ∞, for any ε > 0.

In our model, θ̂_n converges to θ_0 in the absence of a change. If the change happens at a finite ν, then θ̂_n converges to θ*. The KW recursion (14) in the original paper [11] used two i.i.d. samples to compute L_{θ̂_{n−1}+c_n} and L_{θ̂_{n−1}−c_n}, for instance Y_n and Y_{n−1}. However, convergence has been established in [12] even if the same sample Y_n is used in both terms, as done in (14). The mean-square convergence of the KW method is stronger than the mean convergence guarantee [9] for adaptive CUSUM. Also, unlike adaptive CUSUM, symmetry of the regression function G is not required for convergence of the KW method.

3.2. Reset KW CUSUM

As the tuning sequences {a_n} and {c_n} converge to 0, when the change happens at a large ν, it takes a long time for the KW parameter estimates to converge to the post-change value; hence, the detection delay of the standard KW CUSUM method will be correspondingly large, as illustrated in Fig. 1. To overcome this limitation, we consider resetting the tuning sequences a_n and c_n every P time instants: a_n = a_k for n > P with k = mod(n − 1, P) + 1, where a_n for 1 ≤ n ≤ P can be chosen arbitrarily; c_n is reset in a similar manner.
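Steps 1-6, with the optional periodic reset of this section, can be sketched for the Gaussian mean-shift family. This is a simplified illustration: the tuning choices a_n = 1/n and c_n = n^{-1/3} (which satisfy C1-C4), the interval I = [0, K], the simple clipping in place of the θ̂_n ± c_{n+1} truncation, and all numeric values are assumptions for the demo.

```python
import random

def llr(y, theta, var):
    """L_theta(y) for a Gaussian mean shift with theta0 = 0."""
    return (theta * y - theta ** 2 / 2.0) / var

def kw_cusum(ys, var, K, threshold, reset_period=None):
    """(Reset) KW-CUSUM: finite-difference KW estimate, rounded to the
    candidate grid {0,...,K}, driving a single CUSUM statistic."""
    theta_hat, w = 0.0, 0.0
    for n, y in enumerate(ys, start=1):
        # wrap the tuning index every reset_period samples (reset variant)
        k = ((n - 1) % reset_period) + 1 if reset_period else n
        a_k, c_k = 1.0 / k, k ** (-1.0 / 3.0)
        # instantaneous finite-difference gradient approximation, eq. (14)
        grad = (llr(y, theta_hat + c_k, var) - llr(y, theta_hat - c_k, var)) / (2 * c_k)
        theta_hat = min(max(theta_hat + a_k * grad, 0.0), float(K))  # restrict to I
        theta_round = min(max(round(theta_hat), 0), K)  # close(theta_hat, {0..K})
        w = max(w + llr(y, theta_round, var), 0.0)      # eq. (10)
        if w > threshold:                               # eq. (11)
            return n
    return None

random.seed(2)
ys = [random.gauss(0.0, 2.0) for _ in range(30)] + \
     [random.gauss(2.0, 2.0) for _ in range(300)]
T = kw_cusum(ys, var=4.0, K=10, threshold=8.0)
```

Passing `reset_period=P` gives the reset variant; leaving it `None` gives the standard (non-reset) scheme whose delay grows with ν, as discussed below.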

3.3. Other Models

Reset and standard (non-reset) KW CUSUM can be applied without any modification to the following variations of our original model (1): the case of Θ_1 = {θ*}, which corresponds to the standard change detection problem with known pdfs, and the case of Θ_1 being a continuous set. The performance of these KW-CUSUM methods is analyzed in the following section.

4. PERFORMANCE ANALYSIS

4.1. Gaussian Mean Change Model

For the performance analysis, we consider the following specific model:

Y_n ~ N(Y_n; 0, σ²) for n < ν, and Y_n ~ N(Y_n; M, σ²) for n ≥ ν, (18)

where N(Y_n; θ, σ²) denotes a Gaussian pdf in the variable Y_n with mean θ and variance σ². Here M is an unknown integer with M ∈ {1, 2, …, K}. This model appears in many applications, for instance in the physical layer fusion model in sensor networks [13] with an unknown number of affected sensors sending the symbol 1 after the change [14]. For this model, G(θ) is a quadratic function and the regularity conditions (15)-(17) are satisfied. Hence the standard KW method convergence guarantee in Theorem 1 is valid. Further, for this specific model, we present additional convergence guarantees in the following section.

4.2. Additional Convergence Guarantees

In the Gaussian mean change model, we have Θ = {0, 1, 2, …, K}. Also, the Kiefer-Wolfowitz (KW) parameter update (14) becomes

θ̂_n = θ̂_{n−1} + (a_n/σ²) (Y_n − θ̂_{n−1}). (19)

Interestingly, the KW parameter update equation does not depend on the sequence c_n. By setting θ̂_0 = 0 and choosing a_n = σ²/n, which meets the KW convergence requirements, the parameter estimate (19) can be rewritten as

θ̂_n = (1/n) Σ_{k=1}^n Y_k, (20)

which is the empirical mean of the observations. Usually, we truncate the KW parameter estimate to remain within the interval [0, K]. We ignore this truncation in the following asymptotic analysis. For subsequent use, let µ_{0,k} and µ_{1,k} denote the empirical means of k pre-change and k post-change i.i.d. observations, respectively.

Lemma 2. For the observation model (18), when there is no change, the KW estimate in (20) converges to 0 almost surely.
On the other hand, when the change happens at a finite point ν, we have θ̂_n → M almost surely as n → ∞.

Proof. Under no change, the convergence result follows from the strong law of large numbers [15]. Now, consider a change at a finite ν. The KW parameter estimate using post-change observations at time ν + N is given by θ̂_{ν+N} = (1/(ν+N)) Σ_{k=1}^{ν+N} Y_k, and it can be rewritten as

θ̂_{ν+N} = [Σ_{k=1}^{ν−1} Y_k] / (N + ν) + [Σ_{k=ν}^{N+ν} Y_k] / (N + ν)
        = µ_{0,ν−1} (ν − 1)/(N + ν) + µ_{1,N+1} (N + 1)/(N + ν). (21)

For fixed ν, as N → ∞, we have µ_{0,ν−1} (ν − 1)/(N + ν) → 0 almost surely and µ_{1,N+1} (N + 1)/(N + ν) → M almost surely, by the strong law of large numbers.

From the above lemma, in the absence of a change, for every realization of the observation sequence {Y_n}, the KW parameter estimate satisfies, with probability one, close(θ̂_n, Θ) = 0 for all n ≥ N_0. Here N_0 may depend on the particular realization (sample path). For that particular sample path, if non-reset KW-CUSUM does not cross the threshold by time N_0, then it will not raise a false alarm subsequently. We note in our simulation results (Fig. 2) that the non-reset KW-CUSUM algorithm has a smaller probability of false alarm than the existing algorithms when run over a finite duration. Now, we study the behavior of the KW estimate (20) with respect to the number of post-change observations, as the change point ν → ∞.

Lemma 3. As ν → ∞, the KW estimate θ̂_{ν+N} → 0 almost surely if N grows sub-linearly with ν, that is, N = o(ν). On the other hand, if N grows linearly with ν as N = βν for a constant β, we have θ̂_{ν+N} → Mβ/(β + 1) almost surely as ν → ∞.

The proof follows easily by applying limits to the expression in (21). Hence, when ν is large and N is very small compared to ν, we have close(θ̂_{ν+N}, Θ) = 0 and the KW CUSUM metric does not increase at time ν + N. On the other hand, if N = βν for large enough β, we have close(θ̂_{ν+N}, Θ) = M, the expected value of L_{θ̃_{ν+N}}(Y_{ν+N}) is positive, and the KW CUSUM metric tends to increase. This suggests that the number of post-change observations needs to grow linearly with ν for non-reset KW CUSUM to detect the change.
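Two of the claims above can be checked numerically: that the recursion (19) with a_n = σ²/n reproduces the empirical mean (20) exactly, and that the expectation of (21), namely (N+1)M/(ν+N), gives the limits in Lemma 3. The sample values and parameter choices in this sketch are arbitrary illustrations.

```python
def kw_gaussian_estimates(ys, var):
    """Iterate the update (19) with a_n = var/n (so a_n/var = 1/n);
    the c_n sequence drops out entirely for this model."""
    theta, out = 0.0, []
    for n, y in enumerate(ys, start=1):
        theta = theta + (1.0 / n) * (y - theta)  # eq. (19)
        out.append(theta)
    return out

# (19) reproduces the empirical mean (20) exactly
ys = [1.0, 3.0, 2.0, 6.0]
ests = kw_gaussian_estimates(ys, var=4.0)
means = [sum(ys[:n]) / n for n in range(1, len(ys) + 1)]  # eq. (20)
# ests == means == [1.0, 2.0, 2.0, 3.0]

def expected_kw_estimate(nu, N, M):
    """E[theta_hat_{nu+N}] from (21): (N+1) of the nu+N averaged samples
    are post-change with mean M, so the expectation is (N+1)*M/(nu+N)."""
    return (N + 1) * M / (nu + N)

sublinear = expected_kw_estimate(nu=10 ** 6, N=1000, M=2)    # N = o(nu): near 0
linear = expected_kw_estimate(nu=10 ** 6, N=10 ** 6, M=2)    # N = nu (beta = 1): near M/2
```

With β = 1 the limit Mβ/(β+1) is M/2, matching the `linear` value; the `sublinear` value is essentially 0, illustrating why post-change samples must grow linearly with ν for the non-reset scheme.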
The linear increase in detection delay with respect to ν is observed in our simulations (Fig. 1).

4.3. Reset KW CUSUM Convergence

For the reset KW method, where we reset a_n periodically every P time instants, conditions (12)-(13) are not met and hence the parameter estimate will not converge in the mean-square sense to the true parameter. On the other hand, we have the following guarantee on mean convergence.

Lemma 4. If Π_{n=1}^P |1 − a_n/σ²| < 1, then for the reset KW method we have E{θ̂_n} → θ_max as n → ∞, where θ_max = 0 in the absence of a change and θ_max = M when the change point is finite.

Proof. We prove the case of finite ν. Taking the expectation of (19), for n ≥ 0, we have E{θ̂_{ν+n+1}} = E{θ̂_{ν+n}} + (a_{ν+n+1}/σ²) (E{Y_{ν+n+1}} − E{θ̂_{ν+n}}). Noting that E{Y_{ν+n+1}} = M, we have M − E{θ̂_{ν+n+1}} = (M − E{θ̂_{ν+n}}) − (a_{ν+n+1}/σ²)(M − E{θ̂_{ν+n}}). Defining the error e(ν + n) = M − E{θ̂_{ν+n}}, we have e(ν + n + 1) = (1 − a_{ν+n+1}/σ²) e(ν + n). It follows that e(ν + n + 1) = Π_{k=1}^{n+1} [1 − a_{ν+k}/σ²] e(ν).

With Π_{n=1}^P |1 − a_n/σ²| ≤ ξ, we have Π_{k=1}^{n+1} [1 − a_{ν+k}/σ²] = C ξ^{⌊(n+1)/P⌋}, where C = Π_{l=1}^{mod(n+1,P)} [1 − a_{ν+l}/σ²] is upper bounded by a finite constant. Since ξ < 1 and C is bounded, Π_{k=1}^{n+1} [1 − a_{ν+k}/σ²] → 0 as n → ∞, and hence e(ν + n + 1) → 0 and E{θ̂_{ν+n+1}} → M.

Fig. 1. Plot of τ_d versus ν with a fixed threshold.

Fig. 2. Plot of P_f versus τ_d with ν = 10.

5. SIMULATION RESULTS

We consider the Gaussian mean change model with K = 100, σ² = 4 and M = 2. For the KW method, we use a_n = 1/n, resetting it every P time instants for the reset KW method. Using Monte Carlo averaging over multiple trials and varying the threshold A, we find the values of P_f, τ_r and τ_d for the various algorithms. The time duration of each trial is restricted to 10^5 samples. In Fig. 1, we show the behavior of the average detection delay τ_d with respect to the change point ν for a fixed threshold. Parallel and adaptive CUSUM do not show any significant variation of τ_d with ν. In accordance with our discussion after Lemma 3, τ_d of non-reset KW CUSUM increases linearly with ν. Interestingly, τ_d of reset KW CUSUM (the legend "Reset" specifies the value of P) exhibits oscillatory behavior, with the maximum delay occurring when ν is near the middle of the resetting window. In Fig. 2, with ν = 10, we plot τ_d versus P_f by varying the threshold, where the false alarm probability P_f is obtained with the algorithms run over a finite duration. While P_f for adaptive and parallel CUSUM remains close to one, we find that P_f of standard KW CUSUM decreases as the threshold increases. Intuitively, for large P, reset KW CUSUM performs close to standard KW CUSUM, and for small P, it performs close to parallel and adaptive CUSUM.
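The geometric decay used in the proof of Lemma 4 can be traced numerically. This sketch iterates the deterministic error recursion e(n+1) = (1 − a_{n+1}/σ²) e(n) with reset tuning a_n = 1/n (wrapped every P samples) and checks that each full window contracts the mean error by ξ = Π_{k=1}^P (1 − a_k/σ²) < 1; the values P = 5 and σ² = 4 are illustrative assumptions.

```python
def mean_error(n_steps, P, var, e0=1.0):
    """Iterate e <- (1 - a_k/var) * e with reset tuning a_k = 1/k, k wrapped mod P,
    i.e. the recursion from the proof of Lemma 4 for e(n) = M - E[theta_hat_n]."""
    e = e0
    for n in range(1, n_steps + 1):
        k = ((n - 1) % P) + 1
        e *= 1.0 - (1.0 / k) / var
    return e

P, var = 5, 4.0

# contraction factor of one full reset window: xi = prod_{k=1..P} (1 - a_k/var)
xi = 1.0
for k in range(1, P + 1):
    xi *= 1.0 - (1.0 / k) / var

e_one_window = mean_error(P, P, var)         # equals xi * e0
e_ten_windows = mean_error(10 * P, P, var)   # equals xi**10 * e0
```

Since every factor lies in (0, 1) here, ξ < 1 and the mean error decays geometrically in the number of reset windows, which is the content of Lemma 4.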
We also study the unrounded standard KW CUSUM, in which the metric (10) is computed with the KW estimate θ̂_n instead of the rounded value θ̃_n = close(θ̂_n, Θ). With a large change point ν, θ̂_{ν−1} is typically close to 0, and θ̂_{ν+k} starts drifting towards M with k, but the rounding θ̃_{ν+k} = close(θ̂_{ν+k}, Θ) may bring it back to 0 for several values of k. Hence the unrounded standard KW CUSUM has smaller τ_d than the rounded standard KW CUSUM in Fig. 1. We plot the worst-case detection delay τ_w defined in (5) versus the average run length τ_r in Fig. 3. It can be argued that the worst-case delay occurs at ν = 1 for parallel and adaptive CUSUM, similar to Lemma 2 in [5]. For reset KW CUSUM, the worst-case delay can be computed as max_{1≤ν≤P} τ_d. As the resetting window size gets smaller, reset KW CUSUM performs comparably to parallel and adaptive CUSUM.

Fig. 3. Worst-case delay τ_w versus average run length τ_r.

6. CONCLUSIONS

We considered the change detection problem with an unknown post-change parameter. We proposed low-complexity reset/non-reset KW CUSUM algorithms and studied their performance for the Gaussian mean change model. Reset KW CUSUM performs close to the optimal parallel CUSUM in terms of worst-case detection delay versus average run length. Our results also show that the non-reset KW-CUSUM algorithm has a smaller probability of false alarm than the existing algorithms when run over a finite duration.

7. REFERENCES

[1] G. Lorden, "Procedures for reacting to a change in distribution," The Annals of Mathematical Statistics, vol. 42,

no. 6, pp. 1897-1908, Dec. 1971.

[2] M. Pollak, "Optimal detection of a change in distribution," The Annals of Statistics, vol. 13, no. 1, pp. 206-227, Mar. 1985.

[3] M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Application. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1993.

[4] Y. Mei, "Quickest detection in censoring sensor networks," in Intl. Symp. on Information Theory (ISIT), Aug. 2011, pp. 2148-2152.

[5] V. V. Veeravalli and T. Banerjee, "Chapter 6 - Quickest change detection," in Academic Press Library in Signal Processing: Volume 3, Array and Statistical Signal Processing. Elsevier, 2014, pp. 209-255.

[6] A. N. Shiryaev, "On optimum methods in quickest detection problems," Theory of Probability & Its Applications, vol. 8, no. 1, pp. 22-46, 1963.

[7] Y. Mei, "Is average run length to false alarm always an informative criterion?" Sequential Analysis, vol. 27, no. 4, pp. 354-376, 2008.

[8] I. V. Nikiforov, "A suboptimal quadratic change detection scheme," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2095-2107, Sep. 2000.

[9] C. Li, H. Dai, and H. Li, "Adaptive quickest change detection with unknown parameter," in Intl. Conf. Acoustics, Speech and Signal Proc. (ICASSP), Apr. 2009, pp. 3241-3244.

[10] H. Robbins and S. Monro, "A stochastic approximation method," The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400-407, Sep. 1951.

[11] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," The Annals of Mathematical Statistics, vol. 23, no. 3, pp. 462-466, Sep. 1952.

[12] M. Broadie, D. Cicek, and A. Zeevi, "General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm," Operations Research, vol. 59, no. 5, pp. 1211-1224, Oct. 2011.

[13] T. Banerjee, V. Sharma, V.
Kavitha, and A. K. JayaPrakasam, "Generalized analysis of a distributed energy efficient algorithm for change detection," IEEE Transactions on Wireless Communications, vol. 10, no. 1, pp. 91-101, Jan. 2011.

[14] P. S. Kumar, B. S. Kiran, A. P. Kannu, and S. Bhashyam, "Algorithms for change detection with unknown number of affected sensors," in Natl. Conf. on Communications (NCC), Feb. 2013, pp. 1-5.

[15] P. Billingsley, Probability and Measure. John Wiley & Sons, 2008.