Performance of Certain Decentralized Distributed Change Detection Procedures


Alexander G. Tartakovsky
Center for Applied Mathematical Sciences and Department of Mathematics
University of Southern California, Los Angeles, CA 90089-2532, USA
tartakov@usc.edu

Hongjoong Kim
Department of Mathematics, Korea University, Seoul, Korea
hongjoong@korea.ac.kr

Abstract

We compare several decentralized change-point detection procedures for multisensor distributed systems when the information available for decision-making is distributed across a set of sensors. Asymptotically optimal procedures for two scenarios are presented. In the first scenario, the sensors send quantized versions of their observations to a fusion center where change detection is performed based on all the sensor messages. If, in particular, the quantizers are binary, then the proposed binary CUSUM detection test is optimal in the class of tests with binary quantized data. In the second scenario, the sensors perform local change detection using the CUSUM procedures and send their final decisions to the fusion center for combining. The decision in favor of the change occurrence is made whenever CUSUM statistics at all sensors exceed thresholds. The latter decentralized procedure has the same first order asymptotic (as the false alarm rate is low) minimax operating characteristics as the globally optimal centralized detection procedure that has access to all the sensor observations. However, the presented Monte Carlo experiments for the Poisson example show that despite the fact that the procedure with local decisions is globally asymptotically optimal for a low false alarm rate, it performs worse than the procedure with binary quantization unless the false alarm rate is extremely low. In addition, two voting-type local decision based detection procedures are proposed and evaluated. Applications to network security (rapid detection of computer intrusions) are discussed.

Keywords: Change-point sequential detection, quickest detection, distributed multisensor decisions, optimal fusion, local decisions, CUSUM test, intrusion detection.

1 Introduction

We are interested in a change-point detection problem in a multisensor situation where the information available for decision-making is distributed and decentralized. The observations are taken by a set of $N$ distributed sensors, as shown in Figure 1. No communication between sensors and no feedback between the fusion center and sensors are allowed. The statistical properties of the sensors' observations change at the same unknown point in time. The goal is to detect this change as soon as possible, subject to false alarm constraints. The sensors may send either quantized versions of their observations or local decisions to a fusion center where a final decision is made based on all the sensor messages.

Figure 1: Change detection with distributed sensors. [Diagram: sensors $S_1, \dots, S_N$ observe $\{X_1(n)\}, \dots, \{X_N(n)\}$ and send messages $U_1(n), \dots, U_N(n)$ to the fusion center, which makes the change-point decision.]

This problem is of considerable practical importance. It has been previously considered in [10], [22]-[25]. Specifically, Tartakovsky and Veeravalli [23] have proposed the Shiryaev-Roberts detection test based on the binary quantized data (BQ-test) as well as the test based on local decisions that are made by the Shiryaev-Roberts local detectors (LD-test). The results of that paper show that the corresponding LD-test is asymptotically globally optimal in the Bayesian setting. However, the convergence to the optimum is very slow and, as a result, the BQ-test, which is not globally optimal, performs better unless the false alarm rate is extremely low. For the minimax problem formulation and CUSUM detection procedures, similar results have been recently reported by Mei [10]. In both of these papers the Gaussian model has been used in MC simulations.
In the present paper, we consider the minimax problem and show that the same conclusion holds for the Poisson model, which is motivated by certain network security applications. Moreover, motivated by these applications, in addition to the likelihood ratio-based detection tests we introduce more robust nonparametric tests that do not require model knowledge. In a single-sensor scenario, similar detection tests have been considered by Tartakovsky et al. [19, 20, 21].

2 Decentralized Detection Tests and Their Asymptotic Operating Characteristics

2.1 The problem and potential performance

Suppose there is a distributed $N$-sensor system in which at time $n$ one observes an $N$-component vector stochastic process $(X_1(n), \dots, X_N(n))$. The $i$-th component $X_i(n)$, $n = 1, 2, \dots$, corresponds to observations obtained from sensor $S_i$, as shown in Figure 1. We will consider two approaches to the decentralized fusion problem. In the first case, the sensors quantize their observations and these quantized observations are sent to the fusion center; in the second scenario they make local decisions that are sent to the fusion center.

At an unknown point in time $\lambda$ ($\lambda = 1, 2, \dots$) something happens and all of the components change their distribution. Conditioned on the change point, the observation sequences $\{X_1(n)\}, \{X_2(n)\}, \dots, \{X_N(n)\}$ are assumed to be mutually independent. Moreover, we assume that, in a particular sensor, the observations are independent and identically distributed (iid) before and after the change (with different distributions). If the change occurs at $\lambda = k$, then in sensor $S_i$ the data $X_i(1), \dots, X_i(k-1)$ follow the distribution $F_0^{i}$ with a density $f_0^{(i)}(x)$, while the data $X_i(k), X_i(k+1), \dots$ have the common distribution $F_1^{i}$ with a density $f_1^{(i)}(x)$ (both with respect to a sigma-finite measure $\mu(x)$).

To be more specific, let $P_k$ (correspondingly $E_k$) be the probability measure (correspondingly expectation) when the change occurs at time $\lambda = k$. Then $P_\infty$ and $E_\infty$ stand for the probability measure and expectation when $\lambda = \infty$, i.e., the change does not occur. Write $X_i^n = (X_i(1), \dots, X_i(n))$ and $X^n = (X_1^n, \dots, X_N^n)$. Under $P_\infty$, the density of $X^n$ is

$$p_0(X^n) = \prod_{i=1}^{N} \prod_{j=1}^{n} f_0^{(i)}(X_i(j)) \quad \text{for all } n \ge 1,$$

and, under $P_k$, the density of $X^n$ is

$$p_k(X^n) = \prod_{i=1}^{N} \left[ \prod_{j=1}^{k-1} f_0^{(i)}(X_i(j)) \prod_{j=k}^{n} f_1^{(i)}(X_i(j)) \right] \quad \text{for } k \le n,$$

and $p_k(X^n) = p_0(X^n)$ for $k > n$.

The false alarm rate can be measured by the average run length (ARL) to false alarm

$$\mathrm{ARL}(\tau) = E_\infty \tau.$$

As a measure of the speed of detection (i.e., detection lag), we will use the supremum average detection delay (SADD) proposed by Pollak in the mid 1970s

$$\mathrm{SADD}(\tau) = \sup_{1 \le k < \infty} E_k(\tau - k \mid \tau \ge k).$$

An optimal minimax detection procedure is a procedure for which $\mathrm{SADD}(\tau)$ is minimized while $\mathrm{ARL}(\tau)$ is set at a given level $\gamma$, $\gamma > 0$. Specifically, define the class of change-point detection procedures $\Delta(\gamma) = \{\tau : \mathrm{ARL}(\tau) \ge \gamma\}$ for which the ARL exceeds the predefined positive number $\gamma$. The optimal change-point detection procedure is described by the stopping time

$$\nu = \arg\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau).$$

Let

$$Z_i(n) = \log \frac{f_1^{(i)}(X_i(n))}{f_0^{(i)}(X_i(n))} \tag{1}$$

be the log-likelihood ratio (LLR) between the change and no-change hypotheses for the $n$-th observation from the $i$-th sensor and let

$$I_i = E_1 Z_i(1) = \int \log\left( \frac{f_1^{(i)}(x)}{f_0^{(i)}(x)} \right) f_1^{(i)}(x)\, \mu(dx)$$

be the Kullback-Leibler (K-L) information number between the densities $f_1^{(i)}(x)$ and $f_0^{(i)}(x)$. The asymptotic performance of an optimal centralized detection procedure that has access to all data $X^n$ is given by

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) = \frac{\log \gamma}{I_{tot}} (1 + o(1)), \quad \gamma \to \infty, \tag{2}$$

where $I_{tot} = \sum_{i=1}^{N} I_i$. See, e.g., [1, 9, 16, 22]. This performance is attained for the centralized CUSUM test that uses all available data.

2.2 Centralized CUSUM detection test

The centralized CUSUM test is defined as $\tau_c = \min\{n \ge 1 : W_c(n) \ge h\}$, where the (centralized) CUSUM statistic $W_c(n)$ is given by the recursion

$$W_c(n) = \max\left\{0,\; W_c(n-1) + \sum_{i=1}^{N} Z_i(n)\right\} \tag{3}$$

($W_c(0) = 0$) and the threshold $h$ is chosen so that $\mathrm{ARL}(\tau_c(h)) = \gamma$. It is known [1, 9, 15, 16] that $\mathrm{ARL}(\tau_c(h)) \ge e^h$ and, hence, $h = \log \gamma$ guarantees $\mathrm{ARL}(\tau_c(h)) \ge \gamma$. The latter choice is usually conservative but useful for preliminary estimates and first-order asymptotic analysis. Substantial improvements can be obtained using corrected Brownian motion approximations [15] and the renewal argument [18].

In the following two subsections, we consider two types of decentralized detection procedures that use compressed data $U_1(n), \dots, U_N(n)$.
These compressed data are transmitted to the fusion center for making the final decision. The compression level for both types of procedures is maximal: the data $U_i(n) = 0$ or $1$, i.e., binary. Thus, for both proposed decentralized detection procedures the required bandwidth for communication with the fusion center is minimal. The advantage of the first detection test with binary quantized data is that it does not require any processing power at the sensors. In Section 2.4.2, even simpler voting local decision-based detection tests are introduced.
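
As an illustration only, recursion (3) and the stopping rule $\tau_c$ can be sketched in Python as follows. The sketch assumes Poisson observations, with the per-observation LLR $x \log(\theta/\mu) - (\theta - \mu)$ borrowed from the Poisson example of Section 4. The function names `poisson_llr` and `centralized_cusum` and the toy data are hypothetical and are not part of the paper.

```python
import math


def poisson_llr(x, mu, theta):
    """LLR of one Poisson observation (assumed model, see Section 4)."""
    return x * math.log(theta / mu) - (theta - mu)


def centralized_cusum(observations, mu, theta, h):
    """Return the first n (1-based) with W_c(n) >= h, or None if never reached.

    observations: iterable of length-N tuples, one tuple per time step.
    mu, theta: per-sensor pre-change and post-change Poisson means.
    """
    w = 0.0  # W_c(0) = 0
    for n, x_n in enumerate(observations, start=1):
        total_llr = sum(poisson_llr(x, m, t) for x, m, t in zip(x_n, mu, theta))
        w = max(0.0, w + total_llr)  # recursion (3)
        if w >= h:
            return n
    return None


# Toy data (hypothetical): two sensors, three quiet steps, then a jump in counts.
toy = [(10, 10)] * 3 + [(20, 20)] * 5
alarm_time = centralized_cusum(toy, mu=(10, 10), theta=(12, 12), h=5.0)
```

With this toy input the statistic stays at zero during the first three steps and crosses the threshold two steps after the counts jump, so `alarm_time` is 5.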

2.3 Decentralized CUSUM test with binary quantization at the sensors

Consider the scenario where, based on the observation $X_i(n)$ available at sensor $S_i$ at time $n$, a message $U_i(n)$ belonging to a finite alphabet (e.g., binary) is formed and sent to the fusion center (see Figure 1). Write $U_n = (U_1(n), \dots, U_N(n))$ for the vector of $N$ messages at time $n$. Based on the sequence of sensor messages, a decision about the change is made at the fusion center. The goal is to find a detection test at the fusion center that has certain optimality properties. This test is identified with a stopping time on $\{U_n\}_{n \ge 1}$ at which it is declared that a change has occurred. The corresponding problem has been considered by Tartakovsky and Veeravalli [22] in detail.

In the following we consider the simplest case where $U_i(n) = \psi_i(X_i(n))$ are the outputs of binary quantizers. It follows from [10, 22] that the asymptotically optimal policy for the decentralized change detection problem with binary quantization that minimizes $\mathrm{SADD}(\tau) = \sup_k E_k(\tau - k \mid \tau \ge k)$, while maintaining $\mathrm{ARL}(\tau)$ at a level greater than $\gamma$, consists of a set of stationary monotone likelihood ratio quantizers (MLRQ) at the sensors followed by the CUSUM procedure based on $\{U_n\}_{n \ge 1}$ at the fusion center. More specifically, the optimal binary quantizer is the MLRQ, which is given by

$$U_i = \psi_i(X) = \begin{cases} 1 & \text{if } f_1^{(i)}(X)/f_0^{(i)}(X) \ge t_i, \\ 0 & \text{otherwise,} \end{cases}$$

where $t_i$ is a positive finite threshold that maximizes the K-L information in the resulting Bernoulli sequence for the post-change and pre-change hypotheses. To be precise, for $l = 0, 1$, let $g_l^{(i)}$ denote the probability induced on $U_i(n)$ when the observation $X_i(n)$ is distributed as $f_l^{(i)}$. Let $\beta_{0,i} = g_0^{(i)}(U_i(j) = 1)$ and $\beta_i = g_1^{(i)}(U_i(j) = 1)$ denote the corresponding probabilities under the normal and the anomalous conditions, respectively.
The resulting binary (Bernoulli) sequences $\{U_i(j), i = 1, \dots, N\}$, $j \ge 1$, are then used to form the binary CUSUM statistic similar to (3) as

$$W_b(n) = \max\left\{0,\; W_b(n-1) + \sum_{i=1}^{N} Z_i^b(n)\right\}, \tag{4}$$

where $W_b(0) = 0$ and

$$Z_i^b(n) = \log \frac{g_1^{(i)}(U_i(n))}{g_0^{(i)}(U_i(n))}$$

is the partial LLR between the change and no-change hypotheses for the binary sequence, which is given by $Z_i^b(n) = a_i U_i(n) + a_{0,i}$. Here

$$a_i = \log \frac{\beta_i (1 - \beta_{0,i})}{\beta_{0,i} (1 - \beta_i)}, \qquad a_{0,i} = \log \frac{1 - \beta_i}{1 - \beta_{0,i}}.$$

Then the CUSUM detection procedure at the fusion center is given by the stopping time

$$\tau_b(h) = \min\{n \ge 1 : W_b(n) \ge h\}, \tag{5}$$

where $h$ is a positive threshold which is selected so that $\mathrm{ARL}(\tau_b(h)) \ge \gamma$. In what follows this detection procedure will be referred to as the binary quantized CUSUM test and the abbreviation BQ-CUSUM will be used throughout the paper.

It follows from [22] that the BQ-CUSUM procedure with $h = \log \gamma$ is asymptotically optimal as $\gamma \to \infty$ in the class of tests with binary quantization in the sense of minimizing the SADD in the class $\Delta(\gamma)$. More specifically, $\mathrm{SADD}(\tau_b) = E_1(\tau_b - 1)$ and the tradeoff curve that relates SADD and ARL for large ARL is

$$\mathrm{SADD}(\tau_b) \sim \frac{\log(\mathrm{ARL})}{\sum_{i=1}^{N} (\beta_i a_i + a_{0,i})}. \tag{6}$$

Note that the probabilities $\beta_i = \beta_i(t_i)$ and $\beta_{0,i} = \beta_{0,i}(t_i)$ depend on the value of the threshold $t_i$. To optimize the performance, one should choose thresholds $t_1, \dots, t_N$ so that the denominator in (6) is maximized, i.e.,

$$t_i^0 = \arg\max_{t_i > 0} I_i^b(t_i), \quad i = 1, \dots, N, \tag{7}$$

where $I_i^b(t_i) = \beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)$ is the K-L distance for the binary sequence in the $i$-th sensor. It follows from (6) and (7) that the tradeoff curve for the optimal binary test is

$$\mathrm{SADD}(\tau_b) \sim \frac{\log \gamma}{I_{tot}^b}, \quad \gamma \to \infty, \tag{8}$$

where $I_{tot}^b = \sum_{i=1}^{N} \max_{t_i} [\beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)]$.

The asymptotic relative efficiency (ARE) of a detection procedure $\tau_\gamma$ with respect to a detection procedure $\eta_\gamma$, both of which meet the same lower bound $\gamma$ for the ARL, will be defined as

$$\mathrm{ARE}(\tau_\gamma; \eta_\gamma) = \lim_{\gamma \to \infty} \frac{\mathrm{SADD}(\tau_\gamma)}{\mathrm{SADD}(\eta_\gamma)}.$$

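
The fusion rule (4)-(5) lends itself to a short sketch. The Python code below is a minimal illustration, assuming the probabilities $\beta_{0,i}$ and $\beta_i$ are already known at the fusion center; the function names `bq_coefficients`, `bq_cusum` and `binary_kl` are hypothetical, and the numerical values in the usage lines are made up for the example.

```python
import math


def bq_coefficients(beta0, beta1):
    """Return (a_i, a_0i) such that the binary LLR is Z^b = a_i * U + a_0i."""
    a = math.log(beta1 * (1.0 - beta0) / (beta0 * (1.0 - beta1)))
    a0 = math.log((1.0 - beta1) / (1.0 - beta0))
    return a, a0


def binary_kl(beta0, beta1):
    """K-L distance of the binary sequence: beta_i * a_i + a_0i."""
    a, a0 = bq_coefficients(beta0, beta1)
    return beta1 * a + a0


def bq_cusum(messages, beta0, beta1, h):
    """Return the first n (1-based) with W_b(n) >= h, or None.

    messages: iterable of length-N tuples of 0/1 sensor messages U_i(n).
    beta0, beta1: per-sensor probabilities of U_i = 1 before and after the change.
    """
    coeffs = [bq_coefficients(b0, b1) for b0, b1 in zip(beta0, beta1)]
    w = 0.0  # W_b(0) = 0
    for n, u_n in enumerate(messages, start=1):
        increment = sum(a * u + a0 for u, (a, a0) in zip(u_n, coeffs))
        w = max(0.0, w + increment)  # recursion (4)
        if w >= h:
            return n  # stopping rule (5)
    return None


# Illustrative values only: two sensors that both report 1 at every step.
alarm_time = bq_cusum([(1, 1)] * 5, beta0=(0.3, 0.3), beta1=(0.6, 0.6), h=4.0)
```

A message $U_i(n) = 1$ contributes $a_i + a_{0,i} = \log(\beta_i/\beta_{0,i})$ and a message $U_i(n) = 0$ contributes $a_{0,i} = \log[(1-\beta_i)/(1-\beta_{0,i})]$, so the fusion center only needs two constants per sensor.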
Using (2) and (8), we obtain that the ARE of the globally asymptotically optimal test $\nu$ with respect to the BQ-CUSUM test $\tau_b$ is

$$\mathrm{ARE}(\nu; \tau_b) = \lim_{\gamma \to \infty} \frac{\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau)}{\mathrm{SADD}(\tau_b(h_\gamma))} = \frac{I_{tot}^b}{I_{tot}}. \tag{9}$$

Since $I_{tot}$ is always larger than $I_{tot}^b$, the value of ARE < 1. However, our study presented below shows that certain decentralized asymptotically globally optimal tests may perform worse in practically interesting prelimit situations when the false alarm rate is moderately low but not very low.

2.4 Decentralized detection tests based on local decisions

We now consider three detection schemes that perform local detection in the sensors and then transmit these local binary decisions to the fusion center for optimal combining and final decision-making. The abbreviation LD-CUSUM will be used for procedures that perform CUSUM tests in sensors and use local decisions.

2.4.1 Asymptotically optimal decentralized LD-CUSUM test

Let $W_i(n) = \max\{0, W_i(n-1) + Z_i(n)\}$, $W_i(0) = 0$, be the CUSUM statistic in the $i$-th sensor, where, as before, $Z_i(n) = \log[f_1^{(i)}(X_i(n))/f_0^{(i)}(X_i(n))]$ is the LLR for the original sequence. Let

$$U_i(n) = \begin{cases} 1 & \text{if } W_i(n) \ge \pi_i h, \\ 0 & \text{otherwise,} \end{cases}$$

where $\pi_i = I_i/I_{tot}$ and $h$ is a positive threshold. The stopping time is defined as

$$T_{ld}(h) = \min\left\{n : \min_{1 \le i \le N} [W_i(n)/\pi_i] \ge h\right\}. \tag{10}$$

In other words, binary local decisions (1 or 0) are transmitted to the fusion center, and the change is declared at the first time when $U_i(n) = 1$ for all sensors $i = 1, \dots, N$.

It follows from Mei [10] that if $E_1 |Z_i(1)|^3 < \infty$, then $E_\infty T_{ld}(h) \ge e^h$. Under an additional Cramér-type condition, it follows from Dragalin et al. [2] that

$$\mathrm{SADD}(T_{ld}(h)) = \frac{h}{I_{tot}} + C_N \sqrt{\frac{h}{I_{tot}}}\,(1 + o(1)), \tag{11}$$

where

$$C_N = E \max_{1 \le i \le N} \left\{ \frac{\sigma_i}{I_i} Y_i \right\}, \tag{12}$$

$Y_1, \dots, Y_N$ are independent standard Gaussian random variables; $\sigma_i^2 = \mathrm{Var}_i(Z_i(1))$; and $\mathrm{Var}_i$ is the operator of variance under $f_1^{(i)}$. Therefore, if $h = \log \gamma$, then

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(T_{ld}(h)) \sim \frac{\log \gamma}{I_{tot}}, \quad \gamma \to \infty,$$

and the detection test $T_{ld}(h)$ is globally asymptotically optimal (AO), i.e., $\mathrm{ARE}(T_{ld}; \tau_c) = 1$. Correspondingly, we will use the abbreviation AO-LD-CUSUM for this test in the rest of the paper.

However, since the second term in the asymptotic approximation (11) is on the order of the square root of the threshold, it is expected that the convergence to the optimum is slow. Note that for the optimal centralized CUSUM test and for the decentralized CUSUM test with binary quantization the residual terms are constants. We therefore expect that for moderate false alarm rates typical for practical applications the procedure with quantization may perform better. This fact is confirmed by MC simulations for Gaussian models [10, 23]. In Section 4, this conjecture is verified for the Poisson model.

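
A minimal Python sketch of the stopping rule (10) is given below, assuming the per-sensor LLR values $Z_i(n)$ and the K-L numbers $I_i$ are supplied by the caller. For brevity the local CUSUM statistics and the fusion rule are simulated in one loop, whereas in the scheme described above each sensor would run its own recursion and transmit only the bit $U_i(n)$. The function name `ao_ld_cusum` and the toy numbers are hypothetical.

```python
def ao_ld_cusum(llr_stream, info, h):
    """Return the first n (1-based) at which every sensor reports U_i(n) = 1, or None.

    llr_stream: iterable of length-N tuples with the per-sensor LLRs Z_i(n).
    info: K-L information numbers I_i, used to set pi_i = I_i / I_tot.
    """
    i_tot = sum(info)
    pi = [i_k / i_tot for i_k in info]
    w = [0.0] * len(info)  # W_i(0) = 0 in every sensor
    for n, z_n in enumerate(llr_stream, start=1):
        # Local CUSUM recursion in each sensor.
        w = [max(0.0, w_i + z) for w_i, z in zip(w, z_n)]
        # Local binary decisions: U_i(n) = 1 if W_i(n) >= pi_i * h.
        decisions = [1 if w_i >= p * h else 0 for w_i, p in zip(w, pi)]
        if all(decisions):
            return n
    return None


# Toy numbers: I = (1, 3) gives local thresholds 1 and 3 when h = 4.
alarm_time = ao_ld_cusum([(0.5, 1.0)] * 10, info=[1.0, 3.0], h=4.0)
```

In the toy run the first sensor crosses its local threshold at step 2 and the second at step 3, so the alarm is raised at step 3, the first time both decisions equal 1 simultaneously.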
2.4.2 Decentralized minimal and maximal LD-CUSUM tests

Let $\tau_i(h) = \min\{n : W_i(n) \ge h\}$ denote the stopping time of the CUSUM test in the $i$-th sensor. Introduce the stopping times

$$T_{min}(h) = \min(\tau_1, \dots, \tau_N), \qquad T_{max}(h) = \max(\tau_1, \dots, \tau_N)$$

that will be referred to as the minimal LD-CUSUM (Min-LD-CUSUM) and maximal LD-CUSUM (Max-LD-CUSUM) tests, respectively.

Consider first the false alarm rate for these two detection tests. Clearly, $E_\infty T_{max} \ge E_\infty \tau_i$ for every $i = 1, \dots, N$. Since $E_\infty \tau_i \ge e^h$, it follows that, for every $h > 0$, $\mathrm{ARL}(T_{max}) \ge e^h$. It can also be shown that, for every $h > 0$, $\mathrm{ARL}(T_{min}) \ge N^{-1} e^h$ (cf. Tartakovsky [17]). These inequalities are usually very conservative. For large threshold values asymptotically sharp approximations can be derived as follows. It follows from [18] that, as $h \to \infty$, under the no-change hypothesis the stopping times $\tau_i$, $i = 1, \dots, N$, are exponentially distributed with mean values $c_i e^h$, where $c_i \ge 1$ are constants that can be computed numerically for any particular model using the renewal argument. Therefore, for a large threshold, $T_{min}(h)$ is approximately exponentially distributed with mean

$$\mathrm{ARL}(T_{min}) \sim \alpha_N e^h, \quad \text{where } \alpha_N = \left[\sum_{i=1}^{N} c_i^{-1}\right]^{-1},$$

while the mean of the stopping time $T_{max}$ is

$$\mathrm{ARL}(T_{max}) \sim \bar{\alpha}_N e^h \quad \text{as } h \to \infty,$$

where $\bar{\alpha}_N > \alpha_N$ can be easily computed for any $N$. In particular, for $N = 5$ and in the symmetric case, $\bar{\alpha}_5 = 137c/60 \approx 2.28c$ and $\alpha_5 = c/5$. Also, it may be shown that, as $h \to \infty$,

$$\mathrm{SADD}(T_{min}) \sim \frac{h}{\max_i I_i}, \qquad \mathrm{SADD}(T_{max}) \sim \frac{h}{\min_i I_i}.$$

Therefore, taking the thresholds $h = \log(\gamma/\alpha_N)$ in the Min-LD-CUSUM and $h = \log(\gamma/\bar{\alpha}_N)$ in the Max-LD-CUSUM, we obtain the tradeoff curves that relate the SADD and the ARL, as $\gamma \to \infty$:

$$\mathrm{SADD}(T_{min}) \sim \frac{\log \gamma}{\max_i I_i}, \qquad \mathrm{SADD}(T_{max}) \sim \frac{\log \gamma}{\min_i I_i}.$$

It follows that in the symmetric case where $I_i = I$ the asymptotic relative efficiency of these detection tests compared to the optimal centralized test is $\mathrm{ARE}(T_{min}; \tau_c) = \mathrm{ARE}(T_{max}; \tau_c) = N$.

Note that while based on the first-order asymptotics it may be expected that in the symmetric case the Max-LD-CUSUM test performs better for moderate FAR due to the fact that $\bar{\alpha}_N > \alpha_N$, in reality it is difficult to make certain conclusions, since the second terms in the asymptotic decomposition may reverse this conclusion. Monte Carlo simulations in Section 4 show that the Min-LD-CUSUM test performs better even in the symmetric case.
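
The constants quoted for $N = 5$ in the symmetric case can be checked with a short calculation. The Python sketch below assumes, as in the approximation above, that under the no-change hypothesis the local stopping times are independent and exactly exponential with a common mean $c e^h$. Then the minimum has mean $c e^h / N$ and the maximum has mean $c e^h (1 + 1/2 + \dots + 1/N)$, which are standard facts about iid exponential variables. The function names are hypothetical, and the common factor $c$ is left out.

```python
from fractions import Fraction


def alpha_min(n):
    """Mean of the minimum of n iid unit-mean exponentials: 1/n."""
    return Fraction(1, n)


def alpha_max(n):
    """Mean of the maximum of n iid unit-mean exponentials: 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


# Constants for N = 5, in units of c.
print(alpha_min(5), alpha_max(5), float(alpha_max(5)))
```

For $N = 5$ this reproduces the factors $1/5$ and $137/60 \approx 2.28$ quoted in Section 2.4.2.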

3 Applications to Intrusion Detection in Distributed Computer Networks

One of the important applications that stimulated the research in this paper is intrusion detection in distributed high-speed computer networks. A significant number of serious cyberattacks on a variety of governmental agencies, universities, and corporations have recently been identified [3, 4, 5, 6, 12, 14]. These attacks, including a variety of buffer overflows, worm-based, denial-of-service (DoS) and man-in-the-middle (MiM) attacks, are designed to gain access to additional hosts, steal sensitive data, and disrupt network services. As a result, rapid detection of a wide spectrum of network intrusions and robust separation of legitimate and malicious traffic are vital for the continuation of normal operation of networks. See Kent [6] and Tartakovsky et al. [19]-[21] for a more detailed discussion.

Typically network intrusions occur at unknown points in time and lead to changes in the statistical properties of certain observables. For example, distributed DoS (DDoS) attacks lead to changes in the mean value of the number of packets of a particular type (TCP, ICMP, or UDP) and size, while address resolution protocol (ARP) MiM attacks lead to changes in the average number of ARP requests [7], [19]-[21]. It is therefore intuitively appealing to formulate the problem of detecting attacks as a quickest change-point detection problem: to detect changes in statistical models as rapidly as possible (i.e., with minimal average delays) while maintaining the false alarm rate at a given low level.

It follows from the results of the previous section that in the case of complete information about the pre-change and the post-change models, (asymptotically) optimal detection procedures in multisensor detection systems can be constructed based on the LLR-based CUSUM tests. However, in intrusion detection applications, these models are unknown.
For this reason, in [7], [19]-[21], a nonparametric approach was proposed and thoroughly tested for a single-sensor scenario. This approach can be easily extended to the multisensor centralized and decentralized scenarios. More specifically, when the pre-change and post-change densities are unknown, the LLRs $Z_i(n)$ defined in (1) are also unknown and should be replaced by appropriate score functions $s_i(n)$ that have negative mean values $E_\infty s_i(n) < 0$ before the change occurs and positive mean values $E_k s_i(n) > 0$ after the change occurs. While we do not specify any particular model in terms of probability distributions, some assumptions on the change should be made. Indeed, score functions can be chosen in many ways, and their selection depends crucially on the type of change that we intend to detect. For example, different score functions are used to detect changes in the mean and changes in the variance.

In applications of interest, the detection problem can usually be reduced to detecting changes in mean values. Let $\mu_i = E_\infty X_i(j)$ and $\theta_i = E_1 X_i(j)$ denote the pre-change and post-change mean values in the $i$-th sensor. Typically, the baseline mean values $\mu_i$ can be estimated quite accurately in advance, while the values of $\theta_i$ are usually unknown and either should be estimated on-line or replaced by reasonable numbers, e.g., by the expected minimal values. In the rest of this subsection we suppose for concreteness that $\theta_i > \mu_i$. For $i = 1, \dots, N$, introduce the following score functions

$$s_i(n) = X_i(n) - \mu_i - c_i,$$

where in the general case $c_i = c_i(n)$ may depend on past observations, which is desirable to guarantee an adaptive structure of the detection procedure. For example, one may take $c_i(n) = \varepsilon \hat{\theta}_{i,n}$, where $\varepsilon$ is a tuning parameter belonging to the interval $(0, 1)$ and $\hat{\theta}_{i,n} = \hat{\theta}_{i,n}(X_i^n)$ is an estimate of the unknown mean $\theta_i$.
Choosing the latter estimators as well as optimizing the parameter $\varepsilon$ based on the training data are not straightforward tasks, as discussed in detail in Tartakovsky et al. [19]. For this reason, it is convenient to set $c_i(n) = c_i$, where $c_i$ are positive constants that do not depend on $n$. Positiveness of $c_i$ is essential to guarantee the negative value of $E_\infty s_i(n) = -c_i$ under the no-change hypothesis. On the other hand, $c_i$ must not be too large, in order to guarantee the positive value of $E_1 s_i(n) = \theta_i - \mu_i - c_i$ under the alternative hypothesis. A particular choice of $c_i$ is discussed in [19].

If the above conditions hold, the score-based CUSUM statistic in the $i$-th sensor

$$W_i^s(n) = \max\{0, W_i^s(n-1) + s_i(n)\}$$

remains close to zero in normal conditions, while when the change occurs it starts rapidly drifting upward (see Figure 2). The centralized CUSUM statistic, which combines the scores from all the sensors,

$$W^s(n) = \max\left\{0,\; W^s(n-1) + \sum_{i=1}^{N} s_i(n)\right\}$$

has a similar behavior. The time of alarm in the centralized detection scheme is defined as the first time $n$ when the statistic $W^s(n)$ crosses a positive threshold. A binary quantized version of the CUSUM test can be designed analogously to Section 2.3. See [19] for further details. Finally, a nonparametric LD-CUSUM test has the form (10) where the LLR-based CUSUM statistic $W_i(n)$ is replaced with the score-based CUSUM statistic $W_i^s(n)$ and where

$$\pi_i = \frac{\theta_i - \mu_i - c_i}{\sum_{j=1}^{N} (\theta_j - \mu_j - c_j)}.$$

For the sake of simplicity, we assume here that the post-change mean values $\theta_i$ are known. Note that the above nonparametric detection algorithms are no longer guaranteed to be optimal. Certain optimization is possible based on the training data [19].

The behavior of the nonparametric local CUSUM statistics $W_i^s(n)$ and the corresponding binary counterparts is shown in Figure 2 for the ARP MiM attack.
These plots have been obtained by simulating corresponding ARP MiM attacks and legitimate traffic in a network testbed based on the University of Utah NetBed/Emulab. The network topology consisted of two subnets that contain a local detector whose output is utilized by the fusion center at the top level. During this attack, the attacker sends unrequested forged ARP replies to victim hosts, informing them falsely that the attacker is the destination of their connection with the other host. The attacker can then filter, record, or arbitrarily modify the data before sending it to the true destination. The ARP MiM attack can be used for password and user name capture as well as for connection hijacking and real-time decryption if authentication certificates are not used for secure communications.

Figure 2: Nonparametric local and binary CUSUM detection statistics for a simulated ARP MiM attack.

Due to space limitations, a detailed study of the above centralized and decentralized nonparametric algorithms will be performed elsewhere. In the next section, we present the results of MC simulations of the LLR-based detection tests for a Poisson example.

4 Monte Carlo Experiments

In this section, we present the results of MC experiments for the Poisson example where observations in the $i$-th sensor $X_i(n)$, $n \ge 1$, follow the common Poisson distribution $\mathcal{P}(\mu_i)$ in the pre-change mode and the common Poisson distribution $\mathcal{P}(\theta_i)$ after the change occurs, i.e., for $m = 0, 1, 2, \dots$ and $\lambda = k$,

$$P_k(X_i(n) = m) = \begin{cases} \dfrac{(\mu_i)^m}{m!} e^{-\mu_i} & \text{for } k > n, \\[2mm] \dfrac{(\theta_i)^m}{m!} e^{-\theta_i} & \text{for } k \le n, \end{cases}$$

where without loss of generality we assume that $\theta_i > \mu_i$. Write $Q_i = \theta_i/\mu_i$. It is easily seen that the LLR statistic in the $i$-th sensor has the form

$$Z_i(n) = X_i(n) \log(Q_i) - \mu_i (Q_i - 1), \tag{13}$$

and the K-L information numbers are

$$I_i = \theta_i \log Q_i - \mu_i (Q_i - 1), \quad i = 1, \dots, N. \tag{14}$$

It follows from (2), (14) and the above discussion that the centralized CUSUM and AO-LD-CUSUM tests with the thresholds $h = \log \gamma$ are first-order globally asymptotically optimal and

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(\tau_c) \sim \mathrm{SADD}(T_{ld}) \sim \frac{\log \gamma}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}. \tag{15}$$

This means that the ARE of these detection tests with respect to the globally optimal test is equal to 1.
In order to evaluate the ARE of an optimal test $\nu$ (e.g., the centralized CUSUM test $\tau_c$) with respect to the BQ-CUSUM test (5), we use (9), which yields

$$\mathrm{ARE}(\nu; \tau_b) = \frac{\sum_{i=1}^{N} \max_{t_i} [\beta_i(t_i)\, a_i(t_i) + a_{0,i}(t_i)]}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}, \tag{16}$$

where the probabilities $\beta_{0,i}(t_i)$ and $\beta_i(t_i)$ are given by

$$\beta_{0,i}(t_i) = \sum_{k = t_i}^{\infty} \frac{\mu_i^k}{k!}\, e^{-\mu_i}, \qquad \beta_i(t_i) = \sum_{k = t_i}^{\infty} \frac{\theta_i^k}{k!}\, e^{-\theta_i}.$$

Note that since the likelihood ratios are monotone functions of $X_i(n)$, it is equivalent to quantize the observations. Here and in the following, the thresholds $t_i$ are set in the space of observations rather than in the likelihood ratio space. The optimal values $t_i^0$ that maximize the K-L numbers (7) are easily found from these formulas.

Consider a symmetric case where $\mu_i = 10$ and $\theta_i = 12$ for all $i = 1, \dots, N$. Then $I_i = I = 0.1879$, the optimal threshold is $t_i^0 = 12$, and the corresponding maximum K-L number for the binary sequence is $I_i^b(t_i^0) = I^b = 0.119$. Therefore, the efficiency of the BQ-test relative to the globally asymptotically optimal detection procedure is $\mathrm{ARE}(\nu; \tau_b) = 0.119/0.1879 = 0.63$, i.e., for large ARL we expect the average detection delay of the centralized CUSUM (C-CUSUM) to be about 37% smaller than that of the BQ-CUSUM (equivalently, the BQ-CUSUM delay is larger by a factor of about $1/0.63 \approx 1.59$). The following MC simulations show that for the practically interesting values of the ARL (up to 13,360) the gain of the optimal C-CUSUM test is even smaller, while the AO-LD-CUSUM test performs worse than the BQ-CUSUM test for the reasons discussed in Section 2.4.

MC simulations have been performed for the above symmetric situation (i.e., $\mu_i = \mu = 10$ and $\theta_i = \theta = 12$) with $N = 5$ sensors. We used $10^5$ MC replications in the experiment. The operating characteristics of the five detection tests (SADD vs. log(ARL)) are shown in Figure 3 and Table 1. It is seen that the BQ-CUSUM test substantially outperforms the AO-LD-CUSUM test over the entire range of false alarm rates used in the simulations. This result confirms our conjecture.
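The numerical values quoted for the symmetric case can be checked with a short computation. The sketch below assumes that the one-bit message is $U_i(n) = 1\{X_i(n) \ge t_i\}$ and takes the K-L number of the binary sequence to be the divergence between its post-change and pre-change Bernoulli distributions. The function names are hypothetical, and the search grid $1, \dots, 30$ is an arbitrary choice.

```python
import math

def poisson_tail(t, rate):
    """P(X >= t) for X ~ Poisson(rate) and integer t >= 0."""
    return 1.0 - sum(math.exp(-rate) * rate**k / math.factorial(k) for k in range(t))

def binary_kl(t, mu, theta):
    """K-L number of the one-bit message U = 1{X >= t}."""
    b0 = poisson_tail(t, mu)     # P(U = 1) before the change
    b1 = poisson_tail(t, theta)  # P(U = 1) after the change
    return b1 * math.log(b1 / b0) + (1.0 - b1) * math.log((1.0 - b1) / (1.0 - b0))

def best_threshold(mu, theta, candidates=range(1, 31)):
    """Observation-space threshold that maximizes the binary K-L number."""
    return max(candidates, key=lambda t: binary_kl(t, mu, theta))

mu, theta = 10.0, 12.0
t_opt = best_threshold(mu, theta)
kl_binary = binary_kl(t_opt, mu, theta)
kl_full = theta * math.log(theta / mu) - (theta - mu)
print(t_opt)                          # 12
print(round(kl_binary, 3))            # 0.119
print(round(kl_binary / kl_full, 2))  # 0.63
```

The ratio in the last line is the ARE of the symmetric case, since the number of sensors cancels when all sensors share the same parameters.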
It is also seen that both Min-LD-CUSUM and Max-LD-CUSUM perform worse than both the BQ-CUSUM and AO-LD-CUSUM tests.

Table 1: Operating Characteristics of Detection Procedures

| log(ARL) | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 |
|---|---|---|---|---|---|---|---|
| ARL | 33 | 90 | 245 | 665 | 1808 | 4915 | 13360 |
| SADD for C-CUSUM | 1.82 | 2.79 | 3.81 | 4.85 | 5.90 | 6.94 | 8.00 |
| SADD for AO-LD-CUSUM | 3.87 | 5.79 | 7.72 | 9.68 | 11.52 | 13.28 | 15.06 |
| SADD for Min-LD-CUSUM | 4.47 | 7.28 | 10.46 | 13.75 | 17.50 | 20.84 | 24.17 |
| SADD for Max-LD-CUSUM | 8.30 | 13.91 | 21.39 | 28.95 | 36.38 | 43.65 | 51.37 |
| SADD for BQ-CUSUM | 2.75 | 4.21 | 5.77 | 7.40 | 9.01 | 10.65 | 12.28 |

Table 2 shows the relative efficiency of the BQ-CUSUM procedure with respect to the four other detection procedures, which is defined as the ratio of average detection delays for the same ARL: $\mathrm{SADD}(\tau_b)/\mathrm{SADD}(\nu)$, where $\nu$ is the corresponding detection test, i.e., $\nu = \tau_c$, $\tau_{ld}$, etc. Note that the last column presents the ARE.

Table 2: Relative Efficiency of the Decentralized BQ-CUSUM Test

| log(ARL) | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 | ARE |
|---|---|---|---|---|---|---|---|---|
| ARL | 33 | 90 | 245 | 665 | 1808 | 4915 | 13360 | |
| C-CUSUM | 1.51 | 1.51 | 1.51 | 1.53 | 1.53 | 1.53 | 1.54 | 1.59 |
| AO-LD-CUSUM | 0.71 | 0.73 | 0.75 | 0.76 | 0.78 | 0.80 | 0.82 | 1.59 |
| Min-LD-CUSUM | 0.62 | 0.58 | 0.55 | 0.54 | 0.51 | 0.51 | 0.51 | 0.316 |
| Max-LD-CUSUM | 0.33 | 0.30 | 0.27 | 0.26 | 0.25 | 0.24 | 0.24 | 0.316 |

It follows from the table that the SADD of the globally optimal centralized CUSUM is smaller than that of the BQ-CUSUM by 34% for high false alarm rate, by 35% for moderate and low false alarm rate, and by 37% in the limit of very low false alarm rate. On the other hand, the BQ-CUSUM outperforms the AO-LD-CUSUM over the entire range of tested ARL values, from 33 to 13,360. Its SADD is smaller by 30% for high false alarm rate, and this gain slowly reduces to 18% for low false alarm rate.

Figure 3: Operating characteristics of detection procedures.

5 Discussion and Conclusions

The presented results allow us to compare the performance of the four proposed decentralized change detection procedures, as well as to determine the loss in efficiency compared to the globally optimal centralized scheme. The first detection test, called the BQ-CUSUM test, uses binary quantizers at the sensors followed by the CUSUM detection procedure at the fusion center.
The second detection test, called the AO-LD-CUSUM test, performs local detection at the sensors using CUSUM tests and, at each sampling point, transmits these local decisions to the fusion center, which combines them and makes the final decision. Both decentralized detection procedures transmit only binary sequences of 1s and 0s to the fusion center. Therefore, both detection tests use the maximum possible level of data compression and require minimum bandwidth for communication. The third and fourth decentralized detection procedures, called the minimal and maximal LD-CUSUM tests respectively, are based on independent voting of the sensors. In the former, the decision is made at the first time when any one of the local CUSUM tests detects the change; in the latter, it is made when all the sensors have detected the change (but independently, unlike in the AO-LD-CUSUM).

Due to losses of information, the BQ-CUSUM test is inferior to the globally optimal centralized CUSUM test. On the other hand, the AO-LD-CUSUM test is first-order asymptotically globally optimal for low false alarm rate. However, convergence to the optimum is expected to be slow, since the second term in the decomposition of the average detection delay goes to infinity as the square root of the threshold. We therefore conjectured that, although the AO-LD-CUSUM test is first-order asymptotically optimal, it may perform worse than the non-optimal BQ-CUSUM test in a realistic environment. The results of MC simulations for the Poisson model confirm this conjecture. For the model considered, the BQ-CUSUM outperforms the AO-LD-CUSUM over the entire range of tested ARLs, from 33 to 13,360. The SADD of the BQ-CUSUM is smaller by 30% for high false alarm rate, and this advantage slowly reduces to 18% for low false alarm rate. While the asymptotic gain of the AO-LD-CUSUM test over the BQ-CUSUM test is potentially 37%, this gain never materializes at realistic, moderately low false alarm rates.
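To make the procedures compared above more concrete, the following Python sketch simulates the BQ-CUSUM and the two voting tests on post-change data. It rests on several assumptions that are not taken from the experiments reported here. The change occurs at the first observation. The fusion center runs a CUSUM on the LLR of the one-bit messages $1\{X_i(n) \ge t\}$. The thresholds `h_fusion` and `h_local` are hypothetical and are not calibrated to any ARL, so the printed delays are not comparable with Table 1. All function names are illustrative.

```python
import math
import random

def poisson_sample(rate, rng):
    """Knuth's multiplication method for one Poisson variate (fine for small rates)."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_tail(t, rate):
    """P(X >= t) for X ~ Poisson(rate) and integer t >= 0."""
    return 1.0 - sum(math.exp(-rate) * rate**k / math.factorial(k) for k in range(t))

def run_once(mu, theta, n_sensors, t, h_fusion, h_local, rng, max_steps=100_000):
    """Return the stopping times (BQ, Min-LD, Max-LD) on one post-change path."""
    log_q = math.log(theta / mu)
    b0, b1 = poisson_tail(t, mu), poisson_tail(t, theta)
    llr_one = math.log(b1 / b0)                       # LLR of a message U = 1
    llr_zero = math.log((1.0 - b1) / (1.0 - b0))      # LLR of a message U = 0
    w_fusion, delay_bq = 0.0, None
    w_local = [0.0] * n_sensors
    alarm = [None] * n_sensors                        # first alarm time of each local CUSUM
    for n in range(1, max_steps + 1):
        xs = [poisson_sample(theta, rng) for _ in range(n_sensors)]
        # BQ-CUSUM: the fusion center runs a CUSUM on the one-bit messages.
        w_fusion = max(0.0, w_fusion + sum(llr_one if x >= t else llr_zero for x in xs))
        if delay_bq is None and w_fusion >= h_fusion:
            delay_bq = n
        # Local CUSUMs driven by the Poisson LLR; mu * (Q - 1) equals theta - mu.
        for i, x in enumerate(xs):
            w_local[i] = max(0.0, w_local[i] + x * log_q - (theta - mu))
            if alarm[i] is None and w_local[i] >= h_local:
                alarm[i] = n
        if delay_bq is not None and None not in alarm:
            # Min-LD stops at the first local alarm, Max-LD at the last one.
            return delay_bq, min(alarm), max(alarm)
    raise RuntimeError("no detection within max_steps")

rng = random.Random(0)
runs = [run_once(10.0, 12.0, 5, 12, 3.0, 3.0, rng) for _ in range(500)]
avg_bq, avg_min, avg_max = (sum(r[j] for r in runs) / len(runs) for j in range(3))
print(avg_bq, avg_min, avg_max)
```

A meaningful comparison of the tests would require choosing each threshold so that all procedures share the same ARL, which this sketch does not attempt.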
The voting Min-LD-CUSUM and Max-LD-CUSUM tests are neither asymptotically optimal nor very efficient. Both tests are inferior to the AO-LD-CUSUM and BQ-CUSUM tests. The Min-LD-CUSUM test is superior to the Max-LD-CUSUM test in the symmetric case, and it is expected to perform even better in asymmetric scenarios. An additional advantage of the BQ-CUSUM test over all the decentralized LD-CUSUM tests is that it does not require any processing power at the sensors.

While the considered Poisson model is motivated by network security applications such as rapid detection of computer intrusions, in reality it never holds, and therefore efficient nonparametric detection procedures are needed. Suitable procedures are briefly discussed in Section 3. Their comprehensive study (theory, MC simulations, and implementation for real data sets) for multisensor distributed systems is important. We leave this study for future work.

Acknowledgement

The work of Alexander Tartakovsky was supported in part by the U.S. Office of Naval Research grant N00014-06-1-0110 at the University of Southern California and by the U.S. Army SBIR contract W911QX-04-C-0001 at ADSANTEC. The research of Hongjoong Kim was supported by the MIC under the ITRC support program supervised by the IITA.