Performance of Certain Decentralized Distributed Change Detection Procedures


Alexander G. Tartakovsky
Center for Applied Mathematical Sciences and Department of Mathematics
University of Southern California
Los Angeles, CA, USA

Hongjoong Kim
Department of Mathematics
Korea University
Seoul, Korea

Abstract

We compare several decentralized changepoint detection procedures for multisensor distributed systems when the information available for decision-making is distributed across a set of sensors. Asymptotically optimal procedures for two scenarios are presented. In the first scenario, the sensors send quantized versions of their observations to a fusion center where change detection is performed based on all the sensor messages. If, in particular, the quantizers are binary, then the proposed binary CUSUM detection test is optimal in the class of tests with binary quantized data. In the second scenario, the sensors perform local change detection using CUSUM procedures and send their final decisions to the fusion center for combining. The decision in favor of the change is made whenever the CUSUM statistics at all sensors exceed their thresholds. The latter decentralized procedure has the same first-order asymptotic (as the false alarm rate becomes low) minimax operating characteristics as the globally optimal centralized detection procedure that has access to all the sensor observations. However, the presented Monte Carlo experiments for the Poisson example show that, although the procedure with local decisions is globally asymptotically optimal for a low false alarm rate, it performs worse than the procedure with binary quantization unless the false alarm rate is extremely low. In addition, two voting-type local decision-based detection procedures are proposed and evaluated. Applications to network security (rapid detection of computer intrusions) are discussed.
Keywords: Changepoint sequential detection, quickest detection, distributed multisensor decisions, optimal fusion, local decisions, CUSUM test, intrusion detection.

1 Introduction

We are interested in a changepoint detection problem in a multisensor setting where the information available for decision-making is distributed and decentralized. The observations are taken by a set of $N$ distributed sensors, as shown in Figure 1. No communication between sensors and no feedback from the fusion center to the sensors is allowed. The statistical properties of the sensors' observations change at the same unknown point in time. The goal is to detect this change as soon as possible, subject to false alarm constraints. The sensors may send either quantized versions of their observations or local decisions to a fusion center, where a final decision is made based on all the sensor messages.

Figure 1: Change detection with distributed sensors. (Sensors $S_1, \dots, S_N$ observe $\{X_1(n)\}, \dots, \{X_N(n)\}$ and send messages $U_1(n), \dots, U_N(n)$ to the fusion center, which makes the changepoint decision.)

This problem is of considerable practical importance. It has previously been considered in [10], [22]-[25]. Specifically, Tartakovsky and Veeravalli [23] proposed the Shiryaev-Roberts detection test based on binary quantized data (BQ-test) as well as a test based on local decisions made by Shiryaev-Roberts local detectors (LD-test). The results of that paper show that the corresponding LD-test is asymptotically globally optimal in the Bayesian setting. However, the convergence to the optimum is very slow, and as a result the BQ-test, which is not globally optimal, performs better unless the false alarm rate is extremely low. For the minimax problem formulation and CUSUM detection procedures, similar results have recently been reported by Mei [10]. In both of these papers the Gaussian model was used in the Monte Carlo (MC) simulations.
In the present paper, we consider the minimax problem and show that the same conclusion holds for the Poisson model, which is motivated by certain network security applications. Moreover, motivated by the latter applications, in addition to the likelihood ratio-based detection tests we introduce more robust nonparametric tests that do not require model knowledge. In a single-sensor scenario, similar detection tests have been considered by Tartakovsky et al. [19, 20, 21].

2 Decentralized Detection Tests and Their Asymptotic Operating Characteristics

2.1 The problem and potential performance

Suppose there is a distributed $N$-sensor system in which at time $n$ one observes an $N$-component vector stochastic process $(X_1(n), \dots, X_N(n))$. The $i$th component $X_i(n)$, $n = 1, 2, \dots$, corresponds to observations obtained from sensor $S_i$, as shown in Figure 1. We consider two approaches to the decentralized fusion problem. In the first, the sensors quantize their observations and send the quantized observations to the fusion center; in the second, they make local decisions that are sent to the fusion center.

At an unknown point in time $\lambda$ ($\lambda = 1, 2, \dots$) something happens and all of the components change their distribution. Conditioned on the change point, the observation sequences $\{X_1(n)\}, \{X_2(n)\}, \dots, \{X_N(n)\}$ are assumed to be mutually independent. Moreover, we assume that, in a particular sensor, the observations are independent and identically distributed (iid) before and after the change (with different distributions). If the change occurs at $\lambda = k$, then in sensor $S_i$ the data $X_i(1), \dots, X_i(k-1)$ follow the common distribution $F_0^{(i)}$ with a density $f_0^{(i)}(x)$, while the data $X_i(k), X_i(k+1), \dots$ have the common distribution $F_1^{(i)}$ with a density $f_1^{(i)}(x)$ (both with respect to a sigma-finite measure $\mu(x)$).

To be more specific, let $P_k$ (correspondingly $E_k$) be the probability measure (correspondingly expectation) when the change occurs at time $\lambda = k$. Then $P_\infty$ and $E_\infty$ stand for the probability measure and expectation when $\lambda = \infty$, i.e., the change does not occur. Write $X_i^n = (X_i(1), \dots, X_i(n))$ and $X^n = (X_1^n, \dots, X_N^n)$. Under $P_\infty$, the density of $X^n$ is

$$p_0(X^n) = \prod_{i=1}^{N} \prod_{j=1}^{n} f_0^{(i)}(X_i(j)) \quad \text{for all } n \ge 1,$$

and, under $P_k$, the density of $X^n$ is

$$p_k(X^n) = \prod_{i=1}^{N} \left[ \prod_{j=1}^{k-1} f_0^{(i)}(X_i(j)) \prod_{j=k}^{n} f_1^{(i)}(X_i(j)) \right] \quad \text{for } k \le n,$$

and $p_k(X^n) = p_0(X^n)$ for $k > n$.

The false alarm rate can be measured by the average run length (ARL) to false alarm, $\mathrm{ARL}(\tau) = E_\infty \tau$.
As a measure of the speed of detection (i.e., detection lag), we use the supremum average detection delay (SADD) proposed by Pollak in the mid 1970s,

$$\mathrm{SADD}(\tau) = \sup_{1 \le k < \infty} E_k(\tau - k \mid \tau \ge k).$$

An optimal minimax detection procedure is a procedure for which $\mathrm{SADD}(\tau)$ is minimized while $\mathrm{ARL}(\tau)$ is set at a given level $\gamma$, $\gamma > 0$. Specifically, define the class of changepoint detection procedures $\Delta(\gamma) = \{\tau : \mathrm{ARL}(\tau) \ge \gamma\}$ for which the ARL is at least the predefined positive number $\gamma$. The optimal changepoint detection procedure is described by the stopping time

$$\nu = \arg\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau).$$

Let

$$Z_i(n) = \log \frac{f_1^{(i)}(X_i(n))}{f_0^{(i)}(X_i(n))} \qquad (1)$$

be the log-likelihood ratio (LLR) between the change and no-change hypotheses for the $n$th observation from the $i$th sensor, and let

$$I_i = E_1 Z_i(1) = \int \log\left( \frac{f_1^{(i)}(x)}{f_0^{(i)}(x)} \right) f_1^{(i)}(x)\, \mu(dx)$$

be the Kullback-Leibler (KL) information number between the densities $f_1^{(i)}(x)$ and $f_0^{(i)}(x)$. The asymptotic performance of an optimal centralized detection procedure that has access to all data $X^n$ is given by

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) = \frac{\log \gamma}{I_{tot}} (1 + o(1)), \quad \gamma \to \infty, \qquad (2)$$

where $I_{tot} = \sum_{i=1}^{N} I_i$. See, e.g., [1, 9, 16, 22]. This performance is attained by the centralized CUSUM test that uses all available data.

2.2 Centralized CUSUM detection test

The centralized CUSUM test is defined as $\tau_c = \min\{n \ge 1 : W_c(n) \ge h\}$, where the (centralized) CUSUM statistic $W_c(n)$ is given by the recursion

$$W_c(n) = \max\left\{0,\; W_c(n-1) + \sum_{i=1}^{N} Z_i(n)\right\}, \quad W_c(0) = 0, \qquad (3)$$

and the threshold $h$ is chosen so that $\mathrm{ARL}(\tau_c(h)) = \gamma$. It is known [1, 9, 15, 16] that $\mathrm{ARL}(\tau_c(h)) \ge e^h$ and, hence, $h = \log \gamma$ guarantees $\mathrm{ARL}(\tau_c(h)) \ge \gamma$. The latter choice is usually conservative but useful for preliminary estimates and first-order asymptotic analysis. Substantial improvements can be obtained using corrected Brownian motion approximations [15] and the renewal argument [18].

In the following two subsections, we consider two types of decentralized detection procedures that use compressed data $U_1(n), \dots, U_N(n)$.
These compressed data are transmitted to the fusion center for making the final decision. The compression level for both types of procedures is maximal: the data $U_i(n)$ take only the values 0 or 1, i.e., they are binary. Thus, for both proposed decentralized detection procedures the required bandwidth for communication with the fusion center is minimal. The advantage of the first detection test, with binary quantized data, is that it does not require any processing power at the sensors. In Section 2.4.2, even simpler voting-type local decision-based detection tests are introduced.
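
As an illustration of recursion (3), the following is a minimal Python sketch of the centralized CUSUM stopping rule. The function name `centralized_cusum`, the input layout (one row of per-sensor LLR values per time step) and the toy numbers are assumptions made for this example only; they are not taken from the paper.

```python
from typing import Optional, Sequence


def centralized_cusum(llr_rows: Sequence[Sequence[float]], h: float) -> Optional[int]:
    """Return the first n >= 1 with W_c(n) >= h, or None if h is never reached.

    llr_rows[n - 1][i] is assumed to hold Z_i(n), the LLR of sensor i at time n.
    """
    w = 0.0  # W_c(0) = 0
    for n, row in enumerate(llr_rows, start=1):
        # Recursion (3): add the LLRs summed over sensors, then reflect at zero.
        w = max(0.0, w + sum(row))
        if w >= h:
            return n
    return None


# Hypothetical toy input: two sensors, negative drift followed by positive drift.
toy_llrs = [[-0.5, -0.2], [-0.1, -0.3], [0.6, 0.5], [0.7, 0.4], [0.5, 0.6]]
alarm_time = centralized_cusum(toy_llrs, h=2.0)
```

In practice the threshold would be tied to the false alarm constraint, for example through the conservative choice $h = \log \gamma$ discussed above.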

2.3 Decentralized CUSUM test with binary quantization at the sensors

Consider the scenario where, based on the observation $X_i(n)$ available at sensor $S_i$ at time $n$, a message $U_i(n)$ belonging to a finite alphabet (e.g., binary) is formed and sent to the fusion center (see Figure 1). Write $U_n = (U_1(n), \dots, U_N(n))$ for the vector of $N$ messages at time $n$. Based on the sequence of sensor messages, a decision about the change is made at the fusion center. The goal is to find a detection test at the fusion center that has certain optimality properties. This test is identified with a stopping time on $\{U_n\}_{n \ge 1}$ at which it is declared that a change has occurred. The corresponding problem has been considered in detail by Tartakovsky and Veeravalli [22].

In the following we consider the simplest case, where $U_i(n) = \psi_i(X_i(n))$ are the outputs of binary quantizers. It follows from [10, 22] that the asymptotically optimal policy for the decentralized change detection problem with binary quantization that minimizes $\mathrm{SADD}(\tau) = \sup_k E_k(\tau - k \mid \tau \ge k)$, while maintaining $\mathrm{ARL}(\tau)$ at a level of at least $\gamma$, consists of a set of stationary monotone likelihood ratio quantizers (MLRQ) at the sensors followed by the CUSUM procedure based on $\{U_n\}_{n \ge 1}$ at the fusion center. More specifically, the optimal binary quantizer is the MLRQ given by

$$U_i = \psi_i(X) = \begin{cases} 1 & \text{if } f_1^{(i)}(X)/f_0^{(i)}(X) \ge t_i, \\ 0 & \text{otherwise}, \end{cases}$$

where $t_i$ is a positive finite threshold that maximizes the KL information in the resulting Bernoulli sequence for the post-change and pre-change hypotheses.

To be precise, for $l = 0, 1$, let $g_l^{(i)}$ denote the probability induced on $U_i(n)$ when the observation $X_i(n)$ is distributed as $f_l^{(i)}$. Let $\beta_{0,i} = g_0^{(i)}(U_i(j) = 1)$ and $\beta_i = g_1^{(i)}(U_i(j) = 1)$ denote the corresponding probabilities under the normal and the anomalous conditions, respectively.
The resulting binary (Bernoulli) sequences $\{U_i(j), i = 1, \dots, N\}$, $j \ge 1$, are then used to form the binary CUSUM statistic, similar to (3), as

$$W_b(n) = \max\left\{0,\; W_b(n-1) + \sum_{i=1}^{N} Z_i^b(n)\right\}, \qquad (4)$$

where $W_b(0) = 0$ and

$$Z_i^b(n) = \log \frac{g_1^{(i)}(U_i(n))}{g_0^{(i)}(U_i(n))}$$

is the partial LLR between the change and no-change hypotheses for the binary sequence, which is given by

$$Z_i^b(n) = a_i U_i(n) + a_{0,i}.$$

Here

$$a_i = \log \frac{\beta_i (1 - \beta_{0,i})}{\beta_{0,i} (1 - \beta_i)}, \qquad a_{0,i} = \log \frac{1 - \beta_i}{1 - \beta_{0,i}}.$$

Then the CUSUM detection procedure at the fusion center is given by the stopping time

$$\tau_b(h) = \min\{n \ge 1 : W_b(n) \ge h\}, \qquad (5)$$

where $h$ is a positive threshold which is selected so that $\mathrm{ARL}(\tau_b(h)) \ge \gamma$. In what follows this detection procedure is referred to as the binary quantized CUSUM test, and the abbreviation BQ-CUSUM is used throughout the paper.

It follows from [22] that the BQ-CUSUM procedure with $h = \log \gamma$ is asymptotically optimal as $\gamma \to \infty$ in the class of tests with binary quantization, in the sense of minimizing the SADD in the class $\Delta(\gamma)$. More specifically, $\mathrm{SADD}(\tau_b) = E_1(\tau_b - 1)$, and the tradeoff curve that relates the SADD and the ARL for large ARL is

$$\mathrm{SADD}(\tau_b) \sim \frac{\log(\mathrm{ARL})}{\sum_{i=1}^{N} (\beta_i a_i + a_{0,i})}. \qquad (6)$$

Note that the probabilities $\beta_i = \beta_i(t_i)$ and $\beta_{0,i} = \beta_{0,i}(t_i)$ depend on the value of the threshold $t_i$. To optimize the performance, one should choose the thresholds $t_1, \dots, t_N$ so that the denominator in (6) is maximized, i.e.,

$$t_i^0 = \arg\max_{t_i > 0} I_i^b(t_i), \quad i = 1, \dots, N, \qquad (7)$$

where $I_i^b(t_i) = \beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)$ is the KL distance for the binary sequence in the $i$th sensor. It follows from (6) and (7) that the tradeoff curve for the optimal binary test is

$$\mathrm{SADD}(\tau_b) \sim \frac{\log \gamma}{I_{tot}^b}, \quad \gamma \to \infty, \qquad (8)$$

where $I_{tot}^b = \sum_{i=1}^{N} \max_{t_i} [\beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)]$.

The asymptotic relative efficiency (ARE) of a detection procedure $\tau_\gamma$ with respect to a detection procedure $\eta_\gamma$, both of which meet the same lower bound $\gamma$ for the ARL, is defined as

$$\mathrm{ARE}(\tau_\gamma; \eta_\gamma) = \lim_{\gamma \to \infty} \frac{\mathrm{SADD}(\tau_\gamma)}{\mathrm{SADD}(\eta_\gamma)}.$$
Using (2) and (8), we obtain that the ARE of the globally asymptotically optimal test $\nu$ with respect to the BQ-CUSUM test $\tau_b$ is

$$\mathrm{ARE}(\nu; \tau_b) = \lim_{\gamma \to \infty} \frac{\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau)}{\mathrm{SADD}(\tau_b(h_\gamma))} = \frac{I_{tot}^b}{I_{tot}}. \qquad (9)$$

Since $I_{tot}$ is always larger than $I_{tot}^b$, this ARE is less than 1. However, our study presented below shows that certain decentralized asymptotically globally optimal tests may perform worse in practically interesting prelimit situations, when the false alarm rate is moderately low but not very low.

2.4 Decentralized detection tests based on local decisions

We now consider three detection schemes that perform local detection in the sensors and then transmit these local binary decisions to the fusion center for optimal combining and final decision-making. The abbreviation LD-CUSUM is used for procedures that perform CUSUM tests in the sensors and use local decisions.
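
For comparison with the local-decision schemes, the BQ-CUSUM test of Section 2.3 (equations (4)-(5)) can be sketched in Python as follows. This is only a sketch under stated assumptions: the function names, the argument layout and the probabilities 0.3 and 0.5 are hypothetical, and the quantizer outputs $U_i(n)$ are taken as given rather than computed from raw observations.

```python
import math
from typing import List, Optional, Sequence, Tuple


def bq_coefficients(beta0: float, beta1: float) -> Tuple[float, float]:
    """Return (a_i, a_0i) for P(U_i = 1) equal to beta0 before and beta1 after the change."""
    a = math.log(beta1 * (1.0 - beta0) / (beta0 * (1.0 - beta1)))
    a0 = math.log((1.0 - beta1) / (1.0 - beta0))
    return a, a0


def bq_cusum(messages: Sequence[Sequence[int]],
             betas0: Sequence[float],
             betas1: Sequence[float],
             h: float) -> Optional[int]:
    """First n >= 1 with W_b(n) >= h, or None; messages[n - 1][i] is U_i(n) in {0, 1}."""
    coeffs: List[Tuple[float, float]] = [
        bq_coefficients(b0, b1) for b0, b1 in zip(betas0, betas1)
    ]
    w = 0.0  # W_b(0) = 0
    for n, u_row in enumerate(messages, start=1):
        # Partial LLR of each binary message: Z_i^b(n) = a_i * U_i(n) + a_0i.
        increment = sum(a * u + a0 for (a, a0), u in zip(coeffs, u_row))
        w = max(0.0, w + increment)  # recursion (4)
        if w >= h:
            return n  # stopping time (5)
    return None


# Hypothetical example: two sensors with beta_0 = 0.3 and beta = 0.5.
a, a0 = bq_coefficients(0.3, 0.5)
alarm_all_ones = bq_cusum([[1, 1]] * 5, [0.3, 0.3], [0.5, 0.5], h=3.0)
alarm_all_zeros = bq_cusum([[0, 0]] * 5, [0.3, 0.3], [0.5, 0.5], h=3.0)
```

Only the fusion center does any arithmetic here; each sensor contributes a single bit per time step, which is the point of the scheme.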

2.4.1 Asymptotically optimal decentralized LD-CUSUM test

Let

$$W_i(n) = \max\{0,\; W_i(n-1) + Z_i(n)\}, \quad W_i(0) = 0,$$

be the CUSUM statistic in the $i$th sensor, where, as before, $Z_i(n) = \log[f_1^{(i)}(X_i(n))/f_0^{(i)}(X_i(n))]$ is the LLR for the original sequence. Let

$$U_i(n) = \begin{cases} 1 & \text{if } W_i(n) \ge \pi_i h, \\ 0 & \text{otherwise}, \end{cases}$$

where $\pi_i = I_i / I_{tot}$ and $h$ is a positive threshold. The stopping time is defined as

$$T_{ld}(h) = \min\left\{n : \min_{1 \le i \le N} [W_i(n)/\pi_i] \ge h\right\}. \qquad (10)$$

In other words, binary local decisions (1 or 0) are transmitted to the fusion center, and the change is declared at the first time when $U_i(n) = 1$ for all sensors $i = 1, \dots, N$.

It follows from Mei [10] that if $E_1 |Z_i(1)|^3 < \infty$, then $E_\infty T_{ld}(h) \ge e^h$. Under an additional Cramér-type condition, it follows from Dragalin et al. [2] that

$$\mathrm{SADD}(T_{ld}(h)) = \frac{h}{I_{tot}} + C_N \sqrt{\frac{h}{I_{tot}}}\, (1 + o(1)), \qquad (11)$$

where

$$C_N = E\left\{ \max_{1 \le i \le N} \frac{\sigma_i}{I_i} Y_i \right\}, \qquad (12)$$

$Y_1, \dots, Y_N$ are independent standard Gaussian random variables, $\sigma_i^2 = \mathrm{Var}_i(Z_i(1))$, and $\mathrm{Var}_i$ is the variance operator under $f_1^{(i)}$. Therefore, if $h = \log \gamma$, then

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(T_{ld}(h)) \sim \frac{\log \gamma}{I_{tot}}, \quad \gamma \to \infty,$$

and the detection test $T_{ld}(h)$ is globally asymptotically optimal (AO), i.e., $\mathrm{ARE}(T_{ld}; \tau_c) = 1$. Correspondingly, we use the abbreviation AO-LD-CUSUM for this test in the rest of the paper.

However, since the second term in the asymptotic approximation (11) is of the order of the square root of the threshold, the convergence to the optimum is expected to be slow. Note that for the optimal centralized CUSUM test and for the decentralized CUSUM test with binary quantization the residual terms are constants. We therefore expect that for moderate false alarm rates, typical of practical applications, the procedure with quantization may perform better. This is confirmed by MC simulations for Gaussian models [10, 23].
In Section 4, this conjecture is verified for the Poisson model.

2.4.2 Decentralized minimal and maximal LD-CUSUM tests

Let $\tau_i(h) = \min\{n : W_i(n) \ge h\}$ denote the stopping time of the CUSUM test in the $i$th sensor. Introduce the stopping times

$$T_{min}(h) = \min(\tau_1, \dots, \tau_N), \qquad T_{max}(h) = \max(\tau_1, \dots, \tau_N),$$

which are referred to as the minimal LD-CUSUM (Min-LD-CUSUM) and maximal LD-CUSUM (Max-LD-CUSUM) tests, respectively.

Consider first the false alarm rate for these two detection tests. Clearly, $E_\infty T_{max} \ge E_\infty \tau_i$ for every $i = 1, \dots, N$. Since $E_\infty \tau_i \ge e^h$, it follows that, for every $h > 0$, $\mathrm{ARL}(T_{max}) \ge e^h$. It can also be shown that, for every $h > 0$, $\mathrm{ARL}(T_{min}) \ge N^{-1} e^h$ (cf. Tartakovsky [17]). These inequalities are usually very conservative. For large threshold values, asymptotically sharp approximations can be derived as follows. It follows from [18] that, as $h \to \infty$, under the no-change hypothesis the stopping times $\tau_i$, $i = 1, \dots, N$, are exponentially distributed with mean values $c_i e^h$, where $c_i \ge 1$ are constants that can be computed numerically for any particular model using the renewal argument. Therefore, for a large threshold, $T_{min}(h)$ is approximately exponentially distributed with mean

$$\mathrm{ARL}(T_{min}) \sim \alpha_N e^h, \quad \text{where } \alpha_N = \left[ \sum_{i=1}^{N} c_i^{-1} \right]^{-1},$$

while the mean of the stopping time $T_{max}$ is

$$\mathrm{ARL}(T_{max}) \sim \alpha_N^{*} e^h \quad \text{as } h \to \infty,$$

where $\alpha_N^{*} > \alpha_N$ can be easily computed for any $N$. In particular, for $N = 5$ and in the symmetric case $c_i = c$, $\alpha_5^{*} = 137c/60 \approx 2.28c$ (the mean of the maximum of five iid exponential variables) and $\alpha_5 = c/5$.

Also, it may be shown that, as $h \to \infty$,

$$\mathrm{SADD}(T_{min}) \sim \frac{h}{\max_i I_i}, \qquad \mathrm{SADD}(T_{max}) \sim \frac{h}{\min_i I_i}.$$

Therefore, taking the thresholds $h = \log(\gamma/\alpha_N)$ in the Min-LD-CUSUM and $h = \log(\gamma/\alpha_N^{*})$ in the Max-LD-CUSUM, we obtain the tradeoff curves that relate the SADD and the ARL, as $\gamma \to \infty$:

$$\mathrm{SADD}(T_{min}) \sim \frac{\log \gamma}{\max_i I_i}, \qquad \mathrm{SADD}(T_{max}) \sim \frac{\log \gamma}{\min_i I_i}.$$

It follows that in the symmetric case where $I_i = I$, the asymptotic relative efficiency of these detection tests compared to the optimal centralized test is $\mathrm{ARE}(T_{min}; \tau_c) = \mathrm{ARE}(T_{max}; \tau_c) = N$.
Note that, based on the first-order asymptotics, one might expect that in the symmetric case the Max-LD-CUSUM test performs better for moderate false alarm rates because $\alpha_N^{*} > \alpha_N$. In reality it is difficult to draw definite conclusions, since the second-order terms in the asymptotic expansions may reverse this ordering. Monte Carlo simulations in Section 4 show that the Min-LD-CUSUM test performs better even in the symmetric case.
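
The three local-decision stopping rules of Section 2.4 differ only in how the fusion center combines the local CUSUM statistics, which the following Python sketch tries to make explicit. The function name `ld_stopping_times`, the input layout and the toy LLR values are assumptions for illustration; the weights play the role of $\pi_i = I_i / I_{tot}$ in (10).

```python
from typing import List, Optional, Sequence, Tuple


def ld_stopping_times(
    llr_rows: Sequence[Sequence[float]],
    weights: Sequence[float],
    h: float,
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (T_ld, T_min, T_max); an entry is None if that alarm is never raised.

    llr_rows[n - 1][i] is assumed to hold Z_i(n); weights[i] is pi_i.
    """
    n_sensors = len(weights)
    w = [0.0] * n_sensors                            # local statistics W_i(0) = 0
    tau: List[Optional[int]] = [None] * n_sensors    # local alarm times tau_i(h)
    t_ld: Optional[int] = None
    for n, row in enumerate(llr_rows, start=1):
        for i, z in enumerate(row):
            w[i] = max(0.0, w[i] + z)                # local CUSUM recursion
            if tau[i] is None and w[i] >= h:
                tau[i] = n                           # first time sensor i crosses h
        # AO-LD-CUSUM (10): all sensors exceed their scaled thresholds at the same n.
        if t_ld is None and all(w[i] >= weights[i] * h for i in range(n_sensors)):
            t_ld = n
    crossed = [t for t in tau if t is not None]
    t_min = min(crossed) if crossed else None                  # first local alarm
    t_max = max(crossed) if len(crossed) == n_sensors else None  # last local alarm
    return t_ld, t_min, t_max


# Hypothetical toy input: two symmetric sensors (pi_i = 1/2), threshold h = 2.
toy_rows = [[1.0, 0.25], [1.0, 0.25], [0.5, 0.75], [-0.25, 1.0]]
times = ld_stopping_times(toy_rows, weights=[0.5, 0.5], h=2.0)
```

Note the design difference the sketch exposes: the AO-LD-CUSUM rule requires the scaled thresholds to be exceeded simultaneously, whereas the Max-LD-CUSUM rule only requires every sensor to have raised its own alarm at some earlier or current time.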

3 Applications to Intrusion Detection in Distributed Computer Networks

One of the important applications that stimulated the research in this paper is intrusion detection in distributed high-speed computer networks. A significant number of serious cyber-attacks on a variety of governmental agencies, universities, and corporations have recently been identified [3, 4, 5, 6, 12, 14]. These attacks, including a variety of buffer overflow, worm-based, denial-of-service (DoS), and man-in-the-middle (MiM) attacks, are designed to gain access to additional hosts, steal sensitive data, and disrupt network services. As a result, rapid detection of a wide spectrum of network intrusions and robust separation of legitimate and malicious traffic are vital for the continued normal operation of networks. See Kent [6] and Tartakovsky et al. [19]-[21] for a more detailed discussion.

Typically, network intrusions occur at unknown points in time and lead to changes in the statistical properties of certain observables. For example, distributed DoS (DDoS) attacks lead to changes in the mean value of the number of packets of a particular type (TCP, ICMP, or UDP) and size, while address resolution protocol (ARP) MiM attacks lead to changes in the average number of ARP requests [7], [19]-[21]. It is therefore intuitively appealing to formulate the problem of detecting attacks as a quickest changepoint detection problem: to detect changes in statistical models as rapidly as possible (i.e., with minimal average delays) while maintaining the false alarm rate at a given low level.

It follows from the results of the previous section that, in the case of complete information about the pre-change and post-change models, (asymptotically) optimal detection procedures in multisensor detection systems can be constructed based on the LLR-based CUSUM tests. However, in intrusion detection applications these models are unknown.
For this reason, in [7], [19]-[21], a nonparametric approach was proposed and thoroughly tested for a single-sensor scenario. This approach can be easily extended to the multisensor centralized and decentralized scenarios. More specifically, when the pre-change and post-change densities are unknown, the LLRs $Z_i(n)$ defined in (1) are also unknown and should be replaced by appropriate score functions $s_i(n)$ that have negative mean values $E_\infty s_i(n) < 0$ before the change occurs and positive mean values $E_k s_i(n) > 0$ after the change occurs.

While we do not specify any particular model in terms of probability distributions, some assumptions on the change should be made. Indeed, score functions can be chosen in many ways, and their selection depends crucially on the type of change that we intend to detect. For example, different score functions are used to detect changes in the mean and changes in the variance. In the applications of interest, the detection problem can usually be reduced to detecting changes in mean values. Let $\mu_i = E_\infty X_i(j)$ and $\theta_i = E_1 X_i(j)$ denote the pre-change and post-change mean values in the $i$th sensor. Typically, the baseline mean values $\mu_i$ can be estimated quite accurately in advance, while the values of $\theta_i$ are usually unknown and should either be estimated online or replaced by reasonable numbers, e.g., by the expected minimal values. In the rest of this section we suppose for concreteness that $\theta_i > \mu_i$.

For $i = 1, \dots, N$, introduce the score functions

$$s_i(n) = X_i(n) - \mu_i - c_i,$$

where in the general case $c_i = c_i(n)$ may depend on past observations, which is desirable to guarantee an adaptive structure of the detection procedure. For example, one may take $c_i(n) = \varepsilon \hat{\theta}_{i,n}$, where $\varepsilon$ is a tuning parameter belonging to the interval $(0, 1)$ and $\hat{\theta}_{i,n} = \hat{\theta}_{i,n}(X_i^n)$ is an estimate of the unknown mean $\theta_i$.
Choosing these estimators, as well as optimizing the parameter $\varepsilon$ based on training data, is not a straightforward task, as discussed in detail in Tartakovsky et al. [19]. For this reason, it is convenient to set $c_i(n) = c_i$, where $c_i$ are positive constants that do not depend on $n$. Positivity of $c_i$ is essential to guarantee the negative value of $E_\infty s_i(n) = -c_i$ under the no-change hypothesis. On the other hand, $c_i$ must not be too large, so that $E_1 s_i(n) = \theta_i - \mu_i - c_i$ remains positive under the alternative hypothesis. A particular choice of $c_i$ is discussed in [19].

If the above conditions hold, the score-based CUSUM statistic in the $i$th sensor,

$$W_i^s(n) = \max\{0,\; W_i^s(n-1) + s_i(n)\},$$

remains close to zero in normal conditions, while after the change occurs it starts rapidly drifting upward (see Figure 2). The centralized CUSUM statistic that combines the scores from all the sensors,

$$W^s(n) = \max\left\{0,\; W^s(n-1) + \sum_{i=1}^{N} s_i(n)\right\},$$

has a similar behavior. The time of alarm in the centralized detection scheme is defined as the first time $n$ when the statistic $W^s(n)$ crosses a positive threshold. A binary quantized version of the CUSUM test can be designed analogously to Section 2.3. See [19] for further details.

Finally, a nonparametric LD-CUSUM test has the form (10), where the LLR-based CUSUM statistic $W_i(n)$ is replaced with the score-based CUSUM statistic $W_i^s(n)$ and where

$$\pi_i = \frac{\theta_i - \mu_i - c_i}{\sum_{j=1}^{N} (\theta_j - \mu_j - c_j)}.$$

For the sake of simplicity, we assume here that the post-change mean values $\theta_i$ are known.

Note that the above nonparametric detection algorithms are no longer guaranteed to be optimal. Certain optimization is possible based on training data [19]. The behavior of the nonparametric local CUSUM statistics $W_i^s(n)$ and the corresponding binary counterparts is shown in Figure 2 for the ARP MiM attack.
These plots have been obtained by simulating ARP MiM attacks and legitimate traffic in a network testbed based on the University of Utah NetBed/Emulab. The network topology consisted of two subnets, each containing a local detector whose output is utilized by the fusion center at the top level. During this attack, the attacker sends unrequested forged ARP replies to victim hosts, informing them falsely that the attacker is the destination of their connection with the other host. The attacker can then filter, record, or arbitrarily modify the data before sending it to the true destination. The ARP MiM attack can be used for password and user name capture as well as for connection hijacking and real-time decryption if authentication certificates are not used for secure communications.

Figure 2: Nonparametric local and binary CUSUM detection statistics for a simulated ARP MiM attack.

Due to space limitations, a detailed study of the above centralized and decentralized nonparametric algorithms will be performed elsewhere. In the next section, we present the results of MC simulations of the LLR-based detection tests for a Poisson example.

4 Monte Carlo Experiments

In this section, we present the results of MC experiments for the Poisson example where the observations in the $i$th sensor, $X_i(n)$, $n \ge 1$, follow the common Poisson distribution $\mathcal{P}(\mu_i)$ in the pre-change mode and the common Poisson distribution $\mathcal{P}(\theta_i)$ after the change occurs, i.e., for $m = 0, 1, 2, \dots$ and $\lambda = k$,

$$P_k(X_i(n) = m) = \begin{cases} \dfrac{(\mu_i)^m}{m!} e^{-\mu_i} & \text{for } k > n, \\[2mm] \dfrac{(\theta_i)^m}{m!} e^{-\theta_i} & \text{for } k \le n, \end{cases}$$

where without loss of generality we assume that $\theta_i > \mu_i$. Write $Q_i = \theta_i/\mu_i$. It is easily seen that the LLR statistic in the $i$th sensor has the form

$$Z_i(n) = X_i(n) \log(Q_i) - \mu_i (Q_i - 1), \qquad (13)$$

and the KL information numbers are

$$I_i = \theta_i \log Q_i - \mu_i (Q_i - 1), \quad i = 1, \dots, N. \qquad (14)$$

It follows from (2), (14) and the above discussion that the centralized CUSUM and AO-LD-CUSUM tests with the thresholds $h = \log \gamma$ are first-order globally asymptotically optimal and

$$\inf_{\tau \in \Delta(\gamma)} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(\tau_c) \sim \mathrm{SADD}(T_{ld}) \sim \frac{\log \gamma}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}. \qquad (15)$$

This means that the ARE of these detection tests with respect to the globally optimal test is equal to 1.
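
Equations (13) and (14) can be checked numerically with a short Python sketch: averaging the LLR (13) over the post-change Poisson distribution should reproduce the KL number (14), and averaging it over the pre-change distribution should give a negative drift. The function names, the truncation of the Poisson sum at 200 terms, and the use of $\mu = 10$, $\theta = 12$ (the symmetric case considered later in this section) are choices made for this illustration.

```python
import math


def poisson_llr(x: int, mu: float, theta: float) -> float:
    """LLR (13) of a single Poisson observation x."""
    q = theta / mu
    return x * math.log(q) - mu * (q - 1.0)


def kl_number(mu: float, theta: float) -> float:
    """KL information number (14)."""
    q = theta / mu
    return theta * math.log(q) - mu * (q - 1.0)


def mean_llr(mean: float, mu: float, theta: float, terms: int = 200) -> float:
    """E[Z] when X ~ Poisson(mean), by direct (truncated) summation over the pmf."""
    pmf = math.exp(-mean)  # P(X = 0)
    total = 0.0
    for m in range(terms):
        total += pmf * poisson_llr(m, mu, theta)
        pmf *= mean / (m + 1)  # P(X = m + 1) from P(X = m)
    return total


mu, theta = 10.0, 12.0
post_change_drift = mean_llr(theta, mu, theta)  # should match kl_number(mu, theta)
pre_change_drift = mean_llr(mu, mu, theta)      # should be negative
```

The negative pre-change drift and positive post-change drift are what keep the CUSUM statistic near zero before the change and push it upward afterwards.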
In order to evaluate the ARE of an optimal test $\nu$ (e.g., the centralized CUSUM test $\tau_c$) with respect to the BQ-CUSUM test (5) we use (9), which yields

$$\mathrm{ARE}(\nu; \tau_b) = \frac{\sum_{i=1}^{N} \max_{t_i} [\beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)]}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}, \qquad (16)$$

where the probabilities $\beta_{0,i}(t_i)$ and $\beta_i(t_i)$ are given by

$$\beta_{0,i}(t_i) = \sum_{k = \lceil t_i \rceil}^{\infty} \frac{\mu_i^k e^{-\mu_i}}{k!}, \qquad \beta_i(t_i) = \sum_{k = \lceil t_i \rceil}^{\infty} \frac{\theta_i^k e^{-\theta_i}}{k!}.$$

Note that since the likelihood ratios are monotone functions of $X_i(n)$, it is equivalent to quantize the observations directly. Here and in the following, the thresholds $t_i$ are set in the space of observations rather than in the likelihood ratio space. The optimal values $t_i^0$ that maximize the KL numbers (7) are easily found based on these formulas.

Consider a symmetric case where $\mu_i = 10$ and $\theta_i = 12$ for all $i = 1, \dots, N$. Then, by (14), $I_i = I \approx 0.188$, the optimum threshold is $t_i^0 = 12$, and the corresponding maximum KL distance for the binary sequence is $I_i^b(t_i^0) = I^b = 0.119$. Therefore, the loss in efficiency of the BQ-test compared to the globally asymptotically optimal detection procedure is $\mathrm{ARE}(\nu; \tau_b) = 0.119/0.188 = 0.63$, i.e., for large ARL we expect about a 37% increase in the average detection delay compared to the centralized CUSUM (C-CUSUM). The following MC simulations show that for the practically interesting values of the ARL (up to 13,360) the gain of the optimal C-CUSUM test is even smaller, while the AO-LD-CUSUM test performs worse than the BQ-CUSUM test for the reasons discussed in Section 2.4.

MC simulations have been performed for the above symmetric situation (i.e., $\mu_i = \mu = 10$ and $\theta_i = \theta = 12$) with $N = 5$ sensors. We used $10^5$ MC replications in the experiment. The operating characteristics of the five detection tests (SADD vs. log(ARL)) are shown in Figure 3 and Table 1. It is seen that the BQ-CUSUM test substantially outperforms the AO-LD-CUSUM test over the entire range of false alarm rates used in the simulations. This result confirms our conjecture.
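
The optimal quantization threshold and the per-sensor ratio in (16) for the symmetric case $\mu = 10$, $\theta = 12$ can be reproduced with the following Python sketch. It assumes integer thresholds in observation space and searches the grid 1 to 30, which is an arbitrary range chosen for this illustration; the function names are likewise hypothetical.

```python
import math


def poisson_tail(mean: float, t: int) -> float:
    """P(X >= t) for X ~ Poisson(mean) and integer t >= 0."""
    pmf = math.exp(-mean)  # P(X = 0)
    below_t = 0.0
    for k in range(t):
        below_t += pmf
        pmf *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - below_t


def kl_poisson(mu: float, theta: float) -> float:
    """KL number (14) for the unquantized Poisson observations."""
    q = theta / mu
    return theta * math.log(q) - mu * (q - 1.0)


def kl_binary(mu: float, theta: float, t: int) -> float:
    """KL distance beta * a + a_0 of the binary sequence for threshold t, as in (7)."""
    beta0 = poisson_tail(mu, t)     # P(U = 1) before the change
    beta1 = poisson_tail(theta, t)  # P(U = 1) after the change
    a = math.log(beta1 * (1.0 - beta0) / (beta0 * (1.0 - beta1)))
    a0 = math.log((1.0 - beta1) / (1.0 - beta0))
    return beta1 * a + a0


mu, theta = 10.0, 12.0
i_full = kl_poisson(mu, theta)
# Assumed search grid for the integer threshold in observation space.
t_opt = max(range(1, 31), key=lambda t: kl_binary(mu, theta, t))
i_bin = kl_binary(mu, theta, t_opt)
are = i_bin / i_full  # per-sensor ratio; equals (16) in the symmetric case
```

Because all sensors are identical in the symmetric case, the sums over $i$ in (16) cancel and the ratio for a single sensor already gives the ARE.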
It is also seen that both the Min-LD-CUSUM and Max-LD-CUSUM tests perform worse than both the BQ-CUSUM and AO-LD-CUSUM tests. Table 2 shows the relative efficiency of the BQ-CUSUM procedure with respect to the four other detection procedures, which is defined as the ratio of average detection delays for the same ARL: $\mathrm{SADD}(\tau_b)/\mathrm{SADD}(\nu)$, where $\nu$ is the corresponding detection test, i.e., $\nu = \tau_c$, $T_{ld}$, etc. It follows from the table that for the BQ-CUSUM test the increase in the SADD compared to the globally optimal centralized CUSUM is 34% for a high false alarm rate, 35% for moderate and low false alarm rates, and 37% for a very low false alarm rate. Note that the last column presents the ARE. On the other hand, the BQ-CUSUM test outperforms the AO-LD-CUSUM test over the entire range of tested ARL values, from 33 to 13,360. The gain is 30% for a high false alarm rate and slowly reduces to 18% for a low false alarm rate.

Table 1: Operating Characteristics of Detection Procedures (log(ARL), ARL, and SADD for the C-CUSUM, AO-LD-CUSUM, Min-LD-CUSUM, Max-LD-CUSUM, and BQ-CUSUM tests).

Table 2: Relative Efficiency of the Decentralized BQ-CUSUM Test (log(ARL), ARL, and the relative efficiency with respect to the C-CUSUM, AO-LD-CUSUM, Min-LD-CUSUM, and Max-LD-CUSUM tests).

Figure 3: Operating characteristics of detection procedures.

5 Discussion and Conclusions

The presented results allow us to compare the performance of four proposed decentralized change detection procedures, as well as to determine the loss in efficiency compared to the globally optimal centralized scheme. The first detection test, called the BQ-CUSUM test, uses binary quantizers at the sensors followed by the CUSUM detection procedure at the fusion center. The second detection test, called the AO-LD-CUSUM test, performs local detection at the sensors using CUSUM tests, and at each sampling point transmits these local decisions to the fusion center for combining and making the final decision. Both decentralized detection procedures transmit only binary sequences of 1's and 0's to the fusion center. Therefore, both detection tests use the maximal possible level of data compression and require minimum bandwidth for communication.
The third and fourth decentralized detection procedures, called the minimal and maximal LDCUSUM tests respectively, are based on independent voting of sensors. In the former one the decision is made at the first time when the first CUSUM test detects the change; while in the latter one when all the sensors detect the change (but independently, not like in the AOLD CUSUM). Due to losses of information, the BQCUSUM test is inferior to the globally optimal centralized CUSUM test. On the other hand, the AOLDCUSUM test is firstorder asymptotically globally optimal for low false alarm rate. However, convergence to the optimum is expected to be slow, since the second term in the decomposition for the average detection delay goes to infinity as the square root of the threshold. We therefore conjectured that despite the fact that the AOLDCUSUM test is firstorder asymptotically optimal it may perform worse than the nonoptimal BQCUSUM test in a realistic environment. The results of MC simulations for the Poisson model confirm this latter hypothesis. For the model considered the BQCUSUM outperforms the LDCUSUM for all range of tested ARLs, from 33 to 13,360. The increase in the SADD is 30% for high false alarm rate and it slowly reduces to 18% for low false alarm rate. While potentially the ARE of the AO LDCUSUM test compared to the BQCUSUM test is 37%, this performance never kicks in for realistic moderately low false alarm rate. The voting MinLDCUSUM and MaxLDCUSUM tests are neither asymptotically optimal nor very efficient. Both tests are inferior to AOLDCUSUM and BQ
8 CUSUM tests. The MinLDCUSUM test is superior to the MaxLDCUSUM test in the symmetric case, and it is expected to perform even better in asymmetric scenarios. The additional advantage of the BQCUSUM test compared to all other decentralized LDCUSUM tests is that it does not require any processing power at the sensors. While the considered Poisson model is motivated by network security applications such as rapid detection of computer intrusions, in reality it never holds and therefore efficient nonparametric detection procedures are needed. Suitable procedures are briefly discussed in Section 3. Their comprehensive study (theoretical, MC simulations, and implementation for real data sets) for multisensor distributed systems is important. We left this study for future work. Acknowledgement The work of Alexander Tartakovsky was supported in part by the U.S. Office of Naval Research grant N at the University of Southern California and by the U.S. ARMY SBIR contract W911QX04C0001 at AD SANTEC. The research of Hongjoong Kim was supported by the MIC under the ITRC support program supervised by the IITA. References [1] M. Basseville and I.V. Nikiforov, Detection of Abrupt Changes: Theory and Applications. Prentice Hall, Englewood Cliffs, [2] V. Dragalin, A. Tartakovsky, and V. Veeravalli, Multihypothesis sequential probability ratio tests, part 2: accurate asymptotic expansions for the expected sample size, IEEE Trans. Inform. Theory Vol. 46, No. 4, pp , [3] L. Garber, Denialofservice attacks rip the Internet, Computer, April [4] S. Gibson, Distributed reflection denial of service: description and analysis of a potent, increasingly prevalent, and worrisome Internet attack, Gibson Research Corporation, [5] A. Hussain, J. Heidemann, and C. Papadopoulos, A framework for classifying denial of service attacks, Proc. Sigcomm 2003, Karlsruhe, Germany, [6] S. Kent, On the trial of intrusions into information systems, IEEE Spectrum, Vol. 37, Issue 12, pp , [7] H. Kim, B. 
Rozovskii, and A. Tartakovsky, A nonparametric multichart CUSUM test for rapid detection of DOS attacks in computer networks, Internat. J. Computing and Information Sciences, Vol. 2, No. 3, pp ,
[8] T.L. Lai, Sequential changepoint detection in quality control and dynamical systems, J. R. Statist. Soc. B, Vol. 57, No. 4, pp ,
[9] G. Lorden, Procedures for reacting to a change in distribution, Ann. Math. Statist., Vol. 42, pp ,
[10] Y. Mei, Information bounds and quickest change detection in decentralized decision systems, IEEE Trans. Inform. Theory, Vol. 51, pp ,
[11] G.V. Moustakides, Optimal stopping times for detecting changes in distributions, Ann. Statist., Vol. 14, pp ,
[12] V. Paxson, Bro: A system for detecting network intruders in real-time, Computer Networks, Vol. 31(23-24), pp ,
[13] M. Pollak, Optimal detection of a change in distribution, Ann. Statist., Vol. 13, pp ,
[14] M. Roesch, Snort: Lightweight intrusion detection for networks, Proc. 13th Systems Administration Conference (LISA), pp ,
[15] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer-Verlag, New York,
[16] A.G. Tartakovsky, Sequential Methods in the Theory of Information Systems. Radio i Svyaz, Moscow, 1991 (in Russian).
[17] A.G. Tartakovsky, Asymptotically minimax multialternative sequential rule for disorder detection, in: Statistics and Control of Random Processes: Proc. Steklov Institute of Mathematics, Vol. 202, Issue 4, pp , AMS, Providence, RI.
[18] A.G. Tartakovsky, Asymptotic performance of a multichart CUSUM test under false alarm probability constraint, Proc. 44th IEEE Conf. on Decision and Control and the European Control Conf. (CDC-ECC 05), December 12-15, 2005, pp , Seville, Spain, Omnipress CD-ROM, ISBN
[19] A.G. Tartakovsky, B.L. Rozovskii, R. Blažek, and H. Kim, Detection of intrusions in information systems by sequential changepoint methods, Statistical Methodology, 2006 (to appear).
[20] A. Tartakovsky, B. Rozovskii, R. Blažek, and H.
Kim, A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential changepoint detection methods, IEEE Trans. Signal Processing, 2006 (to appear).
[21] A.G. Tartakovsky, K. Shah, and B.L. Rozovskii, A nonparametric multichart CUSUM test for rapid intrusion detection, Proc. JSM, Minneapolis, MN, 7-11 August 2005 (CD-ROM).
[22] A.G. Tartakovsky and V.V. Veeravalli, An efficient sequential procedure for detecting changes in multichannel and distributed systems, Proc. 5th Intern. Conf. on Information Fusion, Annapolis, MD, 8-11 July 2002, Vol. 1, pp
[23] A.G. Tartakovsky and V.V. Veeravalli, Quickest change detection in distributed sensor systems, Proc. 6th Intern. Conf. on Information Fusion, Cairns, Australia, 8-10 July 2003, pp
[24] A.G. Tartakovsky and V. Veeravalli, Changepoint detection in multichannel and distributed systems with applications, in: N. Mukhopadhyay, S. Datta and S. Chattopadhyay, eds., Applications of Sequential Methodologies, Marcel Dekker, Inc., NY, 2004, pp
[25] A.G. Tartakovsky and V. Veeravalli, General asymptotic Bayesian theory of quickest change detection, Theory Prob. Appl., Vol. 49, No. 3, pp , 2004.