SCALABLE ROBUST MONITORING OF LARGE-SCALE DATA STREAMS

By Ruizhi Zhang and Yajun Mei, Georgia Institute of Technology


Submitted to the Annals of Statistics

Online monitoring of large-scale data streams has many important applications, such as industrial quality control, signal detection, and biosurveillance, but unfortunately it is highly non-trivial to develop scalable schemes that are able to tackle two robustness concerns: (1) the unknown sparse number or subset of affected data streams, and (2) the uncertainty of model specification for high-dimensional data. In this article, we develop a family of scalable robust schemes in the scenario when the local data streams are from Tukey-Huber's gross error models with outliers. We first define a new local detection statistic, called the Lα-CUSUM statistic, that can reduce the effect of outliers by using the Box-Cox transformation of the likelihood function. Then we propose to raise a global alarm based upon the sum of the soft-thresholding transformation of these local Lα-CUSUM statistics, so as to filter out unaffected data streams. In addition, we propose a new concept of false alarm breakdown point to measure the robustness of schemes with respect to outliers, and characterize the breakdown point of our proposed schemes. Asymptotic analysis and extensive numerical simulations are conducted to illustrate the robustness and usefulness of our proposed schemes.

1. Introduction. Robust statistics has been extensively studied in the offline context, when the full data set, possibly contaminated with outliers, is available for analysis: e.g., robust estimation (Huber, 1964; Basu et al., 1998), robust hypothesis testing (Huber, 1965; Heritier and Ronchetti, 1994), and robust regression (Yohai, 1987; Cantoni and Ronchetti, 2001). See also the classical books Huber and Ronchetti (2009) and Hampel et al. (2011) for a literature review. Here we propose scalable robust methods in the online context of monitoring large-scale data streams.
Our research is motivated by real-world applications in industrial quality control, biosurveillance, and key infrastructure or internet traffic monitoring, in which sensors are deployed to constantly monitor the changing environment; see Shmueli and Burkom (2010); Tartakovsky, Polunchenko and Sokolov (2013); Yan, Paynabar and Shi (2015). One would like to detect an undesirable event as quickly as possible by monitoring the large-scale data streams generated from these sensors, but there are two robustness concerns here. The first one is that only a sparse number of data streams might be affected by the event, but we do not know which subset of data streams is affected or the exact number of affected data streams.

Keywords and phrases: Change-point, CUSUM, robustness, quickest detection, scalable, sparsity.

Fig 1. A local data stream with outliers when a change in distribution occurs at time ν = 50.

Hence we want to effectively detect the changing event regardless of the combination of affected data streams. Xie and Siegmund (2013) were the first to tackle this sparsity/robustness issue, via a semi-Bayesian approach, and later Wang and Mei (2015) developed shrinkage-estimation-based schemes. Chan (2017) developed asymptotic optimality theory for large-scale independent Gaussian data streams. Unfortunately, all this research is based on specific parametric models (e.g., Gaussian) for the observations, which may easily be violated in practice. Moreover, these existing methods are computationally expensive and not scalable for monitoring large-scale data streams. The second, and possibly more serious, concern is that local data streams might involve random outliers under the normal state that do not indicate the changing event. One specific example is vehicle rush-hour traffic monitoring, where one would like to decide whether or not to add new lanes or build new roads due to a larger population or the construction of a new stadium or shopping mall. However, the observed traffic data might involve outliers caused by other events, such as major car accidents or severe weather, which are related but probably should not be the decisive factors in the decision making. Another example is the detection of distributed denial-of-service (DDoS) attacks in cyber-security by monitoring the numbers of attempted connections. When the observed number of connections suddenly becomes huge, it might be due to normal fluctuations and does not necessarily imply a DDoS attack, especially if it goes back to the normal state shortly afterwards; see Tartakovsky, Polunchenko and Sokolov (2013).
For better illustration, Figure 1 plots a sequence of simulated one-dimensional observations whose distribution changes from N(0, 1) to N(1, 1) at time 50, with some contaminated outliers. These outliers often cause standard statistical methods

to raise local false alarms, and thus the system-wide false alarm rate can be huge given the large number of sensors or data streams. Indeed, too frequent system-wide false alarms are the main reason why the usefulness of bio- and syndromic surveillance via a huge sensor grid (e.g., hospitals, state/county surveillance systems) throughout the U.S. is in debate; see Stoto, Schonlau and Mariano (2004). In this paper, we develop scalable robust methods to tackle the above-mentioned two robustness issues when online monitoring large-scale data streams. Here we adopt the classical offline robust statistics approach: we have a parametric, idealized model that might be a good approximation to the true model, but we cannot and do not assume that the assumed model is exactly correct; see Huber and Ronchetti (2009). It is desirable to develop a statistical method that has reasonably good efficiency at the assumed model, and that still performs well if there are small deviations from the assumed model. In particular, under our online monitoring context, the assumed model specifies the number of affected local data streams as well as the local distributions of the data streams. Our proposed methods are robust in the sense that small deviations from the assumed number of affected local data streams or from the assumed local distributions should not impair the performance too much. From the modeling viewpoint, we assume that the true model for each local data stream is Tukey-Huber's gross error model (Tukey, 1962; Huber, 1964), which is a two-component mixture model with one component being the assumed, idealized model for signals and the other component accounting for outliers. The occurring event changes the signal component of Tukey-Huber's gross error model for only a few affected local data streams.
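A data stream of the kind shown in Figure 1 can be generated directly from the gross error model: each observation is drawn from the signal component (N(0, 1) before ν = 50, N(1, 1) afterwards) with probability 1 − ɛ, and from a contamination density with probability ɛ. A minimal Python sketch, where the contamination density g = N(0, 5²) and the ratio ɛ = 0.05 are our own illustrative choices, not values from the paper:

```python
import random

def simulate_stream(nu=50, horizon=100, eps=0.05, seed=7):
    """One local data stream from the gross error model: the signal
    component changes from N(0,1) to N(1,1) at the change-point nu,
    while a fraction eps of observations are heavy-tailed outliers."""
    rng = random.Random(seed)
    xs = []
    for t in range(1, horizon + 1):
        if rng.random() < eps:
            xs.append(rng.gauss(0.0, 5.0))  # contamination component g
        else:
            mean = 0.0 if t < nu else 1.0   # signal: f0 before nu, f1 after
            xs.append(rng.gauss(mean, 1.0))
    return xs

stream = simulate_stream()
```

Plotting such a stream reproduces the qualitative picture of Figure 1: a mean shift at ν that is partially masked by occasional large outliers on both sides of the change-point.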
From the methodology viewpoint, our proposed schemes tackle these two robustness issues by combining the newly developed online robust idea of sum-shrinkage of local detection statistics in Liu, Zhang and Mei (2017) with the contemporary offline robust concept of Lq-likelihood in Ferrari and Yang (2010); Qin and Priebe (2017). We should acknowledge that online monitoring of one-dimensional or low-dimensional multivariate data streams has been well studied in the sequential change-point detection literature. Many classical procedures have been developed under parametric models, such as Page's CUSUM procedure (Page, 1954), the Shiryaev-Roberts procedure (Shiryaev, 1963; Roberts, 1966), window-limited procedures (Lai, 1995) and scan statistics (Glaz et al., 2001), and classical, fundamental results were established in Shiryaev (1963), Lorden (1971), Pollak (1985, 1987), Moustakides (1986), Ritov (1990), Lai (1995), etc. For a review, see the books Basseville and Nikiforov (1993), Poor and Hadjiliadis (2009), and Tartakovsky, Nikiforov and Basseville (2014). There is limited research on robust monitoring of one-dimensional data, and most of it involves nonparametric

methods, such as the rank-based method in Gordon and Pollak (1994, 1995), or the kernel-based method in Desobry, Davy and Doncarli (2005). However, these nonparametric methodologies generally lose efficiency under specific parametric or semi-parametric models. An exception is Unnikrishnan, Veeravalli and Meyn (2011), which reduces the problem of robustly monitoring one-dimensional data to the problem of detecting a change in the least favorable distribution, which heavily depends on the proportion of outliers. Unfortunately, it is unclear how to extend the concept of least favorable distribution to the context of large-scale data streams when there is uncertainty about the subset of affected local data streams. In addition, we should also mention that, in a completely different context, Chen and Zhang (2015) propose a nonparametric, graph-based method for monitoring large-scale data streams. Their research was motivated by the change of friendship network structure over time, and focused on detecting a global change in the correlation structure between data streams. Our motivating examples, in contrast, are biosurveillance, traffic and security monitoring, where we are interested in detecting local changes in a sparse subset of affected data streams among a large number of data streams. Intuitively, it is more challenging to detect sparse local changes than global changes across all data streams. Our research makes several contributions to the statistics field by combining robust statistics with sequential change-point detection for online monitoring of large-scale data streams. First, our proposed method is robust with respect to the uncertainty in the number or subset of affected data streams as well as the uncertainty in the model assumptions for the local data.
Second, our proposed method is scalable and computationally simple, as its recursive form allows one to easily implement it over a long time period for large-scale data streams, via parallel computing at each local data stream with fixed memory requirements. Third, inspired by the concept of breakdown point (Hampel, 1968) in offline robust statistics and by excessive false alarms in practice, we propose a novel concept of false alarm breakdown point to measure the robustness of any scheme, and show that our proposed scheme indeed has a much larger false alarm breakdown point than the classical CUSUM-based schemes. Finally, from the mathematical viewpoint, we use Chebyshev's inequality, not the standard renewal theory, to derive non-asymptotic lower bounds on the average run length to false alarm for our proposed method. The non-asymptotic results hold regardless of the number of data streams, and allow us to provide deep insight into monitoring large-scale data streams in the modern asymptotic regime when the number of data streams goes to ∞. The remainder of this article is organized as follows. In Section 2, we present preliminaries and

background information on quickest change detection, or sequential change-point detection. Then we develop our proposed schemes for online robust monitoring of large-scale data streams in Section 3. The theoretical properties of our proposed schemes are provided in Section 4. In Section 5, we introduce the concept of false alarm breakdown point and characterize the breakdown point of our proposed schemes. The simulation results are presented in Section 6, and the proofs of our main theorems are postponed to Section 7.

2. Preliminaries and background. Suppose we are monitoring K local data streams over time, and denote by X_{k,n} the local observation at the k-th local data stream at time n:

(1) Data Stream 1: X_{1,1}, X_{1,2}, ...
    Data Stream 2: X_{2,1}, X_{2,2}, ...
    ...
    Data Stream K: X_{K,1}, X_{K,2}, ...

Initially, the system is in control, but at some unknown time ν, an undesired event may occur and affect some unknown data streams, in the sense of changing the local distributions of some, but not all, of the local data streams X_{k,n}. Equivalently, we are monitoring K-dimensional random vectors, X_n = (X_{1,n}, ..., X_{K,n}), over time n, and the occurring event affects some, but not necessarily all, components of the X_n's. We would like to utilize the observed data X_{k,n} to raise an alarm as quickly as possible once the true change occurs, subject to a false alarm constraint. To highlight our main ideas, we make the simplifying assumption that the X_{k,n}'s are independent and identically distributed (i.i.d.) over time and across different data streams. Here the X_{k,n}'s might be one-dimensional or low-dimensional raw data, or derived features or residuals from some spatial-temporal models. Note that this independence assumption is not as restrictive as one might think in many practical applications.
For instance, one can first build some baseline spatio-temporal models, and then monitor the independent residuals instead of the dependent raw data; see Xie, Huang and Willett (2013) and Liu, Mei and Shi (2015) for two real-world applications in solar flare detection and the hot-forming process. Another possibility is to monitor independent features; see Paynabar, Zou and Qiu (2016) and Wang, Paynabar and Mei (2017), which use principal component analysis (PCA) to extract independent coefficient features of multi-channel profiles and then monitor the independent PCA coefficients instead of the raw profile data. For the purpose of a more rigorous presentation, let us begin with the local models, or the local

pre- and post-change distributions, for the local data streams. At a high level, a good approximation of the local models is the classical change-point model of detecting a change in the local distribution from one known density function f_0(·) to another known density function f_1(·) at some unknown time ν; see Lorden (1971). Below we will refer to this as the idealized model. As mentioned in the Introduction, due to the outliers, it often makes more sense to assume that these two given distributions, f_0(x) and f_1(x), capture most, but not all, information about the data. This motivates us to follow the classical offline robust statistics literature and define the true model of the local observations as Tukey-Huber's model. Specifically, we assume that the data X_{k,n}'s are i.i.d. with density h_0(x) when n ≤ ν − 1, but i.i.d. with density h_1(x) when n ≥ ν, where h_0 and h_1 are the two-component mixture densities of Tukey-Huber's model:

(2) h_0(x) = (1 − ɛ)f_0(x) + ɛg(x), and h_1(x) = (1 − ɛ)f_1(x) + ɛg(x).

Here ɛ ∈ [0, 1) is the contamination ratio, and g(x) is the contamination density, which is usually assumed to be unknown except for having a fat tail. Below the model in (2) will be referred to as the gross error model, and clearly it becomes the idealized model when ɛ = 0. Intuitively, under the gross error model in (2), most of the data X_{k,n}'s are from the idealized pre-change or post-change distributions f_0(x) or f_1(x), but a small proportion of observations are contaminated and have another, unknown density g(x). Alternatively, the contamination distribution g can also be considered as another intrinsic post-change distribution, but we are not interested in detecting a change in density from f_0(x) to g(x), only in detecting the change from f_0(x) to f_1(x). Also note that we consider a simplified setup with the same contamination ratio ɛ and the same contamination density function g(x) under the pre-change and post-change distributions.
However, we should emphasize that our proposed methods can easily be extended to more general cases where ɛ and g differ between h_0 and h_1, since our schemes only utilize the knowledge of f_0(x) and f_1(x), not of ɛ or g(x) (though the asymptotic or optimality properties will depend on ɛ and g). Under the hypothesis of no change, the data X_{k,n}'s are i.i.d. with density h_0, and we denote the corresponding probability measure and expectation by P_ɛ^(∞) and E_ɛ^(∞). Here and below, the subscript ɛ is used to highlight the proportion ɛ of outliers in the gross error model. Under the alternative hypothesis that a change occurs at time ν, m out of K data streams are affected, and for those m affected local data streams, the observations X_{k,n}'s are i.i.d. with density h_1 when

n ≥ ν, whereas the observations from the unaffected data streams are still i.i.d. with density h_0. The probability measure and expectation in this case are denoted by P_ɛ^(ν) and E_ɛ^(ν). Next, we present the mathematical formulation of our online monitoring problem under the standard minimax formulation for sequential change-point detection (Lorden, 1971). In our context, a statistical procedure is defined as a stopping time T, which represents the time when we raise an alarm to declare that a change has occurred. Here T is an integer-valued random variable, and the event {T = t} is based only on the observations in the first t time steps. Under the standard minimax formulation in Lorden (1971), when the number m of affected data streams, the contamination ratio ɛ and the contamination distribution g are known, one would like to find a stopping time T that asymptotically minimizes the detection delay

(3) D_ɛ(T) = sup_{ν ≥ 1} ess sup E_ɛ^(ν)[ (T − ν + 1)^+ | F_{ν−1} ]

for each and every combination of m affected local data streams, subject to the false alarm constraint

(4) E_ɛ^(∞)(T) ≥ γ

for some pre-specified large constant γ > 0. Here F_{ν−1} = (X_{1,[1,ν−1]}, ..., X_{K,[1,ν−1]}) denotes the past global information at time ν, and X_{k,[1,ν−1]} = (X_{k,1}, ..., X_{k,ν−1}) is the past local information for the k-th data stream. When m, ɛ or g is unknown, ideally the false alarm constraint in (4) and the detection delay minimization in (3) would hold uniformly over all possible ɛ and g. Since this is clearly impossible, we will investigate the asymptotic properties of our schemes for given m, ɛ and g, and then compare their efficiency and robustness with other procedures through asymptotic analysis and numerical simulations under various hypothetical conditions, especially with different m and ɛ.
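For a concrete stopping rule, both performance metrics can be estimated by Monte Carlo. The sketch below is purely illustrative and not a procedure from this paper: one data stream with f_0 = N(0, 1) and f_1 = N(1, 1), a simple cumulative-score stopping rule of our own choosing, and runs truncated at a finite horizon (so the average run length to false alarm is only a truncated, conservative estimate).

```python
import random

def stopping_time(xs, c=5.0):
    """First time n at which the cumulative score sum_{i<=n}(x_i - 0.5)
    reaches c, or len(xs) + 1 if it never does within the horizon.
    (x - 0.5 is the log-likelihood ratio of N(1,1) versus N(0,1).)"""
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x - 0.5
        if s >= c:
            return n
    return len(xs) + 1

def estimate_metrics(reps=200, horizon=400, seed=1):
    """Monte Carlo estimates of the two performance metrics: the average
    run length to false alarm (change-point nu = infinity, truncated at
    the horizon) and the average delay when the change occurs at nu = 1."""
    rng = random.Random(seed)
    arl = delay = 0.0
    for _ in range(reps):
        pre = [rng.gauss(0.0, 1.0) for _ in range(horizon)]   # no change
        post = [rng.gauss(1.0, 1.0) for _ in range(horizon)]  # change at nu = 1
        arl += stopping_time(pre) / reps
        delay += stopping_time(post) / reps
    return arl, delay
```

Running `estimate_metrics()` exhibits the tradeoff formalized above: raising the threshold c increases the average run length to false alarm at the cost of a longer detection delay.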
Finally, let us review the classical Cumulative Sum (CUSUM) procedure in Page (1954) for monitoring local data streams under the idealized model with ɛ = 0, and discuss the challenges of extending it to the gross error model. When the local distribution of the X_{k,n}'s may change from f_0 to f_1 at some unknown time ν, the problem of online monitoring local data streams can be formulated as repeatedly testing the null hypothesis H_0: ν = ∞ (i.e., no change) against the composite alternative hypothesis H_1: 1 ≤ ν < ∞ (i.e., a change occurs at some finite time) at each and every time step n. For the observed data X_{k,1}, ..., X_{k,n} at the k-th local stream at time n, the joint density function when the change occurs at ν is

f_ν(X_{k,1}, ..., X_{k,n}) = ∏_{i=1}^n f_0(X_{k,i}), if ν = ∞ or n + 1 ≤ ν < ∞;
f_ν(X_{k,1}, ..., X_{k,n}) = ∏_{i=1}^{ν−1} f_0(X_{k,i}) · ∏_{i=ν}^n f_1(X_{k,i}), if 1 ≤ ν ≤ n.

Then, for the k-th local data stream, the logarithm of the generalized likelihood ratio (GLR) statistic at time n is defined as

(5) W_{k,n} = max_{1 ≤ ν < ∞} log[ f_ν(X_{k,1}, ..., X_{k,n}) / f_{ν=∞}(X_{k,1}, ..., X_{k,n}) ] = max{ max_{1 ≤ ν ≤ n} ∑_{i=ν}^n log[ f_1(X_{k,i}) / f_0(X_{k,i}) ], 0 },

which can be computed recursively as

(6) W_{k,n} = max( W_{k,n−1} + log[ f_1(X_{k,n}) / f_0(X_{k,n}) ], 0 ) for n ≥ 1,

with the initial value W_{k,0} = 0. In the sequential change-point detection literature, W_{k,n} in (5) or (6) is referred to as the CUSUM statistic, and the classical CUSUM procedure raises a local alarm at the first time n when the CUSUM statistic W_{k,n} in (5) or (6) exceeds some pre-specified constant. It is not surprising that, being a GLR statistic, the CUSUM statistic W_{k,n} in (5) or (6) yields a statistically efficient procedure for monitoring local data streams under the idealized model with ɛ = 0, see Lorden (1971); Moustakides (1986); Ritov (1990), but its statistical efficiency degrades significantly in the presence of even mild outliers when monitoring local data streams under the gross error model in (2). While one may, in theory, still apply the GLR principle directly to the gross error model by maximizing over the uncertainty in the contamination ratio ɛ or the contamination distribution g, the corresponding local GLR statistic no longer has a recursive form, and thus the corresponding GLR procedure loses computational efficiency. Moreover, since the subset of affected data streams is unknown, the GLR principle would search over all possible combinations of affected local data streams, which can be huge for large-scale data streams. Hence, we want to develop alternative schemes that are efficient and scalable for monitoring large-scale data streams and can better balance the tradeoff between statistical efficiency and computational efficiency.

3. Our proposed methodology.
At a high level, our proposed scalable scheme monitors each local data stream individually in parallel, and then combines the local detection statistics together to raise a global alarm. For ease of understanding, we split the presentation into two subsections. Subsection 3.1 discusses how to construct robust local detection statistics in the presence of outliers via the contemporary offline robust concept of Lq-likelihood in Ferrari and Yang (2010); Qin and Priebe (2017), and Subsection 3.2 presents how to use the sum-shrinkage technique in Liu, Zhang and Mei (2017) to combine the local statistics together to raise a global alarm under uncertainty about the affected local data streams.

3.1. Construction of local Lα-CUSUM statistics. In this subsection, we apply the offline robust concept of Ferrari and Yang (2010) and Qin and Priebe (2017) to the online monitoring context to construct robust local detection statistics under the gross error model in (2). Recall that in the offline setting, when one has the full dataset available for analysis, one naive approach to dealing with outliers is to first detect and remove the outliers, and then conduct data analysis on the remaining data. However, such a naive approach is generally inefficient in the online setting of monitoring large-scale streams, since some abnormal observations might just be observations that are trying to tell us that a change has occurred, and removing them as outliers will prevent one from detecting the true change quickly. Here we follow the offline robust statistics literature: we keep all observations but de-emphasize the role of abnormal observations in online monitoring. To be more concrete, in offline statistics the log-likelihood function ∑_{i=1}^n log f_θ(X_i), or the log-likelihood ratio test statistic sup_{θ_1 ∈ Θ_1} ∑_{i=1}^n log f_{θ_1}(X_i) − sup_{θ_0 ∈ Θ_0} ∑_{i=1}^n log f_{θ_0}(X_i), plays an important role in point estimation or hypothesis testing under the idealized model, but its properties degrade significantly under the gross error model in (2). To better balance the tradeoff between efficiency and robustness, Ferrari and Yang (2010) propose a robust point estimator that maximizes ∑_{i=1}^n ([f_θ(X_i)]^α − 1)/α, and Qin and Priebe (2017) propose a robust hypothesis testing statistic by essentially considering sup_{θ_1 ∈ Θ_1} ∑_{i=1}^n ([f_{θ_1}(X_i)]^α − 1)/α − sup_{θ_0 ∈ Θ_0} ∑_{i=1}^n ([f_{θ_0}(X_i)]^α − 1)/α before bias correction. Note that the power transformation (u^α − 1)/α is known as the Box-Cox transformation in the data transformation context, where it transforms raw data so as to be closer to normally distributed; see Box and Cox (1964).
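The behavior of the Box-Cox transformation (u^α − 1)/α that drives its robustness can be checked numerically: for moderate likelihood values it tracks log u, while for tiny likelihood values (outliers) it is bounded below by −1/α. A small sketch, with α = 0.2 chosen arbitrarily for illustration:

```python
import math

def box_cox(u, alpha):
    """Box-Cox power transformation (u^alpha - 1)/alpha; -> log(u) as alpha -> 0."""
    return (u ** alpha - 1.0) / alpha

alpha = 0.2
# for a moderate likelihood value and small alpha, the transform tracks log u
u = 0.4
gap = abs(box_cox(u, 1e-4) - math.log(u))  # nearly zero
# for an outlier, the log-likelihood explodes but the transform stays bounded
tiny = 1e-12
print(math.log(tiny))          # very negative (about -27.6)
print(box_cox(tiny, alpha))    # bounded below by -1/alpha = -5
```

This is exactly the tradeoff described next: small α behaves like the log-likelihood (efficiency), while larger α caps the influence of any single outlier (robustness).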
Here the power transformation is applied to the likelihood function f(x), instead of to the raw data, in order to de-emphasize the role of abnormal observations in the robust context. A high-level intuition is as follows. On the one hand, for typical observations X_i, the value of f(X_i) should be moderate, and thus ([f(X_i)]^α − 1)/α ≈ log f(X_i) as α → 0; that is, the statistical efficiency of the likelihood function can be maintained. On the other hand, for outlier data X_i, the value of the likelihood function f(X_i) can be very small. Thus the log-likelihood log f(X_i) might go to −∞, but for a given α > 0, the value of ([f(X_i)]^α − 1)/α is bounded below by −1/α. Hence the effect of these outliers can be severe for the log-likelihood function, but is controlled under the power transformation. With a suitable choice of α, the power transformation ([f(x)]^α − 1)/α can strike a good balance between statistical efficiency and robustness. Below we extend the above robust concept to the online monitoring context to construct local

Lα-CUSUM statistics that are robust under the gross error model. To be more specific, starting from the classical CUSUM statistics in (5) and (6) for the idealized model, we propose to replace the log-likelihood ratio log f_1(X_{k,n}) − log f_0(X_{k,n}) by ([f_1(X_{k,n})]^α − 1)/α − ([f_0(X_{k,n})]^α − 1)/α = ([f_1(X_{k,n})]^α − [f_0(X_{k,n})]^α)/α for some α > 0 under the gross error model. That is, for the k-th local data stream, we define the local Lα-CUSUM statistic by the following recursive formula over time n:

(7) W_{α,k,n} = max( W_{α,k,n−1} + ([f_1(X_{k,n})]^α − [f_0(X_{k,n})]^α)/α, 0 ), for n ≥ 1, and W_{α,k,0} = 0.

Here α ≥ 0 is a tuning parameter that controls the tradeoff between statistical efficiency and robustness under the gross error model in (2). It is interesting to note that as α → 0, our proposed local Lα-CUSUM statistic W_{α,k,n} in (7) converges to the classical CUSUM statistic W_{k,n} in (5) or (6); the choice of α will be discussed later in Subsection 4.3.

3.2. Our proposed global monitoring scheme. When online monitoring large-scale data streams under the idealized model, a standard approach is to apply shrinkage to the post-change parameter estimates in order to deal with the uncertainty of sparse affected local data streams, see Xie and Siegmund (2013); Wang and Mei (2015); Chan (2017), but unfortunately such approaches are often computationally expensive. Recently, a scalable approach was proposed in Liu, Zhang and Mei (2017) for the case when the number m of affected local data streams is known. The key idea is to apply shrinkage to the local detection statistics, not to the local post-change parameters, since this can also filter out as many unaffected data streams as possible, e.g., via the order-thresholding transformation that keeps only the largest m local detection statistics. In this paper, we face an additional challenge: the number of affected local data streams is itself uncertain.
Fortunately, the soft-thresholding transformation turns out to be effective, since it not only filters out the unaffected streams, but also keeps any local data streams that might provide information about the changing event. Mathematically, our proposed global monitoring scheme is defined as the stopping time N_α(b, d) that raises a global alarm at the first time

(8) N_α(b, d) = inf{ n ≥ 1 : ∑_{k=1}^K max{0, W_{α,k,n} − d} ≥ b },

where W_{α,k,n} is the local Lα-CUSUM statistic in (7), the constant d ≥ 0 is a tuning parameter to filter out the unaffected data streams, and the control limit b > 0 is chosen to satisfy the false alarm constraint in (4).
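The scheme in (7)-(8) is straightforward to implement: each stream keeps a single number (its Lα-CUSUM statistic), updated recursively, and the global statistic is the sum of the soft-thresholded local statistics. A sketch for Gaussian pre-/post-change densities (f_0 = N(0, 1), f_1 = N(1, 1)); the tuning values α = 0.1, b = 5, d = 1 below are our own illustrative choices, not recommendations from the paper:

```python
import math

def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

class LAlphaCusum:
    """Recursive L_alpha-CUSUM statistic of (7) for one data stream."""
    def __init__(self, alpha):
        self.alpha = alpha
        self.w = 0.0
    def update(self, x):
        f1, f0 = normal_pdf(x, 1.0), normal_pdf(x, 0.0)
        increment = (f1 ** self.alpha - f0 ** self.alpha) / self.alpha
        self.w = max(self.w + increment, 0.0)
        return self.w

def global_statistic(local_stats, d):
    """Sum of soft-thresholded local statistics, as in (8)."""
    return sum(max(0.0, w - d) for w in local_stats)

def monitor(streams, alpha=0.1, b=5.0, d=1.0):
    """Raise a global alarm at the first time the statistic in (8) exceeds b."""
    trackers = [LAlphaCusum(alpha) for _ in streams]
    horizon = len(streams[0])
    for n in range(horizon):
        stats = [t.update(s[n]) for t, s in zip(trackers, streams)]
        if global_statistic(stats, d) >= b:
            return n + 1  # alarm time
    return None  # no alarm within the horizon
```

Note the scalability claimed above: the memory requirement is one number per stream (O(K)), and each time step costs O(K) work that parallelizes trivially across streams.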

We should mention that, besides the soft-thresholding transformation, there are other approaches to combining the local detection statistics together to make a global alarm. Two popular approaches in the literature are the MAX and the SUM schemes, see Tartakovsky and Veeravalli (2008) and Mei (2010):

(9) N_{α,max}(b) = inf{ n ≥ 1 : max_{1 ≤ k ≤ K} W_{α,k,n} ≥ b },
(10) N_{α,sum}(b) = inf{ n ≥ 1 : ∑_{k=1}^K W_{α,k,n} ≥ b }.

Unfortunately, the MAX and SUM approaches are generally statistically inefficient except in the extreme cases of very few or very many affected local data streams. For the purpose of fair comparison, besides the methods in Chan (2017) for Gaussian data under the idealized model, we also consider several other comparison methods to better illustrate the advantages of our proposed global monitoring scheme in (8). Regarding robustness in the presence of local outliers, we compare our proposed local Lα-CUSUM statistic in (7) with the classical CUSUM statistic W_{k,n} in (5) or (6). In other words, the baseline scheme is the special case N_{α=0}(b, d) of our proposed scheme in (8) with α = 0, which is based on the soft-thresholding transformation of the local CUSUM statistics. On the other hand, regarding robustness with respect to the number of affected data streams, our proposed scheme N_α(b, d) in (8) will be compared to the MAX and SUM schemes, N_{α,max}(b) and N_{α,sum}(b) in (9) and (10), with the same parameter α.

4. Theoretical Properties. In this section, we investigate the performance properties of our proposed scheme N_α(b, d) in (8) under the gross error model in (2), and we pay special attention to the dimension effect of the number K of data streams as K → ∞. For that purpose, it is necessary to introduce two technical assumptions on the variable Y = ([f_1(X)]^α − [f_0(X)]^α)/α when X is distributed according to h_0 or h_1 under the gross error model in (2). Note that when α = 0, the variable Y should be interpreted as log(f_1(X)/f_0(X)).
The first assumption is the analogue of the Kullback-Leibler information of the idealized model.

Assumption 4.1. Given ɛ ≥ 0 and α ≥ 0, assume that

(11) I_1(ɛ, α) = E_{h_1}[ ([f_1(X)]^α − [f_0(X)]^α)/α ] = (1 − ɛ) E_{f_1}[ ([f_1(X)]^α − [f_0(X)]^α)/α ] + ɛ E_g[ ([f_1(X)]^α − [f_0(X)]^α)/α ]

is positive, where E_{h_1}, E_{f_1} and E_g denote the expectations when the density function of X is h_1, f_1 and g, respectively. We should emphasize that this assumption is rather mild for small ɛ, α > 0. For instance, when ɛ = α = 0, I_1(ɛ = 0, α = 0) becomes the well-known Kullback-Leibler information number

(12) I(f_1, f_0) = I_1(0, 0) = E_{f_1} log( f_1(X)/f_0(X) ),

which is always positive unless f_0 = f_1. Since all functions involved are continuous with respect to α and ɛ, it is reasonable to assume that I_1(ɛ, α) is also positive for small ɛ, α > 0. The second assumption involves some probability background. For a random variable Y with pdf s(y), assume that the moment generating function ϕ(λ) = E(e^{λY}) = ∫ e^{λy} s(y) dy is well-defined. Then ϕ(λ) is a convex function of λ with ϕ(0) = 1, and there often exists another, non-zero constant λ* such that ϕ(λ*) = 1; see Lemma 7.1 below. If such a λ* exists, it is easy to show that λ* > 0 if and only if E(Y) < 0, since ϕ(λ) is convex and ϕ′(0) = E(Y). Moreover, such a λ* allows us to construct a new probability density function q(y) = e^{λ* y} s(y), so that e^{λ* y} is just the likelihood ratio q(y)/s(y). Our second assumption essentially says that such a λ* > 0 exists for Y = ([f_1(X)]^α − [f_0(X)]^α)/α under the pre-change hypothesis, and is rigorously stated as follows.

Assumption 4.2. Given ɛ ≥ 0 and α ≥ 0, assume there exists a number λ(ɛ, α) > 0 such that

(13) 1 = E_{h_0} exp{ λ(ɛ, α) ([f_1(X)]^α − [f_0(X)]^α)/α } = (1 − ɛ) E_{f_0} exp{ λ(ɛ, α) ([f_1(X)]^α − [f_0(X)]^α)/α } + ɛ E_g exp{ λ(ɛ, α) ([f_1(X)]^α − [f_0(X)]^α)/α }.

When α = ɛ = 0, it is easy to see that λ(ɛ = 0, α = 0) = 1 in Assumption 4.2, since E_{f_0}(e^Y) = 1 for Y = log(f_1(X)/f_0(X)). This suggests that Assumption 4.2 is reasonable, at least when ɛ and α are small. With Assumptions 4.1 and 4.2, we are able to present the properties of our proposed scheme N_α(b, d) in (8) in the following subsections.
Subsection 4.1 discusses the false alarm properties, whereas Subsection 4.2 investigates the detection delay properties, including the robustness with respect to the number of affected local data streams. The choice of the tuning parameters of our proposed scheme N_α(b, d) in (8) is provided in Subsection 4.3. Since the false alarm robustness with respect to outliers is very important in practice, we present it separately in Section 5.
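In practice, λ(ɛ, α) in (13) can be computed numerically: ϕ(λ) = E_{h_0} exp(λY) is convex with ϕ(0) = 1, so the positive root of ϕ(λ) = 1 can be found by bisection once it is bracketed. A sketch for f_0 = N(0, 1) and f_1 = N(1, 1), with a hypothetical contamination density g = N(0, 25) and a simple Riemann-sum integration; both choices are ours, purely for illustration:

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def phi(lam, alpha, eps, grid):
    """Riemann-sum approximation of E_{h0} exp(lam * Y), where
    Y = ([f1(X)]^alpha - [f0(X)]^alpha)/alpha and
    h0 = (1 - eps) N(0,1) + eps N(0,25) (g chosen for illustration)."""
    total, dx = 0.0, grid[1] - grid[0]
    for x in grid:
        f0, f1 = normal_pdf(x, 0.0), normal_pdf(x, 1.0)
        y = (f1 ** alpha - f0 ** alpha) / alpha
        h0 = (1.0 - eps) * f0 + eps * normal_pdf(x, 0.0, 5.0)
        total += math.exp(lam * y) * h0 * dx
    return total

def solve_lambda(alpha, eps, lo=0.5, hi=4.0, tol=1e-6):
    """Bisection for the positive root of phi(lam) = 1 (Assumption 4.2);
    assumes the bracket satisfies phi(lo) < 1 < phi(hi)."""
    grid = [-12.0 + 0.002 * i for i in range(12001)]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid, alpha, eps, grid) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For ɛ = 0 and α close to 0, `solve_lambda` recovers λ ≈ 1, matching the remark following Assumption 4.2.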

4.1. False alarm analysis. In this subsection, we analyze the global false alarm rate of our proposed scheme N_α(b, d) in (8) for online monitoring of K local data streams under the gross error model in (2), no matter how large K is. The classical techniques in sequential change-point detection for one-dimensional data are based on change-of-measure arguments, and then use renewal theory to conduct overshoot analysis in the asymptotic setting as the global threshold b goes to ∞. Unfortunately, such renewal-theory-based analysis often yields poor approximations when the dimension K is moderately large, since the overshoot constant generally increases exponentially as a function of the dimension K. Moreover, it cannot be extended to the modern asymptotic regime where the number K of local data streams goes to ∞. In other words, these classical techniques are unable to provide deep insight into the effects of the dimension K. Here we present an alternative approach that is based on Chebyshev's inequality and provides useful bounds on the global false alarm rate regardless of how large the number K of data streams is.

Theorem 4.1. Suppose Assumption 4.2 holds for given ɛ ≥ 0 and α ≥ 0, i.e., λ(ɛ, α) > 0. If λ(ɛ, α)b > K exp{−λ(ɛ, α)d}, then the average run length to false alarm of our proposed scheme N_α(b, d) in (8) satisfies

(14) E_ɛ^(∞)[N_α(b, d)] ≥ (1/4) exp( λ(ɛ, α)b − K exp{−λ(ɛ, α)d} ).

The detailed proof of Theorem 4.1 is postponed to Subsection 7.1; here let us add some comments to better understand the theorem. First, to the best of our knowledge, our rigorous, non-asymptotic result in (14) is the first of its kind in the sequential change-point detection literature, and it holds no matter how large the number K of data streams is. This allows us to investigate the modern asymptotic regime where the dimension K goes to ∞.
Second, the assumption λ(ɛ, δ)b > K exp{−λ(ɛ, δ)d} essentially says that the global threshold b of our proposed scheme N_δ(b, d) in (8) should be large enough if one wants to control the global false alarm rate when online monitoring large-scale streams. In particular, in order to satisfy the false alarm constraint γ in (4), it is natural to set the right-hand side of (14) equal to γ. This yields a conservative choice of b that satisfies √(λ(ɛ, δ)b) = √(K exp{−λ(ɛ, δ)d}) + √(log(4γ)). Such a choice of b automatically satisfies the key assumption λ(ɛ, δ)b > K exp{−λ(ɛ, δ)d} of the theorem. Third, when ɛ = δ = 0, we have λ(ɛ = 0, δ = 0) = 1, and our lower bound (14) is similar to, though slightly looser than, the results in equation (3.17) of Liu, Zhang and Mei (2017),

whose arguments are heuristic under a more refined assumption on a certain tail distribution (see G(x) defined in (39) below). Here we provide a rigorous mathematical statement in Theorem 4.1 with fewer assumptions, though the price we pay is that the corresponding lower bound is a little looser. Finally, it turns out that our lower bound (14) provides the correct first-order term for the classical CUSUM procedure when online monitoring K = 1 data stream under the idealized model. In that case, we have ɛ = δ = d = 0, and the classical CUSUM procedure is the special case N_{δ=0}(b, d = 0) of our procedure. Since λ(ɛ = 0, δ = 0) = 1, our lower bound (14) applies for any b > 1 and shows that

(15)    liminf_{b→∞} (1/b) log E_∞^(ɛ=0)[N_{δ=0}(b, d = 0)] ≥ 1.

Meanwhile, for the classical CUSUM procedure, it is well-known from the classical renewal-theory-based techniques that lim_{b→∞} (1/b) log E_∞^(ɛ=0)[N_{δ=0}(b, d = 0)] = 1; see Lorden (1971). Hence, our lower bound (14) provides the correct first-order term for log E_∞^(ɛ)[N_δ(b, d)] in the one-dimensional case as b → ∞. As a result, we feel our lower bound in (14) is not bad in the modern asymptotic regime when the dimension K goes to ∞.

4.2. Detection delay analysis. In this subsection, we provide the detection delays of our proposed scheme N_δ(b, d) in (8) under the gross error model in (2) when m out of K data streams are affected by the occurring event, for some given 1 ≤ m ≤ K. The following theorem presents the detection delay properties, and the proof is postponed to Subsection 7.2.

Theorem 4.2. Suppose Assumption 4.1 of I_1(ɛ, δ) > 0 in (11) holds, and assume m out of K local data streams are affected. If b/m + d → ∞, then the detection delay of N_δ(b, d) satisfies

(16)    D_ɛ(N_δ(b, d)) ≤ (1 + o(1)) (1/I_1(ɛ, δ)) ( b/m + d ),

where the o(1) term does not depend on the dimension K, and might depend on m and δ as well as the distributions h_0 and h_1.
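As a quick numerical illustration of how the upper bound (16) scales, take the idealized Gaussian example with f_0 = N(0, 1) and f_1 = N(1, 1), for which I_1(0, 0) = I(f_1, f_0) = 1/2; the threshold and shrinkage values below are hypothetical, and the o(1) term is dropped.

```python
# KL divergence I(f1, f0) between N(1,1) and N(0,1): a mean shift mu at unit
# variance gives mu^2 / 2, so I1 = 0.5 here.
I1 = 0.5

def delay_upper_bound(b, m, d, info=I1):
    """First-order upper bound (16) on the detection delay (o(1) dropped)."""
    return (b / m + d) / info

# The bound shrinks as the number m of affected streams grows, since the
# global threshold b is effectively shared across the affected streams.
for m in (1, 10, 100):
    print(m, delay_upper_bound(b=200.0, m=m, d=5.0))
```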
So far, Theorems 4.1 and 4.2 have investigated the performance properties of our proposed scheme N_δ(b, d) in (8) without considering the false alarm constraint γ in (4). Let us now investigate the detection delay properties of N_δ(b, d) under the gross error model in (2), subject to the false alarm constraint γ in (4).

The following corollary characterizes such detection delay properties when the number m of affected data streams is known. It also includes suitable choices of the soft-thresholding parameter d and the global detection threshold b under the asymptotic regime in which the false alarm constraint γ = γ(K) → ∞ as the dimension K → ∞, whereas the number of affected data streams m = m(K) may or may not go to ∞.

Corollary 4.1. Under the assumptions of Theorems 4.1 and 4.2, for a given δ ≥ 0 and a given d ≥ 0, a choice of global detection threshold

(17)    b_γ = (1/λ(ɛ, δ)) ( √(log(4γ)) + √(K exp{−λ(ɛ, δ)d}) )²

will guarantee that our proposed scheme N_δ(b, d) satisfies the global false alarm constraint γ in (4). Moreover, in the asymptotic regime when the false alarm constraint γ = γ(K) → ∞ and m = m(K) << min(log γ, K) as the dimension K → ∞, with b = b_γ in (17), a first-order optimal choice of the soft-thresholding parameter d that minimizes the upper bound on the detection delay in (16) is

(18)    d_opt = (1/λ(ɛ, δ)) { log(K/m) + log( (log γ)/m ) },

and the detection delay of the corresponding optimized scheme N_δ(b_γ, d_opt) in (8) satisfies

(19)    D_ɛ(N_δ(b_γ, d_opt)) ≤ ( (1 + o(1)) / (λ(ɛ, δ) I_1(ɛ, δ)) ) { (log γ)/m + log( (log γ)/m ) + log(K/m) }.

Proof: The choice of b = b_γ in (17) follows directly from Theorem 4.1. To prove (18), we abuse notation and write λ for λ(ɛ, δ) for simplicity. By Theorem 4.2, the optimal d is the non-negative value that minimizes the function

(20)    l(d) := b_γ/m + d = (1/(λm)) ( √(log(4γ)) + √(K e^{−λd}) )² + d.

This is an elementary optimization problem, and the optimal d can be found by taking the derivative of l(d) with respect to d, since l(d) is a convex function of d. To see this,

l'(d) = 1 + log(4γ)/(4m) − (1/m) ( √(K e^{−λd}) + (1/2)√(log(4γ)) )²,
l''(d) = (λ/m) ( √(K e^{−λd}) + (1/2)√(log(4γ)) ) √(K e^{−λd}) > 0.

Thus l(d) is a convex function on [0, +∞), and the optimal value d_opt can be found by setting l'(d) = 0:

√(K e^{−λd}) = √( m + log(4γ)/4 ) − (1/2)√(log(4γ)).

This gives the unique optimal value

(21)    d_opt = (1/λ) log [ K / ( √( m + log(4γ)/4 ) − (1/2)√(log(4γ)) )² ]
              = (1/λ) [ log( ( √( m + log(4γ)/4 ) + (1/2)√(log(4γ)) )² / m ) + log(K/m) ],

which is equivalent to (18) under the assumption that m = m(K) << min(log γ, K). Plugging d = d_opt from (21) back into (17) yields (19), and thus the corollary is proved.

Note that on the right-hand side of (19), the dominant order is max( (log γ)/m, log(K/m) ), and the second term log( (log γ)/m ) might be negligible. However, we decided to keep it in Corollary 4.1, since this term reflects the effect of the assumed number of affected data streams. In practice, we often do not know the true number m of affected data streams. We may make an imperfect assumption that m_0 out of K data streams are affected, and then adopt the corresponding mis-optimized scheme N_δ(b_γ, d_opt) in Corollary 4.1 under the imperfect assumption m = m_0; e.g., d_opt in (18) is defined with m = m_0. The following corollary establishes the robustness of our proposed scheme with respect to the number of affected local data streams.

Corollary 4.2. Assume the optimized scheme N_δ(b_γ, d_opt) in Corollary 4.1 is designed under the assumption that m_0 data streams are affected. When the true number of affected local data streams is m, if max(m, m_0) << min(log γ, K), its detection delay satisfies

(22)    D_ɛ(N_δ(b_γ, d_opt)) ≤ ( (1 + o(1)) / (λ(ɛ, δ) I_1(ɛ, δ)) ) { (log γ)/m + log( (log γ)/m_0 ) + log(K/m_0) },

which is asymptotically equivalent to the right-hand side of (19) whenever

(23)    log m − log m_0 << max( (log γ)/m, log(K/m) ).

Corollary 4.2 follows at once from Corollary 4.1, Theorem 4.2, and the fact that the difference between the right-hand sides of (22) and (19) is proportional to log m − log m_0. Condition (23) essentially means

that we do not mis-specify the number of affected data streams very badly. In such a scenario, Corollary 4.2 suggests that the detection delay of the mis-optimized scheme is similar to that of the correctly optimized scheme, and thus our proposed scheme is robust with respect to the assumed number of affected data streams.

It is useful to add some remarks to better understand Corollary 4.1, as research is rather limited in the sequential change-point detection literature in the modern asymptotic regime when the number K of data streams goes to ∞. If we compare the optimal soft-thresholding parameter d_opt in (18) with the minimum detection delay in (19), the effects of the dimension K are the same, but the effects of the false alarm constraint γ are different. Thus different asymptotic scenarios may arise depending on the asymptotic orders of (log γ)/m, log(K/m) and log( (log γ)/m ), and below we consider several extreme cases.

First, let us consider the extreme case when log(K/m) << log( (log γ)/m ), i.e., K << log γ. This is consistent with the classical asymptotic regime when K is fixed and the false alarm constraint γ goes to ∞. In this case, for our proposed scheme, the minimum detection delay in (19) is of order (log γ)/m. To be more concrete, for the idealized model with ɛ = 0, the optimal choice is δ = 0, with λ(ɛ = 0, δ = 0) = 1 and I_1(ɛ = 0, δ = 0) = I(f_1, f_0), the Kullback-Leibler divergence. Hence the delay of N_{δ=0}(b_γ, d_opt) would be bounded above by ( (1 + o(1)) / I(f_1, f_0) ) (log γ)/m. Meanwhile, under the idealized model, for any scheme T satisfying the false alarm constraint γ in (4), it is well-known that D_{ɛ=0}(T) ≥ ( (1 + o(1)) / I(f_1, f_0) ) (log γ)/m as γ goes to ∞; see Mei (2010). This suggests that our proposed scheme with δ = 0 attains the classical asymptotic lower bound under the idealized model with ɛ = 0 in the classical asymptotic regime of K << log γ.
Second, let us consider another extreme case when log(K/m) >> (log γ)/m, or equivalently, when log γ << m log(K/m). This may occur when the number m of affected data streams is fixed and log γ = o(log K), i.e., the false alarm constraint γ is relatively small as compared to K. In this case, both the optimal soft-thresholding parameter d_opt in (18) and the minimum detection delay in (19) are of order log(K/m), and the impact of the false alarm constraint γ is negligible. In other words, our proposed scheme needs to take at most O(log K) observations to detect the sparse post-change scenario when only m out of K data streams are affected. This is consistent with the modern asymptotic results in offline statistics that O(log p) observations can fully recover the signal in p-dimensional observations under the sparsity assumption; see Candes and Tao (2007). Third, the other extreme case is when log(K/m) and log( (log γ)/m ) have the same order. This can

occur if m = K^{1−β} and log γ = K^ζ for some 0 < β, ζ < 1, which was first investigated in Chan (2017) under the idealized model for Gaussian data. It is interesting to compare our results with those in Chan (2017). Under the idealized model with ɛ = 0, the optimal choice is δ = 0, and our results in Corollary 4.1 show that the detection delay of our proposed scheme is of order K^{ζ+β−1} + (ζ + 2β − 1) log K, which is of order log K if (1 − ζ)/2 < β < 1 − ζ, but of order K^{ζ+β−1} if ζ + β > 1. These two cases are exactly the assumptions in Theorems 1 and 4 of Chan (2017). While the assumption m << min(log γ, K) in Corollary 4.1 corresponds to ζ + β > 1, in which case our detection delay bound is identical to the optimal detection bound in Chan (2017), it is not difficult to see that the proof of Corollary 4.1 can be extended to the case of (1 − ζ)/2 < β < 1 − ζ, in which our results are only slightly weaker than those of Chan (2017), in the sense that the order is the same but our constant coefficient is larger. The latter is understandable because Chan (2017) used the Gaussian assumptions extensively to conduct a more careful detection delay analysis than our results in (16), and his results are more refined for Gaussian data under the idealized model. Meanwhile, our results are more general, as they are applicable to any distributions and to the gross error models. More importantly, our results give a simpler and more intuitive explanation of the assumptions in the theorems of Chan (2017), and provide deeper insight into online monitoring of large-scale data streams under general settings.

4.3. Optimal choices of tuning parameters. Note that there are three tuning parameters in our proposed scheme N_δ(b, d) in (8): the robustness parameter δ, the shrinkage parameter d and the control threshold b. For any given δ and d, the control threshold b is chosen to satisfy the false alarm constraint γ in (4).
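The calibration of b and d can be sketched numerically: the functions below implement the threshold choice (17) and the exact minimizer (21) of the objective l(d) in (20), with all parameter values hypothetical. Plugging b_γ back into the right-hand side of (14) recovers the target γ, and the closed-form d_opt matches a brute-force grid minimization of l(d).

```python
import math

def b_gamma(lam, K, d, gamma):
    """Conservative threshold choice (17) meeting the false alarm constraint."""
    root = math.sqrt(math.log(4.0 * gamma)) + math.sqrt(K * math.exp(-lam * d))
    return root ** 2 / lam

def l(d, lam, m, K, gamma):
    """Objective (20): l(d) = b_gamma / m + d."""
    return b_gamma(lam, K, d, gamma) / m + d

def d_opt_exact(lam, m, K, gamma):
    """Unique root of l'(d) = 0, i.e., the exact minimizer in (21)."""
    A = math.log(4.0 * gamma)
    u = math.sqrt(m + A / 4.0) - 0.5 * math.sqrt(A)   # u = sqrt(K exp(-lam*d))
    return math.log(K / u ** 2) / lam

# Hypothetical setting: lam = 1, m = 5 affected streams, K = 1000, gamma = 1e4.
lam, m, K, gamma = 1.0, 5, 1000, 1.0e4
d_exact = d_opt_exact(lam, m, K, gamma)
d_grid = min((i * 1e-3 for i in range(20000)), key=lambda d: l(d, lam, m, K, gamma))

# Sanity check: with b = b_gamma, the right-hand side of (14) equals gamma.
b = b_gamma(lam, K, d_exact, gamma)
check = 0.25 * math.exp((math.sqrt(lam * b) - math.sqrt(K * math.exp(-lam * d_exact))) ** 2)
print(d_exact, d_grid, check)
```

In this hypothetical setting the exact and grid minimizers agree to grid precision, illustrating the convexity argument in the proof of Corollary 4.1.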
Also, the (asymptotically) optimal choice of the shrinkage parameter d is given in (18), which is consistent with our intuition that the shrinkage parameter should depend on the number m of affected data streams. Below we focus on the choice of the robustness parameter δ, which balances the tradeoff between statistical efficiency and robustness under the gross error model. By (19) in Corollary 4.1, an optimal choice of δ is one that maximizes λ(ɛ, δ)I_1(ɛ, δ). For the purpose of better illustration, we treat δ = 0 as the baseline, since it corresponds to the classical CUSUM scheme that is optimal under the idealized model. Then relation (19) inspires us to define the asymptotic efficiency improvement of the proposed scheme N_δ(b, d) with δ ≥ 0, as compared to the baseline scheme N_{δ=0}(b, d), as

(24)    e(ɛ, δ) = ( λ(ɛ, δ) I_1(ɛ, δ) ) / ( λ(ɛ, δ = 0) I_1(ɛ, δ = 0) ) − 1.

Hence, the optimal choice of δ can also be defined by maximizing the efficiency improvement e(ɛ, δ). That is,

(25)    δ_opt(ɛ) = arg max_{δ ≥ 0} [ λ(ɛ, δ) I_1(ɛ, δ) ] = arg max_{δ ≥ 0} [ e(ɛ, δ) ].

It is non-trivial to derive the theoretical properties of δ_opt as a function of ɛ, as it depends on the relationships between f_0, f_1 and the contamination density g. One possible approach is to investigate the local structure of e(ɛ, δ) in the neighborhood of (ɛ, δ) = (0, 0) by considering the second-order Taylor expansions of λ(ɛ, δ) in (4.2) and I_1(ɛ, δ) in (4.1) at (ɛ, δ) = (0, 0). This allows us to approximate λ(ɛ, δ)I_1(ɛ, δ) as a quadratic polynomial in δ for a given ɛ. Maximizing this quadratic function shows that δ_opt(ɛ) = C_1 ɛ + o(ɛ) for some constant C_1 that depends only on f_0, f_1 and g. In other words, the optimal choice δ_opt(ɛ) appears to be linearly dependent on ɛ for very small ɛ. Unfortunately, this works only for very small ɛ, and the expression for the constant C_1 is too complicated to be useful in practice. It remains an open problem to derive a meaningful theoretical characterization of δ_opt for general values of ɛ, but the good news is that the numerical value of δ_opt can be found fairly easily.

Below we provide an efficient algorithm to numerically compute e(ɛ, δ) in (24) and the optimal δ_opt in (25) for given ɛ and g. The main tools are Monte Carlo integration and grid search, and our key idea for reducing the computational complexity is to run the Monte Carlo simulation once to compute λ(ɛ, δ) in (11) and I_1(ɛ, δ) in (13) simultaneously for many possible combinations of (ɛ, δ). When computing λ(ɛ, δ) in (11), we first generate one set of m (e.g., m = 10^4) i.i.d. random variables X_1^(1), ..., X_m^(1) from the density f_0, and another set of m i.i.d. random variables X_1^(2), ..., X_m^(2) from the density g. Next, we conduct a grid search by specifying a list of values for ɛ > 0, δ > 0 and λ > 0.
For each combination of those (ɛ, δ, λ), we compute the objective function

H(ɛ, δ, λ) = ( (1 − ɛ)/m ) Σ_{i=1}^m exp{ λ ( [f_1(X_i^(1))]^δ − [f_0(X_i^(1))]^δ ) }
           + ( ɛ/m ) Σ_{i=1}^m exp{ λ ( [f_1(X_i^(2))]^δ − [f_0(X_i^(2))]^δ ) }.

Then, for each fixed pair (ɛ, δ), we estimate λ(ɛ, δ) by numerically searching for the λ such that H(ɛ, δ, λ) ≈ 1. This algorithm is computationally efficient, as it only needs to generate the random variables X_i^(1) and X_i^(2) once; they can then be used repeatedly and simultaneously for all pairs (ɛ, δ) via matrix operations. Similar ideas can be applied to efficiently estimate I_1(ɛ, δ).
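The recipe above can be sketched in a few lines of NumPy for the concrete Gaussian example discussed below (f_0 = N(0, 1), f_1 = N(1, 1), g = N(0, 3²)); the objective H and the λ grid range follow the description in the text, while the variable names and the simple first-crossing search are illustrative choices of ours, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                          # Monte Carlo sample size, as in the text

def phi(x, mean=0.0, sd=1.0):
    """Gaussian probability density function."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# One set of draws from f0 = N(0,1) and one from the contamination g = N(0, 9);
# both sets are generated once and reused for every (eps, delta, lambda) triple.
x1 = rng.standard_normal(m)         # X^(1) ~ f0
x2 = 3.0 * rng.standard_normal(m)   # X^(2) ~ g

def H(eps, delta, lam):
    """Monte Carlo estimate of the objective H(eps, delta, lambda)."""
    t1 = np.exp(lam * (phi(x1, mean=1.0) ** delta - phi(x1) ** delta)).mean()
    t2 = np.exp(lam * (phi(x2, mean=1.0) ** delta - phi(x2) ** delta)).mean()
    return (1.0 - eps) * t1 + eps * t2

# Estimate lambda(eps, delta) as the first positive grid value at which H,
# which equals 1 at lambda = 0 and initially dips below 1 in this example,
# returns to 1.
eps, delta = 0.1, 0.2
grid = np.arange(0.1, 5.0, 0.05)
vals = np.array([H(eps, delta, lam) for lam in grid])
crossings = np.nonzero(vals >= 1.0)[0]
lam_hat = grid[crossings[0]] if crossings.size else None
print(lam_hat)
```

Because the two Monte Carlo samples are fixed once, evaluating H for all grid triples reduces to vectorized array operations, which is what makes the grid search over (ɛ, δ, λ) cheap.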

This allows us to efficiently compute e(ɛ, δ) for many combinations of (ɛ, δ). Finally, for a given ɛ ≥ 0, a brute-force exhaustive search over δ yields the optimal value that maximizes e(ɛ, δ).

As an illustration, we consider a concrete example in which f_0 is the pdf of N(0, 1), f_1 is the pdf of N(1, 1), and g is the pdf of N(0, 3²). We run the above-mentioned numerical algorithm with m = 10^4 random samples, where ɛ varies from 0 to 0.15 with step size 0.01, δ varies from 0 to 0.9 with step size 0.01, and λ varies from 0.1 to 5. The computation time is around 8 minutes on a Windows 10 laptop with an Intel i5-6200U CPU at 2.30 GHz.

Figure 2 plots e(ɛ, δ) as a function of the tuning parameter δ for several fixed ɛ. From Figure 2, it is clear that when ɛ = 0, the e(ɛ = 0, δ) curve (red curve) is linearly decreasing as a function of δ ≥ 0, and thus the optimal choice of δ is 0 for ɛ = 0. This is consistent with the optimality properties of the CUSUM statistic under the idealized model without outliers. Meanwhile, for any other contamination rate ɛ > 0, the e(ɛ, δ) curve first increases and then decreases as δ increases. Thus the optimal choice δ_opt is often positive when ɛ > 0. For instance, when ɛ = 0.1, Figure 2 (blue curve) shows that δ_opt(ɛ = 0.1) ≈ 0.21 and e(ɛ = 0.1, δ = 0.21) ≈ 0.63. This suggests that our proposed L_δ-CUSUM based scheme with δ = 0.21 will be 63% more efficient than the baseline CUSUM based scheme under the gross error model when there are 10% outliers. Figure 3 shows the efficiency improvement of our proposed L_δ-CUSUM based scheme with δ = 0.21 under different contamination ratios ɛ from 0 to 0.15. From the plot, we can see that, as compared to the classical CUSUM based method, our proposed L_δ-CUSUM based scheme with δ = 0.21 gains 40%–70% more efficiency when the contamination ratio ɛ ∈ [2%, 15%], and the price we pay is to lose 5% efficiency under the idealized model with ɛ = 0.

5. The breakdown point analysis.
In classical offline robust statistics, the breakdown point is one of the most popular measures of the robustness of statistical procedures. At a high level, in the context of finite samples, the breakdown point is the smallest percentage of contamination that may cause an estimator or statistical test to perform arbitrarily poorly. For instance, when estimating the parameter of a distribution, the breakdown point of the sample mean is 0, since a single outlier can completely change the value of the sample mean, whereas the breakdown point of the sample median is 1/2. This suggests that the sample median is more robust than the sample mean. Since the pioneering work of Hampel (1968) on the asymptotic definition of the breakdown point, much research has been done to investigate the breakdown point of different robust estimators or


Introduction to Bayesian Statistics Bayesian Parameter Estimation Introduction to Bayesian Statistics Harvey Thornburg Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Dynamic Data-Driven Adaptive Sampling and Monitoring of Big Spatial-Temporal Data Streams for Real-Time Solar Flare Detection

Dynamic Data-Driven Adaptive Sampling and Monitoring of Big Spatial-Temporal Data Streams for Real-Time Solar Flare Detection Dynamic Data-Driven Adaptive Sampling and Monitoring of Big Spatial-Temporal Data Streams for Real-Time Solar Flare Detection Dr. Kaibo Liu Department of Industrial and Systems Engineering University of

More information

Online Seismic Event Picking via Sequential Change-Point Detection

Online Seismic Event Picking via Sequential Change-Point Detection Online Seismic Event Picking via Sequential Change-Point Detection Shuang Li, Yang Cao, Christina Leamon, Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

Quickest Anomaly Detection: A Case of Active Hypothesis Testing

Quickest Anomaly Detection: A Case of Active Hypothesis Testing Quickest Anomaly Detection: A Case of Active Hypothesis Testing Kobi Cohen, Qing Zhao Department of Electrical Computer Engineering, University of California, Davis, CA 95616 {yscohen, qzhao}@ucdavis.edu

More information

Performance of Certain Decentralized Distributed Change Detection Procedures

Performance of Certain Decentralized Distributed Change Detection Procedures Performance of Certain Decentralized Distributed Change Detection Procedures Alexander G. Tartakovsky Center for Applied Mathematical Sciences and Department of Mathematics University of Southern California

More information

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)? ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding

A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate Probabilities

Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate Probabilities Syracuse University SURFACE Electrical Engineering and Computer Science College of Engineering and Computer Science 3-1-2010 Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

An Effective Approach to Nonparametric Quickest Detection and Its Decentralized Realization

An Effective Approach to Nonparametric Quickest Detection and Its Decentralized Realization University of Tennessee, Knoxville Trace: Tennessee Research and Creative Exchange Doctoral Dissertations Graduate School 5-2010 An Effective Approach to Nonparametric Quickest Detection and Its Decentralized

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

Optimization for Compressed Sensing

Optimization for Compressed Sensing Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

The Moment Method; Convex Duality; and Large/Medium/Small Deviations

The Moment Method; Convex Duality; and Large/Medium/Small Deviations Stat 928: Statistical Learning Theory Lecture: 5 The Moment Method; Convex Duality; and Large/Medium/Small Deviations Instructor: Sham Kakade The Exponential Inequality and Convex Duality The exponential

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Ordinal Optimization and Multi Armed Bandit Techniques

Ordinal Optimization and Multi Armed Bandit Techniques Ordinal Optimization and Multi Armed Bandit Techniques Sandeep Juneja. with Peter Glynn September 10, 2014 The ordinal optimization problem Determining the best of d alternative designs for a system, on

More information

Robustness Meets Algorithms

Robustness Meets Algorithms Robustness Meets Algorithms Ankur Moitra (MIT) ICML 2017 Tutorial, August 6 th CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately

More information

Lecture 22: Error exponents in hypothesis testing, GLRT

Lecture 22: Error exponents in hypothesis testing, GLRT 10-704: Information Processing and Learning Spring 2012 Lecture 22: Error exponents in hypothesis testing, GLRT Lecturer: Aarti Singh Scribe: Aarti Singh Disclaimer: These notes have not been subjected

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

REPORT DOCUMENTATION PAGE

REPORT DOCUMENTATION PAGE REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-088 The public reporting burden for this collection of information is estimated to average hour per response, including the time for reviewing instructions,

More information

Quantifying Stochastic Model Errors via Robust Optimization

Quantifying Stochastic Model Errors via Robust Optimization Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations

More information

Bayesian Quickest Change Detection Under Energy Constraints

Bayesian Quickest Change Detection Under Energy Constraints Bayesian Quickest Change Detection Under Energy Constraints Taposh Banerjee and Venugopal V. Veeravalli ECE Department and Coordinated Science Laboratory University of Illinois at Urbana-Champaign, Urbana,

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Lessons learned from the theory and practice of. change detection. Introduction. Content. Simulated data - One change (Signal and spectral densities)

Lessons learned from the theory and practice of. change detection. Introduction. Content. Simulated data - One change (Signal and spectral densities) Lessons learned from the theory and practice of change detection Simulated data - One change (Signal and spectral densities) - Michèle Basseville - 4 6 8 4 6 8 IRISA / CNRS, Rennes, France michele.basseville@irisa.fr

More information

Sequential Decision Problems

Sequential Decision Problems Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Optimum Joint Detection and Estimation

Optimum Joint Detection and Estimation Report SSP-2010-1: Optimum Joint Detection and Estimation George V. Moustakides Statistical Signal Processing Group Department of Electrical & Computer Engineering niversity of Patras, GREECE Contents

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

The Growth of Functions. A Practical Introduction with as Little Theory as possible

The Growth of Functions. A Practical Introduction with as Little Theory as possible The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why

More information

Accuracy and Decision Time for Decentralized Implementations of the Sequential Probability Ratio Test

Accuracy and Decision Time for Decentralized Implementations of the Sequential Probability Ratio Test 21 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 3-July 2, 21 ThA1.3 Accuracy Decision Time for Decentralized Implementations of the Sequential Probability Ratio Test Sra Hala

More information

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization Proceedings of Machine Learning Research vol 65:1 33, 017 Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization Daniel Vainsencher Voleon Shie Mannor Faculty of Electrical Engineering,

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Robust Statistics, Revisited

Robust Statistics, Revisited Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods

Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods Sandeep Juneja Tata Institute of Fundamental Research Mumbai, India joint work with Peter Glynn Applied

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

Decentralized Detection In Wireless Sensor Networks

Decentralized Detection In Wireless Sensor Networks Decentralized Detection In Wireless Sensor Networks Milad Kharratzadeh Department of Electrical & Computer Engineering McGill University Montreal, Canada April 2011 Statistical Detection and Estimation

More information

Minimum Hellinger Distance Estimation with Inlier Modification

Minimum Hellinger Distance Estimation with Inlier Modification Sankhyā : The Indian Journal of Statistics 2008, Volume 70-B, Part 2, pp. 0-12 c 2008, Indian Statistical Institute Minimum Hellinger Distance Estimation with Inlier Modification Rohit Kumar Patra Indian

More information

arxiv: v1 [math.st] 13 Sep 2011

arxiv: v1 [math.st] 13 Sep 2011 Methodol Comput Appl Probab manuscript No. (will be inserted by the editor) State-of-the-Art in Sequential Change-Point Detection Aleksey S. Polunchenko Alexander G. Tartakovsky arxiv:119.2938v1 [math.st]

More information

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION (REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department

More information

SEQUENTIAL CHANGE DETECTION REVISITED. BY GEORGE V. MOUSTAKIDES University of Patras

SEQUENTIAL CHANGE DETECTION REVISITED. BY GEORGE V. MOUSTAKIDES University of Patras The Annals of Statistics 28, Vol. 36, No. 2, 787 87 DOI: 1.1214/95367938 Institute of Mathematical Statistics, 28 SEQUENTIAL CHANGE DETECTION REVISITED BY GEORGE V. MOUSTAKIDES University of Patras In

More information

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

10-704: Information Processing and Learning Fall Lecture 24: Dec 7 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Surveillance of BiometricsAssumptions

Surveillance of BiometricsAssumptions Surveillance of BiometricsAssumptions in Insured Populations Journée des Chaires, ILB 2017 N. El Karoui, S. Loisel, Y. Sahli UPMC-Paris 6/LPMA/ISFA-Lyon 1 with the financial support of ANR LoLitA, and

More information