Simultaneous and sequential detection of multiple interacting change points

Simultaneous and sequential detection of multiple interacting change points

Long Nguyen, Department of Statistics, University of Michigan
Joint work with Ram Rajagopal (Stanford University)

Introduction

Statistical inference in the context of spatially distributed data, processed and analyzed by decentralized systems: sensor networks, social networks, the Web.

Two interacting aspects:
- how to exploit the spatial dependence in the data
- how to deal with decentralized communication and computation

There is an extensive literature dealing with each of these two aspects separately, by different communities. Many applications call for handling both aspects in near real-time data processing and analysis.

Example: sensor network for traffic monitoring

Problem: detecting sensor failures for all sensors in the network.
- data: sequence of sensor measurements of traffic volume
- a sequential detection rule for the change (failure) point, one for each sensor

Mean days to failure (figure)
- as many as 40% of sensors fail on a given day
- failed sensors need to be detected as early as possible
- separating sensor failure from events of interest is difficult

Talk outline
- a statistical formulation for detection of multiple change points in a network setting: classical sequential analysis, graphical models
- sequential and real-time message-passing detection algorithms: decision procedures with limited data and computation
- an asymptotic theory of the tradeoffs between statistical efficiency and computation/communication efficiency

Sequential detection for a single change point
- sensor u collects a sequence of data X_n(u) for n = 1, 2, ...
- λ_u is the change point variable for sensor u
- data are i.i.d. according to f_0 before the change point, and i.i.d. according to f_1 after
- a sequential change point detection procedure is a stopping time τ_u, i.e., {τ_u ≤ n} ∈ σ(X_1(u), ..., X_n(u))
- Neyman-Pearson criterion: subject to the false alarm constraint PFA(τ_u(X)) = P(τ_u < λ_u) ≤ α for some small α, minimize the detection delay E[(τ_u − λ_u) | τ_u ≥ λ_u].
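The per-sample statistics in this model are easy to sanity-check numerically. The sketch below (a minimal Python illustration; the Gaussian choices f_0 = N(1, σ²), f_1 = N(0, σ²) are taken from the simulation set-up later in the talk) verifies that the mean log-likelihood ratio under f_1 equals the Kullback-Leibler information q(X) = KL(f_1 ‖ f_0), the quantity that drives the detection delay.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, sigma = 1.0, 0.0, 1.0  # pre- and post-change means (illustrative values)

def llr(x):
    # log f1(x)/f0(x) for N(mu1, sigma^2) vs N(mu0, sigma^2)
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)

# KL(f1 || f0) for equal-variance Gaussians
q = (mu1 - mu0) ** 2 / (2 * sigma ** 2)

x_post = rng.normal(mu1, sigma, 200_000)  # samples after the change
print(q, llr(x_post).mean())              # empirical mean LLR is close to q
```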

Beyond a single change point
- we have multiple change points, one for each sensor
- we could apply the single change point method to each sensor independently, but this is not a good idea: measurements from a single sensor are very noisy, and failed sensors may still produce plausible measurement values
- borrowing information from neighboring sensors may be useful due to the spatial dependence of measurements
- but data sharing is limited to neighboring sensors: data sharing via a message-passing mechanism

Sample correlation with neighbors (figure: correlation with good sensors vs. correlation with failed sensors)

Correlation statistics have been successfully utilized in practice, although not in a sequential and decentralized setting (Kwon and Rice, 2003).

A formulation for multiple change points
- m sensors labeled by U = {u_1, ..., u_m}
- a graph G = (U, E) specifies the connections among the u ∈ U
- each sensor u fails at time λ_u; λ_u is endowed with an (independent) prior distribution π_u
- there is a private data sequence X_n(u) for each sensor u; the private data sequence changes its distribution after λ_u
- there is a shared data sequence (Z_n(u, v))_n for each neighboring pair of sensors u and v:
  Z_n(u, v) ~ i.i.d. f_0(· | u, v) for n < min(λ_u, λ_v), and i.i.d. f_1(· | u, v) for n ≥ min(λ_u, λ_v)

Graphical model of change points (figure: (a) topology of the sensor network; (b) graphical model of the random variables)

Conditionally on the shared data sequences, the change point variables are no longer independent.

Localized stopping times

Data constraint: each sensor has access only to data shared with its neighbors.

Definition: the stopping rule for u, denoted τ_u, is a localized stopping time, which depends on the measurements of u and its neighbors: for any t > 0,
{τ_u ≤ t} ∈ σ({X_n(u), Z_n(u, v) : n ≤ t, v ∈ N(u)})

Performance metrics
- false alarm rate: PFA(τ_u) = P(τ_u < λ_u)
- expected failure detection delay: D(τ_u) = E[τ_u − λ_u | τ_u ≥ λ_u]

Problem: for each sensor u, find a localized stopping time τ_u minimizing D(τ_u) subject to PFA(τ_u) ≤ α.

Review of results for single change point detection

Under some conditions, the optimal sequential rule is obtained by thresholding the posterior of λ_u (Shiryaev, 1978):
τ_u(X) = inf{n : Λ_n ≥ 1 − α}, where Λ_n = P(λ_u ≤ n | X_1(u), ..., X_n(u)).

Well-established asymptotic properties (Tartakovsky & Veeravalli, 2006):
- false alarm: PFA(τ_u(X)) ≤ α
- detection delay: D(τ_u(X)) = (|log α| / (q(X) + d)) (1 + o(1)) as α → 0,
  where q(X) = KL(f_1(X) ‖ f_0(X)) is the Kullback-Leibler information and d is some constant.
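For a geometric(ρ) prior on λ_u, the posterior Λ_n admits a one-step Bayes recursion, so the Shiryaev rule is straightforward to implement. A minimal sketch (the parameter values ρ, α and the true change point lam are illustrative, not from the talk; the Gaussian densities follow the simulation set-up used later):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, alpha = 0.05, 0.01          # geometric prior parameter and FA level (illustrative)
mu0, mu1, sigma = 1.0, 0.0, 1.0  # Gaussian pre-/post-change means

def likelihood_ratio(x):
    """f1(x)/f0(x) for equal-variance Gaussians."""
    return np.exp(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2))

def shiryaev_stop(xs):
    """First n with Lambda_n = P(lambda <= n | x_1..n) >= 1 - alpha, else None."""
    p = 0.0
    for n, x in enumerate(xs, start=1):
        prior = p + (1 - p) * rho            # prior-predictive P(lambda <= n | x_1..n-1)
        num = prior * likelihood_ratio(x)
        p = num / (num + 1 - prior)          # Bayes update to the posterior Lambda_n
        if p >= 1 - alpha:
            return n
    return None

lam = 50  # true change point (illustrative)
xs = np.concatenate([rng.normal(mu0, sigma, lam - 1), rng.normal(mu1, sigma, 500)])
tau = shiryaev_stop(xs)
print(tau)  # typically stops a few samples after lam
```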

Two sensor case: an initial idea

(figure: change points λ_u and λ_v; sensor u observes X, sensor v observes Y, and they share Z)

Idea: use both the private data X_1, ..., X_n and the shared data Z_1, ..., Z_n:
τ_u(X, Z) = inf{n : P(λ_u ≤ n | (X_1, Z_1), ..., (X_n, Z_n)) ≥ 1 − α}.

Theorem 1: The false alarm for τ_u(X, Z) is bounded from above by α, while the expected delay takes the form
D(τ_u(X, Z)) = (|log α| / (q(X) + d)) (1 + o(1)) as α → 0.
- Z is not helpful in improving the delay (at least in the asymptotics!)
- this suggests using information from Y as well (to predict λ_u)

Localized stopping rule with message exchange

Modified idea: u should use the information given by the shared data Z only if its neighbor v has not changed (failed)...
- but u does not know whether v has changed or not, so...
- instead of deciding this by itself, u will wait for v to tell it

(figure: sensor u with data X, sensor v with data Y, shared data Z, message from v to u)

The stopping rule for u thus ultimately hinges also on the information given by the data sequence Y, passed to u indirectly via the neighboring sensor v.

Localized stopping rule with information exchange

Algorithmic protocol:
- each sensor uses all data shared with neighbors that have not declared a change (failure)
- if a sensor v stops according to its stopping rule, v broadcasts this information to all its neighbors, who promptly drop v from their respective neighbor lists

Formally, for two sensors:
- stopping rule for u using only X: τ_u(X)
- stopping rule for u using both X and Z: τ_u(X, Z)
- similarly for sensor v: τ_v(Y) and τ_v(Y, Z)

The overall stopping rule for u is then
τ_u(X, Y, Z) = τ_u(X, Z) if τ_u(X, Z) ≤ τ_v(Y, Z); otherwise max(τ_u(X), τ_v(Y, Z)).
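A toy two-sensor simulation of this combination rule might look as follows. This is a sketch under simplifying assumptions: the posterior for u is computed as if Z changes at λ_u, by multiplying the private and shared likelihood ratios, which is exactly the approximation that is valid only while the neighbor v is alive; all parameter values and the horizon n are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, alpha = 0.05, 0.01
mu0, mu1, sigma = 1.0, 0.0, 1.0
n, lam_u, lam_v = 400, 60, 80    # horizon and true change points (illustrative)

def seq(lam):
    """Pre-change N(mu0, sigma^2), post-change N(mu1, sigma^2)."""
    return np.concatenate([rng.normal(mu0, sigma, lam - 1),
                           rng.normal(mu1, sigma, n - lam + 1)])

def lr(x):
    return np.exp(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2))

def shiryaev(Ls):
    """Stopping index of the thresholded posterior, run on likelihood ratios Ls."""
    p = 0.0
    for i, L in enumerate(Ls, start=1):
        prior = p + (1 - p) * rho
        num = prior * L
        p = num / (num + 1 - prior)
        if p >= 1 - alpha:
            return i
    return len(Ls) + 1  # did not stop within the horizon

X, Y = seq(lam_u), seq(lam_v)
Z = seq(min(lam_u, lam_v))       # shared data changes at the earlier failure

tau_u_X  = shiryaev(lr(X))
tau_u_XZ = shiryaev(lr(X) * lr(Z))
tau_v_YZ = shiryaev(lr(Y) * lr(Z))

# overall localized rule for u, per the protocol above
tau_u = tau_u_XZ if tau_u_XZ <= tau_v_YZ else max(tau_u_X, tau_v_YZ)
print(tau_u_X, tau_u_XZ, tau_v_YZ, tau_u)
```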

Asymptotic expression of the detection delay (Rajagopal, Nguyen, Ergen & Varaiya, 2010)

Theorem 2: The expected detection delay for u takes the form
D(τ̃_u) = D_1 δ_α + D_2 (1 − δ_α) as α → 0,
where
D_1 = D(τ_u(X)) = (|log α| / (q(X) + d)) (1 + o(1)),
D_2 = (|log α| / (q(X) + q(Z) + d)) (1 + o(1)) < D_1,
and δ_α is the probability that u's neighbor declares failure before u.

Clearly, for sufficiently small α, D(τ̃_u) < D(τ_u(X)). Under additional conditions, this delay is asymptotically optimal.
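Plugging illustrative numbers into the leading-order terms of Theorem 2 shows the effect of the shared data: with α = 10⁻² and q(X) = q(Z) = 0.5 (ignoring the constant d and the o(1) terms, all values being assumptions for illustration), D_2 is half of D_1.

```python
import math

alpha = 1e-2
qX, qZ = 0.5, 0.5            # illustrative KL informations of private and shared data
log_term = -math.log(alpha)  # |log alpha|

D1 = log_term / qX           # leading-order delay using X alone
D2 = log_term / (qX + qZ)    # leading-order delay using X and Z
print(D1, D2)                # D2 < D1: shared data shortens the delay
```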

Upper bound for the false alarm rate

Theorem 3: The false alarm rate for τ̃_u satisfies
PFA(τ̃_u) ≤ α + ξ(τ̃_u).
ξ(τ̃_u) is termed the error-coupling probability: the probability that u thinks v has not changed while in fact v already has:
ξ(τ̃_u) = P(τ̃_u ≤ τ̃_v, λ_v ≤ τ̃_u ≤ λ_u).

Moreover, ξ(τ̃_u) → 0 at a rate faster than α^p for some constant p > 0; p > 1 under conditions ensuring that the Kullback-Leibler information given by the shared data Z is sufficiently dominated by that of the private data X and Y.

Power rate of the error-coupling probability

Define b = q_0(X) − q_1(Z) + d and the rate
r_a = (1 − w) [min{q_0(X), q_1(Z)} + q_1(Y)]² / (2 (max{σ_0²(X), σ_1²(Z)} + σ_1²(Y))),
where
w = (σ_1²(X) + σ_1²(Z)) b / ((max{σ_0²(X), σ_1²(Z)} + σ_1²(Y)) [min{q_0(X), q_1(Z)} + q_1(Y)]),
and the constants σ_0²(X), σ_1²(Z) and σ_1²(Y) are the variances of the likelihood ratios. Then
lim_{α→0} log ξ(τ̃_u) / log α = p, where
(a) if b ≤ 0 then p = r_a;
(b) if b > 0 then p = max(r_a, r_b), where r_b = 4b / (σ_1²(X) + σ_1²(Z)).

Simulation set-up
- f_0(X) = N(1, σ²(X)); f_1(X) = N(0, σ²(X))
- f_0(Y) = N(1, σ²(Y)); f_1(Y) = N(0, σ²(Y))
- f_0(Z) = N(1, σ²(Z)); f_1(Z) = N(0, σ²(Z))
- the change points λ_1 and λ_2 are endowed with geometric priors and simulated accordingly
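This set-up can be exercised with a small Monte Carlo estimate of PFA and delay for the single-sensor Shiryaev rule; the sketch below draws λ from its geometric prior on each trial (ρ, α and the variances are illustrative choices, not the values used in the talk).

```python
import numpy as np

rng = np.random.default_rng(3)
rho, alpha = 0.1, 0.05
mu0, mu1, sigma = 1.0, 0.0, 1.0

def run_once():
    lam = rng.geometric(rho)  # change point drawn from the geometric prior
    p, n = 0.0, 0
    while True:
        n += 1
        x = rng.normal(mu0 if n < lam else mu1, sigma)
        L = np.exp(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2))
        prior = p + (1 - p) * rho
        num = prior * L
        p = num / (num + 1 - prior)
        if p >= 1 - alpha:
            return n, lam

trials = [run_once() for _ in range(2000)]
pfa = np.mean([tau < lam for tau, lam in trials])
delay = np.mean([tau - lam for tau, lam in trials if tau >= lam])
print(pfa, delay)  # empirical PFA stays near or below alpha
```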

Benefits of message-passing with shared data/information

(figure, two-sensor network: left, delays evaluated by simulation; right, delays predicted by Theorem 2; x-axis: ratio of uncertainty σ²(Z)/σ²(X); y-axis: detection delay time)

There is an extra loss in terms of the false alarm probability:
PFA(τ̃_u) ≤ α + α^p,
where p > 1 if σ²(Z)/σ²(X) > 3 (by Theorem 2); by simulation, p > 1 if σ²(Z)/σ²(X) > 1.8.

Network with many sensors

Our algorithmic protocol is readily applicable to networks with an arbitrary number of sensors and arbitrary topology.

The algorithmic protocol:
- each sensor uses all data shared with neighbors that have not declared a change (failure)
- if a sensor v stops according to its stopping rule, v broadcasts this information to all its neighbors, who promptly drop v from their respective neighbor lists

The asymptotic theory for the false alarm probability remains open: the comparison of stopping times is intricate.

Examples of network topologies (figure: (a) grid network; (b) fully connected network)

Number of sensors vs. detection delay time

(figure, fully connected network: left, α = .1; right, α = 10⁻⁴; the theory predicts well!)

False alarm rates

(figure, fully connected network: simulated false alarm rate vs. actual rate; number of sensors vs. actual rate)

Effects of network topology

(figure, grid network, each sensor having a fixed number of neighbors: number of sensors vs. detection delay; number of sensors vs. actual FA rate)

Summary
- decentralized sequential detection of multiple change points, with application to detecting failures in a sensor network
- a new statistical formulation drawing on classical ideas: sequential analysis and probabilistic graphical models
- introduced a message-passing sequential detection algorithm, exploiting the benefit of network information
- asymptotic theories for analyzing false alarm rates and detection delays

Acknowledgement: Ram Rajagopal (Stanford University), Sinem Coleri Ergen (Koc University) and Pravin Varaiya (UC Berkeley).

For more detail, see: R. Rajagopal, X. Nguyen, S.C. Ergen and P. Varaiya. Simultaneous sequential detection of multiple interacting faults. http://arxiv.org/abs/1012.1258