Scalable Bayesian Event Detection and Visualization

Scalable Bayesian Event Detection and Visualization
Daniel B. Neill
Carnegie Mellon University, H.J. Heinz III College
E-mail: neill@cs.cmu.edu
This work was partially supported by NSF grants IIS-0916345, IIS-0911032, and IIS-0953330.

Multivariate event detection
Daily health data from thousands of hospitals and pharmacies nationwide: a time series of counts c_{i,m}^t for each zip code s_i and each data stream d_m (d_1 = respiratory ED visits, d_2 = constitutional ED visits, d_3 = OTC cough/cold sales, d_4 = OTC anti-fever sales, etc.). Given all of this nationwide health data on a daily basis, we want to achieve complete situational awareness by integrating information from the multiple data streams. More precisely, we have three main goals: to detect any emerging events (i.e., outbreaks of disease), characterize the type of event, and pinpoint the affected areas.

Expectation-based scan statistics (Kulldorff, 1997; Neill and Moore, 2005)
To detect and localize events, we can search for space-time regions where the number of cases is higher than expected. Imagine moving a window around the scan area, allowing the window size, shape, and temporal duration to vary. For each subset of locations, we examine the aggregated time series and compare the actual counts (e.g., of the last 3 days) against the expected counts derived from the time series of past counts.
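
As a rough illustration of this comparison (a hypothetical sketch, not the authors' code), the expectation-based Poisson statistic can be computed for an aggregated time series as follows, assuming a simple moving-average baseline; the function name and parameters are illustrative.

    import numpy as np

    def expectation_based_poisson_lr(counts, window=3, history=28):
        """Likelihood ratio that counts in the last `window` days are higher than expected.

        counts: 1-D array of daily counts aggregated over a subset of locations.
        The baseline B is a simple moving average of the preceding `history` days
        (richer time series models can be substituted).
        """
        recent = counts[-window:]
        past = counts[-(window + history):-window]
        C = recent.sum()               # actual count in the window
        B = window * past.mean()       # expected count in the window
        if C <= B:
            return 1.0                 # no evidence of an elevated rate
        # Poisson likelihood ratio maximized over relative risk q > 1 (at q = C / B)
        return (C / B) ** C * np.exp(B - C)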

Overview of the MBSS method
[Diagram: the priors, models, and multivariate dataset are inputs to the Multivariate Bayesian Scan Statistic, which produces the outputs described below.]
Given a set of event types E_k, a set of space-time regions S, and the multivariate dataset D, MBSS outputs the posterior probability Pr(H_1(S, E_k) | D) of each type of event in each region, as well as the probability of no event, Pr(H_0 | D). We must provide the prior probability Pr(H_1(S, E_k)) of each event type E_k in each region S, as well as the prior probability of no event, Pr(H_0). MBSS uses Bayes' theorem to combine the data likelihood given each hypothesis with the prior probability of that hypothesis: Pr(H | D) = Pr(D | H) Pr(H) / Pr(D).
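
A minimal sketch of this Bayes' theorem step (hypothetical names; it assumes the likelihoods Pr(D | H) have already been computed from the model on the next slide):

    def mbss_posteriors(prior_h0, likelihood_h0, priors_h1, likelihoods_h1):
        """Combine priors and data likelihoods via Bayes' theorem.

        priors_h1 and likelihoods_h1 map each hypothesis H1(S, E_k), i.e. a
        (region, event type) pair, to Pr(H1(S, E_k)) and Pr(D | H1(S, E_k)).
        Returns Pr(H0 | D) and Pr(H1(S, E_k) | D) for every hypothesis.
        """
        joint_h0 = prior_h0 * likelihood_h0
        joint_h1 = {h: priors_h1[h] * likelihoods_h1[h] for h in priors_h1}
        pr_d = joint_h0 + sum(joint_h1.values())   # Pr(D), the normalizing constant
        return joint_h0 / pr_d, {h: joint / pr_d for h, joint in joint_h1.items()}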

The Bayesian hierarchical model
[Graphical model: the type of event and the affected space-time region determine the event's effects on each data stream; together with the parameter priors and the expected counts, these determine the relative risks and the observed counts.]
Effects of event: x_m = 1 + θ (x_{k,m}^{avg} − 1).
Observed counts: c_{i,m}^t ~ Poisson(q_{i,m}^t b_{i,m}^t), where b_{i,m}^t are the expected counts.
Relative risks: q_{i,m}^t ~ Gamma(x_m α_m, β_m) inside S, q_{i,m}^t ~ Gamma(α_m, β_m) elsewhere.
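
To make the generative process concrete, here is a sketch of drawing one day of counts for a single stream m, under the reconstruction above and with hypothetical names:

    import numpy as np

    def sample_counts(baselines, in_region, x_m, alpha_m, beta_m, rng):
        """Draw one day of counts for stream m under the hierarchical model.

        baselines: expected counts b_{i,m}^t for each location s_i.
        in_region: boolean mask marking the locations inside the affected region S.
        Relative risks q_{i,m}^t are Gamma(x_m * alpha_m, beta_m) inside S and
        Gamma(alpha_m, beta_m) elsewhere (beta_m is a rate, so scale = 1 / beta_m).
        """
        shape = np.where(in_region, x_m * alpha_m, alpha_m)
        q = rng.gamma(shape, 1.0 / beta_m)     # relative risks
        return rng.poisson(q * baselines)      # observed counts

    rng = np.random.default_rng(0)
    counts = sample_counts(np.full(10, 20.0), np.arange(10) < 3, 1.5, 50.0, 50.0, rng)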

Interpretation and visualization
MBSS gives the total posterior probability of each event type E_k, and the distribution of this probability over space-time regions S. Visualization: Pr(H_1(s_i, E_k)) = Σ Pr(H_1(S, E_k)), summed over all regions S containing location s_i.
[Posterior probability map: total posterior probability of a respiratory outbreak in each Allegheny County zip code; darker shading = higher probability.]
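
A sketch of how such a map can be assembled from region-level posteriors (hypothetical data structures):

    def posterior_map(region_posteriors):
        """Compute Pr(H1(s_i, E_k) | D) for each location s_i by summing
        Pr(H1(S, E_k) | D) over all regions S that contain s_i.

        region_posteriors: dict mapping each region (a frozenset of location
        ids) to its posterior probability for a fixed event type E_k.
        """
        location_prob = {}
        for region, prob in region_posteriors.items():
            for s in region:
                location_prob[s] = location_prob.get(s, 0.0) + prob
        return location_prob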

MBSS: advantages and limitations
MBSS can detect faster and more accurately by integrating multiple data streams, and it can model and differentiate between multiple potential causes of an event. However, MBSS assumes a uniform prior over circular regions and a zero prior for non-circular regions, resulting in low power for elongated or irregular clusters. There are too many subsets of the data (2^N) to compute likelihoods for all of them! How can we extend MBSS to efficiently detect irregular clusters?

Hierarchical prior distribution
We define a non-uniform prior Pr(H_1(S, E_k)) over all 2^N subsets of the data. This prior has hierarchical structure:
1. Choose the center location s_c uniformly at random from {s_1, …, s_N}.
2. Choose the neighborhood size n uniformly at random from {1, …, n_max}.
3. Choose region S uniformly at random from the 2^n subsets of S_cn = {s_c and its n − 1 nearest neighbors}.
This prior gives a non-zero probability to every subset S, but more compact clusters have larger priors. (A sketch of sampling from this prior follows.)
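
A sketch of sampling from this prior, directly following steps 1-3 (hypothetical helper; ties in the nearest-neighbor ordering are broken arbitrarily):

    import numpy as np

    def sample_region_from_prior(coords, n_max, rng):
        """Draw a region S from the hierarchical prior.

        coords: (N, 2) array of location coordinates.
        """
        N = len(coords)
        c = rng.integers(N)                    # 1. center s_c, uniform over locations
        n = rng.integers(1, n_max + 1)         # 2. neighborhood size, uniform over {1, ..., n_max}
        dists = np.linalg.norm(coords - coords[c], axis=1)
        s_cn = np.argsort(dists)[:n]           # S_cn: s_c and its n - 1 nearest neighbors
        included = rng.random(n) < 0.5         # 3. uniform over the 2^n subsets of S_cn
        return set(s_cn[included].tolist())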

Fast Subset Sums (FSS)
Naïve computation of posterior probabilities under this prior requires summing over an exponential number of regions, which is infeasible. However, the total posterior probability of an outbreak, Pr(H_1(E_k) | D), and the posterior probability map, Pr(H_1(s_i, E_k) | D), can be calculated efficiently without computing the probability of each region S. In the original MBSS method, the likelihood ratio of a spatial region S for a given event type E_k and event severity θ is found by multiplying the likelihood ratios LR(s_i | E_k, θ) for all locations s_i in S. In FSS, the average likelihood ratio of the 2^n subsets for a given center s_c and neighborhood size n is found by multiplying the quantities (1 + LR(s_i | E_k, θ)) / 2 for all locations s_i in S_cn. Since the prior is uniform for a given center and neighborhood, we can compute the posteriors for each s_c and n, and marginalize over them. (See the sketch below.)
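
A sketch of the resulting computation for one (center, neighborhood size) pair, with hypothetical names; the likelihood ratios LR(s_i | E_k, θ) are assumed to be given:

    def fss_summed_posterior_unnormalized(lrs, prior_cn):
        """Unnormalized summed posterior over all 2^n subsets of a neighborhood S_cn.

        lrs: likelihood ratios LR(s_i | E_k, theta) for the n locations in S_cn.
        prior_cn: prior mass assigned to this (center, neighborhood size) pair.
        Runs in O(n) instead of enumerating the 2^n subsets.
        """
        avg_lr = 1.0
        for lr in lrs:
            avg_lr *= (1.0 + lr) / 2.0   # average over including / excluding s_i
        return prior_cn * avg_lr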

Fast Subset Sums (FSS)
Proof sketch: For a given center s_c, neighborhood size n, event type E_k, severity θ, and temporal window W, we can compute the summed posterior probability of all subsets S ⊆ S_cn:
Σ_{S ⊆ S_cn} Pr(S | D) ∝ Σ_{S ⊆ S_cn} Pr(S) LR(S) = (1 / 2^n) Σ_{S ⊆ S_cn} LR(S) = (1 / 2^n) Σ_{S ⊆ S_cn} Π_{s_i ∈ S} LR(s_i).
Key step: write the sum of 2^n products as a product of n sums:
= (1 / 2^n) Π_{s_i ∈ S_cn} (1 + LR(s_i)) = Π_{s_i ∈ S_cn} (1 + LR(s_i)) / 2.

Fast Subset Sums (FSS)
Proof sketch: To compute the posterior probability map, we must compute the summed posterior probability of all subsets S ⊆ S_cn containing each location s_j:
Σ_{S ⊆ S_cn: s_j ∈ S} Pr(S | D) ∝ Σ_{S ⊆ S_cn: s_j ∈ S} Pr(S) LR(S) = (1 / 2^n) Σ_{S ⊆ S_cn: s_j ∈ S} Π_{s_i ∈ S} LR(s_i).
Key step: write the sum of 2^{n−1} products as a product of n − 1 sums:
= (1 / 2^n) LR(s_j) Π_{s_i ∈ S_cn − {s_j}} (1 + LR(s_i)) = (LR(s_j) / (1 + LR(s_j))) Σ_{S ⊆ S_cn} Pr(S | D).
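
Both identities are easy to check numerically for small n by brute-force enumeration (a hypothetical sanity check, not part of the method):

    from itertools import combinations
    import math

    def brute_force_sums(lrs, j):
        """Enumerate all subsets of S_cn: returns (1 / 2^n) * sum of LR(S) over
        all subsets, and over the subsets containing location j."""
        n = len(lrs)
        total = containing_j = 0.0
        for r in range(n + 1):
            for subset in combinations(range(n), r):
                lr_s = math.prod(lrs[i] for i in subset)   # LR(S), with LR(empty) = 1
                total += lr_s
                if j in subset:
                    containing_j += lr_s
        return total / 2 ** n, containing_j / 2 ** n

    lrs = [1.7, 0.4, 2.5, 0.9]
    total, cond = brute_force_sums(lrs, j=0)
    prod_total = math.prod((1 + lr) / 2 for lr in lrs)     # product-of-sums identity
    prod_cond = lrs[0] / (1 + lrs[0]) * prod_total         # per-location identity
    assert abs(total - prod_total) < 1e-12
    assert abs(cond - prod_cond) < 1e-12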

Evaluation
We injected simulated disease outbreaks into two streams of Emergency Department data (cough and nausea cases) from 97 Allegheny County zip codes. Results were computed for ten different outbreak shapes, including compact, elongated, and irregularly shaped regions, with 200 injects of each type. We compared FSS to the original MBSS method (searching over circles) in terms of run time, timeliness of detection, proportion of outbreaks detected, and spatial accuracy.

Computation time
We compared the run times of MBSS, FSS, and a naïve subset sums implementation as a function of the maximum neighborhood size n_max. The run time of MBSS increased gradually with increasing n_max, up to 1.2 seconds per day of data. The run time of Naïve Subset Sums increased exponentially, making it infeasible for n_max ≥ 25. The run time of FSS scaled quadratically with n_max, up to 8.8 seconds per day of data. Thus, while FSS is approximately 7.5x slower than the original MBSS method, it is still extremely fast, computing the posterior probability map for each day of data in under nine seconds.

Timeliness of detection
FSS detected outbreaks an average of one day earlier than MBSS for maximum temporal window W = 3, and 0.54 days earlier for W = 7, with less than half as many missed outbreaks. Both methods achieve similar detection times for compact outbreak regions. For highly elongated outbreaks, FSS detects 1.3 to 2.2 days earlier, and for irregular regions, FSS detects 0.3 to 1.2 days earlier.

Spatial accuracy
As measured by the average overlap coefficient between the true and detected clusters, FSS outperformed MBSS by 10-15%. For elongated and irregular clusters, FSS had much higher precision and recall. For compact clusters, FSS had higher precision, and MBSS had higher recall.

Posterior probability maps
Spatial accuracy of FSS was similar to that of MBSS for compact clusters.
[Figure: true outbreak region alongside the MBSS and Fast Subset Sums posterior probability maps for three compact outbreaks; overlap coefficients (MBSS vs. FSS): 64% vs. 59%, 63% vs. 55%, 36% vs. 43%.]

Posterior probability maps
FSS had much higher spatial accuracy than MBSS for elongated clusters.
[Figure: true outbreak region alongside the MBSS and Fast Subset Sums posterior probability maps for three elongated outbreaks; overlap coefficients (MBSS vs. FSS): 19% vs. 41%, 41% vs. 71%, 29% vs. 51%.]

Posterior probability maps
FSS was better able to capture the shape of irregular clusters.
[Figure: true outbreak region alongside the MBSS and Fast Subset Sums posterior probability maps for three irregular outbreaks; overlap coefficients (MBSS vs. FSS): 29% vs. 35%, 78% vs. 79%, 59% vs. 74%.]

Generalized FSS
We have recently developed a generalization of FSS which allows another parameter (the sparsity of the detected region) to be controlled. We assume a different hierarchical prior:
1. Choose the center location s_c uniformly at random from {s_1, …, s_N}.
2. Choose the neighborhood size n uniformly at random from {1, …, n_max}.
3. For each s_i ∈ S_cn, include s_i in S with probability p, for a fixed 0 < p ≤ 1.
Here p = 0.5 corresponds to the original FSS approach (all subsets equally likely given the neighborhood), and p = 1 corresponds to MBSS (circular regions only). Larger p detects more compact clusters; smaller p detects more dispersed clusters. The summed posterior probability Σ_{S ⊆ S_cn} Pr(S | D) can be efficiently computed for any p, 0 < p ≤ 1, as sketched below.
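
Under this prior, the per-location factor in the product-of-sums computation generalizes from (1 + LR(s_i)) / 2 to the p-weighted average (1 − p) + p LR(s_i); a sketch under that reading of the slide, with hypothetical names:

    def generalized_fss_avg_lr(lrs, p):
        """Weighted-average likelihood ratio over subsets of S_cn when each
        location is included independently with probability p.

        Each location contributes the factor (1 - p) + p * LR(s_i).
        p = 0.5 recovers the original FSS factor (1 + LR) / 2; p = 1 recovers
        the full product of LRs over S_cn (circular regions, as in MBSS).
        """
        avg = 1.0
        for lr in lrs:
            avg *= (1.0 - p) + p * lr
        return avg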

Generalized FSS
Optimization of the sparsity parameter p can substantially improve the detection performance of the generalized FSS approach. Compact cluster: detection time is minimized at p = 0.5, and spatial accuracy is maximized at p = 0.7. Highly elongated cluster: detection time is minimized at p = 0.1, and spatial accuracy is maximized at p = 0.2; compared to p = 0.5, p = 0.2 improves detection time by 0.8 days and spatial accuracy by ~10%.

Conclusions
FSS shares the essential advantages of MBSS: it can integrate information from multiple data streams, and it can accurately distinguish between multiple outbreak types. As compared to the original MBSS method, FSS substantially improves the accuracy and timeliness of detection for elongated or irregular clusters, with similar performance for compact clusters. While a naïve computation over the exponentially many subsets of the data is computationally infeasible, FSS can efficiently and exactly compute the posterior probability map. Generalized FSS can further improve detection time and spatial accuracy through the incorporation of a sparsity parameter. We can also learn the prior distribution from a small amount of labeled training data, as in Makatchev and Neill (2008).

References
D.B. Neill. Fast Bayesian scan statistics for multivariate event detection and visualization. Statistics in Medicine, 2010, in press.
D.B. Neill and G.F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning 79: 261-282, 2010.
M. Makatchev and D.B. Neill. Learning outbreak regions in Bayesian spatial scan statistics. Proc. ICML Workshop on Machine Learning in Health Care Applications, 2008.
D.B. Neill. Incorporating learning into disease surveillance systems. Advances in Disease Surveillance 4: 107, 2007.
D.B. Neill, A.W. Moore, and G.F. Cooper. A multivariate Bayesian scan statistic. Advances in Disease Surveillance 2: 60, 2007.
D.B. Neill, A.W. Moore, and G.F. Cooper. A Bayesian spatial scan statistic. In Advances in Neural Information Processing Systems 18, 2006.