UAPD: Predicting Urban Anomalies from Spatial-Temporal Data
Xian Wu, Yuxiao Dong, Chao Huang, Jian Xu, Dong Wang and Nitesh V. Chawla*
Department of Computer Science and Engineering, University of Notre Dame
ECML/PKDD 2017, Skopje, Macedonia

Data Mining in Urban Scenarios
User-generated data

Spatial-Temporal Data
User-generated spatial-temporal data in urban scenarios: data generated from crowds
  Spatial dimension: latitude, longitude
  Temporal dimension: timestamp information

Urban Anomalies
Urban anomalies such as noise complaints and blocked driveways negatively affect everyday life and need to be addressed in a timely manner.
  Examples: noise, blocked driveway, illegal parking
  Reported through: non-emergency citizen report system

Urban Anomaly Prediction
Urban anomaly data reported from crowds:
  Raw report: <Anomaly Category, Timestamp, Latitude, Longitude>
  After mapping each location to an urban region: <Anomaly Category, Timestamp, Region ID>
Prediction target: for each region in the next time slot, one binary label per category (Category 1, ..., Category K), where 0 means the anomaly does not happen and 1 means it happens.

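The mapping from a raw report to a region-level record can be sketched as follows. This is a minimal illustration, not the partitioning used in the talk: the rectangular bounding boxes in `REGIONS`, the helper names `region_of` and `to_region_record`, and the sample report are all hypothetical.

```python
from datetime import datetime

# Hypothetical regions as bounding boxes (an assumption for illustration):
# region ID -> (min_lat, max_lat, min_lon, max_lon).
REGIONS = {
    0: (40.70, 40.75, -74.02, -73.97),
    1: (40.75, 40.80, -74.02, -73.97),
}

def region_of(lat, lon, regions=REGIONS):
    """Return the ID of the first region whose box contains the point, else None."""
    for region_id, (lat0, lat1, lon0, lon1) in regions.items():
        if lat0 <= lat < lat1 and lon0 <= lon < lon1:
            return region_id
    return None

def to_region_record(report):
    """Turn <category, timestamp, lat, lon> into <category, timestamp, region ID>."""
    category, timestamp, lat, lon = report
    return (category, timestamp, region_of(lat, lon))

# A made-up report, used only to exercise the functions.
report = ("Noise", datetime(2016, 6, 1, 23, 15), 40.76, -73.99)
record = to_region_record(report)
```

Reports that fall outside every box get a region ID of `None`, so a caller can decide whether to drop them.
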
Challenges
Anomaly dynamics
  The factors underlying urban anomalies may change over time. For example, anomalies in winter may stem from severe winter weather (e.g., snow), so it may be infeasible to train the predictive model on historical data from spring through fall.
  Figure: the anomaly distribution in the first time period differs from the anomaly distribution in the second time period.
Coupled multi-dimensional correlations
  Latent dependencies exist among the different anomaly categories (Category 1, ..., Category K).

Problem Formulation
Given: the historical urban anomaly data, represented as a three-dimensional tensor of size I × J × K
  I: the number of regions
  J: the number of time slots
  K: the number of anomaly categories
Goal: learn a predictive function that infers the future occurrence of each anomaly category k in each region i at time slot J+1.

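As a rough sketch of the data structure described above, the tensor can be built by counting region-level records. The function names `build_tensor` and `binarize`, and the assumption that records are already converted to integer index triples, are illustrative choices and not taken from the talk.

```python
def build_tensor(records, num_regions, num_slots, num_categories):
    """Count reports into an I x J x K nested list.

    records: iterable of (region i, time slot j, category k) index triples,
    assumed to be zero-based and already within range.
    """
    tensor = [[[0] * num_categories for _ in range(num_slots)]
              for _ in range(num_regions)]
    for i, j, k in records:
        tensor[i][j][k] += 1
    return tensor

def binarize(tensor):
    """Map counts to 0/1 occurrence labels (1 if at least one report)."""
    return [[[1 if count > 0 else 0 for count in slot] for slot in region]
            for region in tensor]

# Toy records: two reports of category 1 in region 0 at slot 0,
# and one report of category 0 in region 1 at slot 2.
toy_records = [(0, 0, 1), (0, 0, 1), (1, 2, 0)]
toy_tensor = build_tensor(toy_records, num_regions=2, num_slots=3, num_categories=2)
toy_labels = binarize(toy_tensor)
```

The counts feed the factorization step, while the binarized view matches the 0/1 prediction target.
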
The UAPD Framework

Change Point Detection
We propose a probabilistic model, with parameters inferred via Markov chain Monte Carlo (MCMC), to detect the change point of the historical urban data records in each region of a city.
  The detected point becomes the new start point: the two sequences it separates follow different data distributions.
  Only the most relevant reports (those after the change point) are used to predict future urban data.

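To make the idea concrete, here is a minimal sketch of MCMC-based change point detection. It is not the model from the talk: it assumes a single change point in a sequence of Poisson counts with Gamma priors on the two rates, sampled with a textbook Gibbs sampler. The function name, the priors, and the synthetic counts are all assumptions.

```python
import math
import random

def detect_change_point(counts, num_iters=2000, burn_in=500, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for one change point in a sequence of Poisson counts.

    Assumed model: counts[:tau] ~ Poisson(lam1), counts[tau:] ~ Poisson(lam2),
    lam1 and lam2 ~ Gamma(shape=a, rate=b), tau uniform on 1..n-1.
    Returns the most frequently sampled tau after burn-in.
    """
    rng = random.Random(seed)
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    total = prefix[-1]
    tau = n // 2
    votes = [0] * n
    for it in range(num_iters):
        # Conjugate updates; gammavariate takes (shape, scale), so scale = 1 / rate.
        lam1 = rng.gammavariate(a + prefix[tau], 1.0 / (b + tau))
        lam2 = rng.gammavariate(a + total - prefix[tau], 1.0 / (b + n - tau))
        # Log-probability of each candidate tau given the two rates.
        logp = [prefix[t] * math.log(lam1) - t * lam1
                + (total - prefix[t]) * math.log(lam2) - (n - t) * lam2
                for t in range(1, n)]
        top = max(logp)
        weights = [math.exp(v - top) for v in logp]
        tau = rng.choices(range(1, n), weights=weights)[0]
        if it >= burn_in:
            votes[tau] += 1
    return max(range(n), key=votes.__getitem__)

# Synthetic daily counts: 30 low-rate days followed by 20 high-rate days.
counts = [2, 1, 3, 2, 2, 1, 3, 2, 1, 2] * 3 + [9, 11, 10, 12, 8, 10, 9, 11, 10, 10] * 2
change_point = detect_change_point(counts)
recent_counts = counts[change_point:]  # keep only data after the change point
```

Applied per region, the slice after the detected point plays the role of the "most relevant reports" that are passed on to later steps.
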
Tensor Decomposition
Strong correlations may exist among the locations, times, and categories of urban anomalies that have occurred.
Using the data from the detected start point onward, factorize the tensor into three matrices, one per dimension:
  Spatial dimension
  Temporal dimension
  Category dimension

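For reference, a rank-L CP (CANDECOMP/PARAFAC) factorization of a three-way tensor has the standard form below. The symbol names (the tensor X and the factor matrices U, T, C for the spatial, temporal, and category dimensions) are notation assumed here for illustration and may differ from the notation on the original slide.

```latex
\mathcal{X}_{ijk} \;\approx\; \sum_{l=1}^{L} U_{il}\, T_{jl}\, C_{kl},
\qquad U \in \mathbb{R}^{I \times L},\quad
T \in \mathbb{R}^{J \times L},\quad
C \in \mathbb{R}^{K \times L}
```

Each row of T is an L-dimensional summary of one time slot, which is what makes the temporal factor usable as a multivariate time series.
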
Vector Autoregression
Based on the three matrices generated by CP decomposition, we formulate anomaly prediction as a time series prediction task.
Vector autoregression (VAR)
  Captures the linear interdependencies of the inherent factors among multiple time series.
  The order S of the VAR is the number of previous time slots used to represent the time series.
The number of anomalies of the k-th category in region R_i in the next time slot is then derived from these factors (the equation appears on the original slide).

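The prediction step can be sketched as a one-step VAR(S) forecast of the temporal factor vector, followed by a CP-style recombination with the region and category factors. This is an illustrative sketch, not the equation from the slide: the function names are hypothetical, the VAR coefficient matrices are assumed to be already estimated, and no intercept or noise term is included.

```python
def var_forecast(history, coeffs):
    """One-step VAR(S) forecast without an intercept.

    history: past temporal factor vectors (each of length L), oldest first.
    coeffs: S matrices (each L x L); coeffs[s] multiplies the vector
            from s + 1 slots ago. Assumed to be estimated elsewhere.
    """
    size = len(history[-1])
    nxt = [0.0] * size
    for s, matrix in enumerate(coeffs):
        past = history[-1 - s]
        for r in range(size):
            nxt[r] += sum(matrix[r][c] * past[c] for c in range(size))
    return nxt

def predict_counts(region_factors, category_factors, time_vector):
    """Recombine factors: x[i][k] = sum over l of U[i][l] * t[l] * C[k][l]."""
    rank = len(time_vector)
    return [[sum(u[l] * time_vector[l] * c[l] for l in range(rank))
             for c in category_factors]
            for u in region_factors]

# Toy example with rank L = 2 and order S = 2 (all values made up).
history = [[1.0, 2.0], [3.0, 4.0]]
coeffs = [[[1.0, 0.0], [0.0, 1.0]],   # weight on the most recent slot
          [[0.5, 0.0], [0.0, 0.5]]]   # weight on the slot before that
next_vector = var_forecast(history, coeffs)
U = [[1.0, 0.0], [1.0, 1.0]]          # region factors (2 regions)
C = [[2.0, 0.0], [0.0, 1.0]]          # category factors (2 categories)
predicted = predict_counts(U, C, next_vector)
```

Thresholding the predicted values would turn them into the 0/1 labels of the prediction task; the threshold itself is a design choice not specified here.
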
Evaluation
311 Service is an urban anomaly report platform that allows citizens to report complaints about urban anomalies.
Data Statistics

Data Visualization
Figure panels: (a) Noise, (b) Blocked driveway, (c) Illegal parking, (d) Building/Use

Experimental Setting
Region partition
  High-level regions: NYC precinct map
  Fine-grained regions: use major roads to partition the entire city
Time slot variation
  How does UAPD perform in anomaly prediction at time slot J+1 with respect to different time frames J?

Evaluation Results (Accuracy, 1/5)
Prediction results for June with high-level regions in NYC

Evaluation Results (Accuracy, 2/5)
Prediction results for December with high-level regions in NYC

Evaluation Results (Accuracy, 3/5)
Prediction results for June with fine-grained regions in NYC

Evaluation Results (Accuracy, 4/5)
Prediction results for December with fine-grained regions in NYC

Evaluation Results (Accuracy, 5/5)
Prediction results for June with high-level regions in Pittsburgh

Evaluation Results (Parameter Change)
Performance w.r.t. rank parameter L for June with high-level regions in NYC
Performance w.r.t. rank parameter L for December with high-level regions in NYC

Analysis of Change Point Detection
The change point detection results on the Pittsburgh dataset show that the detected starting point of the anomaly sequences is Feb 18, 2016, rather than the actual beginning of the data (Jan 1, 2016).

Conclusion
---- We develop an Urban Anomaly PreDiction (UAPD) framework to predict urban anomalies from spatial-temporal data.
---- UAPD explicitly detects the change point of the anomaly sequences and explores the time-evolving inherent factors and their relationships with each dimension of the tensor (i.e., regions and anomaly categories).
---- We evaluate the framework on two sets of urban anomaly reports collected from the 311 Service in New York City and Pittsburgh. The results show that UAPD significantly outperforms state-of-the-art baselines.

Thank You!
The Interdisciplinary Center for Network Science & Applications (icensa)
http://icensa.com/
nchawla@nd.edu
xwu9@nd.edu