UAPD: Predicting Urban Anomalies from Spatial-Temporal Data


UAPD: Predicting Urban Anomalies from Spatial-Temporal Data
Xian Wu, Yuxiao Dong, Chao Huang, Jian Xu, Dong Wang and Nitesh V. Chawla*
Department of Computer Science and Engineering, University of Notre Dame
ECML/PKDD '17, Skopje, Macedonia

Data Mining in Urban Scenario: User-Generated Data

Spatial-Temporal Data
User-generated spatial-temporal data in the urban scenario is spatial-temporal data generated from crowds.
Spatial dimension: latitude, longitude.
Temporal dimension: timestamp information.

Urban Anomalies
Urban anomalies such as noise complaints and blocked driveways negatively affect our everyday life and need to be addressed in a timely manner.
Examples: noise, blocked driveway, illegal parking.
Non-emergency citizen report system.

Urban Anomaly Prediction
Urban anomaly data reported from crowds has the form <Anomaly Category, Timestamp, Latitude, Longitude>. Mapping each report to an urban region gives <Anomaly Category, Timestamp, Region ID>.
For the next time slot, the task is to predict, for each region and each category 1, ..., K, a binary label: 0 -> does not happen, 1 -> happens.
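The mapping from region-level reports to per-slot binary labels can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_anomaly_tensor` is hypothetical, and it assumes that timestamps have already been bucketed into integer time-slot indices and that coordinates have already been mapped to integer region IDs.

```python
import numpy as np

def build_anomaly_tensor(reports, n_regions, n_slots, n_categories):
    """Aggregate reports into a (regions x time slots x categories) tensor.

    reports: iterable of (category_id, slot_id, region_id) integer triples.
    Assumption: IDs are zero-based and already discretized upstream.
    """
    counts = np.zeros((n_regions, n_slots, n_categories), dtype=int)
    for category_id, slot_id, region_id in reports:
        counts[region_id, slot_id, category_id] += 1
    # Binary occurrence labels: 1 if at least one report, else 0.
    occurred = (counts > 0).astype(int)
    return counts, occurred

# Hypothetical toy input: two reports of category 0 in region 1 during slot 0,
# and one report of category 2 in region 0 during slot 1.
toy_reports = [(0, 0, 1), (0, 0, 1), (2, 1, 0)]
toy_counts, toy_occurred = build_anomaly_tensor(toy_reports, 2, 3, 3)
```

The count tensor is what a factorization step would consume, while the binary tensor matches the 0/1 prediction target described on the slide.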

Challenges
Anomaly dynamics: The factors underlying urban anomalies may change over time. For example, anomalies in the winter may stem from winter-related severe weather (e.g., snow), and it may be infeasible to train the predictive model using historical data from spring through fall.
[Figure: the anomaly distribution in the first time period differs from the anomaly distribution in the second time period.]
Coupled multi-dimensional correlations:
[Figure: latent dependencies among different categories (Category 1, Category 2, ..., Category K-1, Category K).]

Problem Formulation
Given: the historical urban anomaly data as a three-dimensional tensor of size I × J × K, where I is the number of regions, J is the number of time slots, and K is the number of anomaly categories.
Goal: learn a predictive function that infers future occurrences of each category k of anomalies at each region i at time slot J+1.

The UAPD Framework

Change Point Detection
We propose a probabilistic model, whose parameters are inferred via Markov chain Monte Carlo (MCMC), to detect the change point of the historical urban data records in each region of a city.
The detected point serves as the start point: the two sequences it separates follow different data distributions.
Only the most relevant reports are used for the prediction of future urban data.
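The slide does not spell out the probabilistic model, so the following is only a sketch of the general idea under stated assumptions: a standard single-change-point Poisson model with Gamma priors on the two rates, sampled with a Gibbs sampler (one kind of MCMC). The function name `detect_change_point` and the hyperparameters `a` and `b` are illustrative and are not taken from the paper.

```python
import numpy as np

def detect_change_point(counts, n_iter=2000, burn_in=500, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for one change point in a sequence of Poisson counts.

    Assumed model: counts[:tau] ~ Poisson(lam1), counts[tau:] ~ Poisson(lam2),
    with lam1, lam2 ~ Gamma(shape=a, rate=b) and a uniform prior on tau.
    Returns the most frequently sampled tau after burn-in.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(counts, dtype=float)
    n = len(y)
    cum = np.concatenate(([0.0], np.cumsum(y)))  # cum[t] == y[:t].sum()
    taus = np.arange(1, n)                       # candidate split positions
    tau = n // 2
    samples = []
    for it in range(n_iter):
        # Conjugate Gamma updates for the two rates given the current split.
        lam1 = rng.gamma(a + cum[tau], 1.0 / (b + tau))
        lam2 = rng.gamma(a + cum[n] - cum[tau], 1.0 / (b + n - tau))
        # Log-likelihood of every candidate split given the two rates.
        log_p = (cum[taus] * np.log(lam1) - taus * lam1
                 + (cum[n] - cum[taus]) * np.log(lam2) - (n - taus) * lam2)
        p = np.exp(log_p - log_p.max())
        tau = rng.choice(taus, p=p / p.sum())
        if it >= burn_in:
            samples.append(tau)
    return int(np.bincount(samples).argmax())
```

In a pipeline like the one described, a function of this kind would be run per region, and only the records after the returned index would be passed on to the later stages.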

Tensor Decomposition
There may be strong correlations among the locations, times, and categories of urban anomalies.
Starting from the detected start point, we factorize the tensor into three different sub-matrices, one each for the spatial dimension, the temporal dimension, and the category dimension.
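A later slide names this factorization as CP decomposition. The sketch below fits a rank-L CP model with plain alternating least squares (ALS) in NumPy. It is an illustrative assumption about how such a factorization can be computed, not the authors' solver, and the names `cp_als` and `cp_reconstruct` are hypothetical.

```python
import numpy as np

def cp_als(X, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor X (I x J x K) via ALS.

    Returns factor matrices A (I x rank), B (J x rank), C (K x rank) such that
    X[i, j, k] is approximated by sum_l A[i, l] * B[j, l] * C[k, l].
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each update solves a linear least-squares problem for one factor
        # while the other two are held fixed.
        A = np.einsum('ijk,jl,kl->il', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,il,kl->jl', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,il,jl->kl', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def cp_reconstruct(A, B, C):
    """Rebuild the full tensor from its three CP factor matrices."""
    return np.einsum('il,jl,kl->ijk', A, B, C)
```

Here A plays the role of the spatial factor matrix, B the temporal one, and C the category one. The rank corresponds to the rank parameter L varied in the evaluation.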

Vector Autoregression
Based on the three matrices generated by CP decomposition, we formulate the anomaly prediction problem as a time series prediction task.
Vector autoregression (VAR) captures the linear interdependencies of the inherent factors among multiple time series.
The order S of the VAR is the number of previous time slots used to represent the time series.
The number of anomalies of the k-th category in region R_i in the next time slot can then be derived.
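The formula itself is not captured in this transcript, so the following is a hedged sketch of one natural reading: fit a VAR of order S to the rows of the temporal factor matrix by ordinary least squares, forecast the factor row for slot J+1, and combine it with the spatial and category factors in the usual CP form. The function names `var_forecast` and `predict_next_slot`, and the inclusion of an intercept term, are assumptions.

```python
import numpy as np

def var_forecast(B, S=2):
    """One-step-ahead VAR(S) forecast for the rows of B (J x L).

    Fits B[t] ~ c + W_1 B[t-1] + ... + W_S B[t-S] by least squares and
    returns the predicted row for time J.
    """
    J, L = B.shape
    # Design row for target time t holds [1, B[t-1], ..., B[t-S]].
    Z = np.hstack([np.ones((J - S, 1))] + [B[S - s:J - s] for s in range(1, S + 1)])
    Y = B[S:]
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    z_next = np.concatenate([[1.0]] + [B[J - s] for s in range(1, S + 1)])
    return z_next @ coef

def predict_next_slot(A, B, C, S=2):
    """Predicted values for slot J+1, shape (regions x categories).

    Assumption: entry (i, k) is sum_l A[i, l] * b_next[l] * C[k, l], where
    b_next is the forecast temporal factor row.
    """
    b_next = var_forecast(B, S)
    return np.einsum('il,l,kl->ik', A, b_next, C)
```

Thresholding the predicted values would turn them into the 0/1 occurrence labels of the prediction task. The threshold is a further design choice that the slides do not specify.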

Evaluation
311 Service is an urban anomaly report platform that allows citizens to report complaints about urban anomalies.
[Table: data statistics]

Data Visualization
[Figure panels: (a) Noise, (b) Blocked driveway, (c) Illegal parking, (d) Building/Use]

Experimental Setting
Region partition: high-level regions (NYC precinct map) and fine-grained regions (major roads are used to partition the entire city).
Time slot variation: given J time slots, predict time slot J+1. How does UAPD perform in anomaly prediction with respect to different time frames J?

Evaluation Results (Accuracy, 1/5): prediction results for June with high-level regions in NYC

Evaluation Results (Accuracy, 2/5): prediction results for December with high-level regions in NYC

Evaluation Results (Accuracy, 3/5): prediction results for June with fine-grained regions in NYC

Evaluation Results (Accuracy, 4/5): prediction results for December with fine-grained regions in NYC

Evaluation Results (Accuracy, 5/5): prediction results for June with high-level regions in Pittsburgh

Evaluation Results (Parameter Change)
[Figure: performance w.r.t. rank parameter L for June with high-level regions in NYC]
[Figure: performance w.r.t. rank parameter L for December with high-level regions in NYC]

Analysis of Change Point Detection
The change point detection results on the Pittsburgh dataset show that the detected starting point of the anomaly sequences is Feb 18, 2016, rather than the actual beginning time (i.e., Jan 1, 2016).

Conclusion
- We develop an Urban Anomaly PreDiction (UAPD) framework to predict urban anomalies from spatial-temporal data.
- UAPD explicitly detects the change point of the anomaly sequences and also explores the time-evolving inherent factors and their relationships with each tensor dimension (i.e., regions and anomaly categories).
- We evaluate the framework on two sets of urban anomaly reports collected from the 311 Service in New York City and Pittsburgh, respectively. The results show that UAPD significantly outperforms state-of-the-art baselines.

Thank You!
The Interdisciplinary Center for Network Science & Applications (iCeNSA)
http://icensa.com/
nchawla@nd.edu, xwu9@nd.edu