A MULTISCALE APPROACH TO DETECT SPATIAL-TEMPORAL OUTLIERS

Similar documents
An Entropy-based Method for Assessing the Number of Spatial Outliers

Spatial Data Science. Soumya K Ghosh

IV Course Spring 14. Graduate Course. May 4th, Big Spatiotemporal Data Analytics & Visualization

Using MERIS and MODIS for Land Cover Mapping in the Netherlands

A bottom-up strategy for uncertainty quantification in complex geo-computational models

Tri clustering: a novel approach to explore spatio temporal data cubes

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data

Summary Visualizations for Coastal Spatial Temporal Dynamics

CONCEPTUAL DEVELOPMENT OF AN ASSISTANT FOR CHANGE DETECTION AND ANALYSIS BASED ON REMOTELY SENSED SCENES

Application of Clustering to Earth Science Data: Progress and Challenges

DANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence

Supplementary material: Methodological annex

Clustering Analysis of London Police Foot Patrol Behaviour from Raw Trajectories

Modeling of Atmospheric Effects on InSAR Measurements With the Method of Stochastic Simulation

Multivariate Analysis of Crime Data using Spatial Outlier Detection Algorithm

Overview key concepts and terms (based on the textbook Chang 2006 and the practical manual)

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM

Wavelet Fuzzy Classification for Detecting and Tracking Region Outliers in Meteorological Data

Uncertainty in Spatial Data Mining and Its Propagation

Quality Assessment and Uncertainty Handling in Uncertainty-Based Spatial Data Mining Framework

Cell-based Model For GIS Generalization

Soft Spatial Query Processing in Spatial Databases-A Case Study

Data Fusion and Multi-Resolution Data

FINDING SPATIAL UNITS FOR LAND USE CLASSIFICATION BASED ON HIERARCHICAL IMAGE OBJECTS

Great Basin. Location: latitude to 42 N, longitude to W Grid size: 925 m. highest elevation)

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

VISUAL ANALYTICS APPROACH FOR CONSIDERING UNCERTAINTY INFORMATION IN CHANGE ANALYSIS PROCESSES

Mining Geo-Referenced Databases: A Way to Improve Decision-Making

Automatic Classification of Location Contexts with Decision Trees

Geospatial data pre-processing on watershed datasets: A GIS approach

Spatial Process VS. Non-spatial Process. Landscape Process

Mapping Landscape Change: Space Time Dynamics and Historical Periods.

Geographic Information Systems (GIS) in Environmental Studies ENVS Winter 2003 Session III

KNOWLEDGE-BASED CLASSIFICATION OF LAND COVER FOR THE QUALITY ASSESSEMENT OF GIS DATABASE. Israel -

Chapter 2 Spatial and Spatiotemporal Big Data Science

1 Agricultural University of Athens, 2 Agricultural University of Athens,

A General Framework for Conflation

Vegetation Change Detection of Central part of Nepal using Landsat TM

43400 Serdang Selangor, Malaysia Serdang Selangor, Malaysia 4

Using co-clustering to analyze spatio-temporal patterns: a case study based on spring phenology

The Building Blocks of the City: Points, Lines and Polygons

GIS Data Structure: Raster vs. Vector RS & GIS XXIII

Chapter 6 Spatial Analysis

Geography 38/42:376 GIS II. Topic 1: Spatial Data Representation and an Introduction to Geodatabases. The Nature of Geographic Data

Models to carry out inference vs. Models to mimic (spatio-temporal) systems 5/5/15

Geospatial Visual Analytics and Geovisualization in VisMaster (WP3.4)

A framework for spatio-temporal clustering from mobile phone data

Mining and Visualizing Spatial Interaction Patterns for Pandemic Response

Spatio-Temporal outliers detection within the space-time framework

Parameter selection for region-growing image segmentation algorithms using spatial autocorrelation

Spatial Co-location Patterns Mining

Web Visualization of Geo-Spatial Data using SVG and VRML/X3D

AN APPROACH TO BUILDING GROUPING BASED ON HIERARCHICAL CONSTRAINTS

Anomaly Detection. Jing Gao. SUNY Buffalo

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

International Journal of Remote Sensing, in press, 2006.

SASI Spatial Analysis SSC Meeting Aug 2010 Habitat Document 5

The Importance of Spatial Literacy

Lecture 1: Geospatial Data Models

A DATA FIELD METHOD FOR URBAN REMOTELY SENSED IMAGERY CLASSIFICATION CONSIDERING SPATIAL CORRELATION

Land Cover Classification Over Penang Island, Malaysia Using SPOT Data

Spatial Mining Process Study on Implementation of Instance in Database Engine

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

1. Introduction. Chaithanya, V.V. 1, Binoy, B.V. 2, Vinod, T.R. 2. Publication Date: 8 April DOI: /cloud.ijarsg.

Methodological Chain for Hydrological Management with Web-GIS Applications

Chapter 6. Fundamentals of GIS-Based Data Analysis for Decision Support. Table 6.1. Spatial Data Transformations by Geospatial Data Types

A route map to calibrate spatial interaction models from GPS movement data

The Looming Threat of Rising Sea Levels to the Florida Keys

IAEG-SDGs WGGI Task Team Dec. 7, 2017, New York

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

Developing integrated remote sensing and GIS procedures for oil spill monitoring on the Libyan coast

IMAGE CLASSIFICATION TOOL FOR LAND USE / LAND COVER ANALYSIS: A COMPARATIVE STUDY OF MAXIMUM LIKELIHOOD AND MINIMUM DISTANCE METHOD

Exploring Kimberley Bushfires in Space and Time

Fuzzy Geographically Weighted Clustering

Integrating Imagery and ATKIS-data to Extract Field Boundaries and Wind Erosion Obstacles

Exploring Climate Patterns Embedded in Global Climate Change Datasets

Object-based feature extraction of Google Earth Imagery for mapping termite mounds in Bahia, Brazil

Raster Spatial Analysis Specific Theory

RESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE

A VECTOR AGENT APPROACH TO EXTRACT THE BOUNDARIES OF REAL-WORLD PHENOMENA FROM SATELLITE IMAGES

The role of topological outliers in the spatial analysis of georeferenced social media data

Rio Santa Geodatabase Project

Construction and Analysis of Climate Networks

Urban Geo-Informatics John W Z Shi

MULTI-SOURCE IMAGE CLASSIFICATION

7.1 INTRODUCTION 7.2 OBJECTIVE

Chapter 5. GIS The Global Information System

These modules are covered with a brief information and practical in ArcGIS Software and open source software also like QGIS, ILWIS.

Uta Gjertsen 1 and Günther Haase 2. Norwegian Meteorological Institute, Oslo, Norway

UNCERTAINTY AND ERRORS IN GIS

Basics of GIS. by Basudeb Bhatta. Computer Aided Design Centre Department of Computer Science and Engineering Jadavpur University

Detecting Region Outliers in Meteorological Data

Spatial Hierarchies and Topological Relationships in the Spatial MultiDimER model

VISUALIZATION URBAN SPATIAL GROWTH OF DESERT CITIES FROM SATELLITE IMAGERY: A PRELIMINARY STUDY

ECML PKDD Discovery Challenges 2017

3 coastal landscapes in the Netherlands. Reconstruction c. 800 A.D.

Formalization of GIS functionality

INTERNATIONAL JOURNAL OF GEOMATICS AND GEOSCIENCES Volume 2, No 1, 2011

LAND USE MAPPING AND MONITORING IN THE NETHERLANDS (LGN5)

Rare Event Discovery And Event Change Point In Biological Data Stream

Transcription:

A MULTISCALE APPROACH TO DETECT SPATIAL-TEMPORAL OUTLIERS Tao Cheng Zhilin Li Department of Land Surveying and Geo-Informatics The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong Email: {lstc; lszlli}@polyu.edu.hk WG: THS9 Uncertainty, Consistency and Accuracy of Data and Imagery Keywords: Detection, Dynamic, Multitemporal, Multiresolution, Thematic, Quality ABSTRACT A spatial outlier is a spatial referenced object whose non-spatial attribute values are significantly different from those of other spatially referenced objects in its spatial neighborhood. It represents locations that are significantly different from their neighborhoods even though they may not be significantly different from the entire population. Here we adopt this definition to spatio-temporal domain and define a spatial-temporal outlier (STO) to be a spatial-temporal referenced object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial or/and temporal neighborhood. Identification of STOs can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. Many methods have been recently proposed to detect spatial outliers, but how to detect the temporal outliers or spatial-temporal outliers has been seldom discussed. In this paper we propose a multiscale approach to detect the STOs by evaluating the change between consecutive spatial and temporal scales. 1. INTRODUCTION Outliers are data that appear inconsistent with respect to the remainder of the database (Barnett and Lewis, 1994). While in many cases these can be anomalies or noise, sometimes these represent rare or unusual events to be investigated further. In general, there are three direct approaches for outlier detection: distribution-based, depth-based and distance-based. Distribution-based approaches use standard statistical distribution, depth-based technique map data objects into an m-dimensional information space (where m is the number of attribute) and distance-based approaches calculate the proportion of database objects that are a specified distance from a target object (Ng 2001). A spatial outlier is a spatial referenced object whose nonspatial attribute values are significantly different from those of other spatially referenced objects in its spatial neighborhood. It represents locations that are significantly different from their neighborhoods even though they may not be significantly different from the entire population (Shekhar, et al, 2003). Identification of spatial outliers can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. Many methods have been recently proposed to detect spatial outliers by the distribution-based approach. These methods can be broadly classified into two categories, namely 1-D (linear) outlier detection methods and multi-dimensional outlier detection methods (Shekhar, et al, 2003). The 1-D outlier detection algorithms consider the statistical distribution of non-spatial attribute values, ignoring the spatial relationships between items. The main idea is to fit the data set to a known standard distribution, and develop a test based on distribution properties (Barnett and Lewis, 1994; Johnson, 1992). Multi-dimensional outlier methods can be further grouped into two categories, namely homogeneous multidimensional metric based methods and spatial methods. The homogeneous multi-dimensional metric based methods do not distinguish between attribute dimensions and geo-spatial dimensions, and use all dimensions for defining neighborhood as well as for comparison. In the spatial methods, spatial attributes are used to characterize location, neighborhood, and distance, and non-spatial attribute dimensions are used to compare a spatially referenced object to its neighbors. Among others, Shekhar et al (2003) developed a unified modeling framework and identify efficient computational structure and strategies for detecting spatial outliers based on a single nonspatial attribute from a data set. Depth-based techniques are also applied extensively as clustering for spatial outlier detection, i.e. identifying the neighborhood of an object based on spatial relationship, and considering the proximity factor as the main basis for deciding if an object is an outlier with respect to neighboring objects or to a cluster. The limitation of these approaches is ignoring the influence of some of the underlying spatial objects that might be different at different spatial locations despite the close proximity, i.e., the semantic relationship is not considered in the clustering. An exception is that, Adam et al. (2004) identified spatial outliers by taking into account the spatial and semantic relationships among the objects. Ng (2001) used distance-based measures to detect unusual paths in two-dimensional space traced by individuals through a monitored environment. These measures allow the identification of unusual trajectories based on entry/exit points, speed and geometry; these trajectories may correspond to unwanted behaviors such as theft. Other methods used in data mining such as classification and aggregation, are also applied in spatial outlier detection (Miller, 2003).

In general, most existing methods only consider the nonspatial attributes of a data set, or only consider the spatial relations and ignore the semantic relations. Further, as all geographic phenomena evolve over time, temporal aspect should also be considered (Yao, 2003). How to detect the temporal outliers or spatial-temporal outliers has been seldom discussed. Moreover, spatial and temporal relationships exist among spatial entities at various levels (scales, Yao, 2003). Such relationships should be considered and reveled in spatiotemporal outlier detection. Our approach will build on existing approaches to evolve into a new methodology, which addresses the semantic aspects and dynamic aspects of spatiotemporal data in multi-scales. 2. ST OUTLIER DETECTION: PROBLEM DEFINITION AND PROPOSED ALGORITHMS Here we adopt the definition of spatial outlier to spatiotemporal domain and define a spatial-temporal outlier (STO) to be a spatial-temporal referenced object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial or/and temporal neighborhood. In order to detect STOs from a data set, the existing methods for spatial outlier detection can be modified to address the semantic aspects and dynamic aspects of spatio-temporal data in multi-scales. Since clustering is a basic method for outlier detection, we start with it by including the semantic knowledge in the process. Then, the multi-scale property of natural phenomena is considered. If a spatial object (which is created from clustering) disappears after aggregation, it might be a potential STO. Since spatial objects are dynamic, the verification of STOs will consider the temporal continuity in addition to spatial continuity. Therefore a four-step approach is proposed to identify the spatio-temporal outliers (see Figure 1). Here we call it a multiscale approach since the aggregation and verification compare the change between two consecutive scales in space and time. 2.1 classification (clustering) that are missed (or filtered) in Step 2, which are potential STOs. In these step, the comparison is implemented at two consecutive spatial scales. 2.4 verification (checking temporal neighbours) This step checks the temporal neighbours of the potential STOs identified in the previous step. If the semantic value of such a STO does not have significant differences with its temporal neighbours, this is not a STO. Otherwise, it is confirmed as a STO. Sampling Data ST Objects ST Objects (reduced scale) Potential STOs Identified STOs Classification Aggregation Comparison Verification Figure 1. Four steps to detect the Spatio-temporal Outliers (STOs). 3. EXPERIMENTAL RESULTS This involves the classification or clustering of the input data based upon the background knowledge of the data. The clustering method is designed based upon the priorknowledge and characteristics of the data. If the data is rasterbased images, supervised classification might be applied or a classifier might be built based upon prior-knowledge (semantics-based approach). Other methods such as neural network can also be applied if no prior-knowledge about the data available. The purpose of this step is to form some regions that have significant semantic meanings. 2.2 aggregation (filtering) This step aggregates the clustered result in the previous step in order to check the stability of the data. In this step, outliers might be merged out. So it is also called filtering of noises. 2.3 comparison (identifing the merged objects) This step compares the clustered results derived from Step 1 with the results derived from Step 2 and identifies the objects 3.1 Data sets Ameland, a barrier island in the north of the Netherlands, was chosen as a case study area. The process of coast change involves the erosion and accumulation of sediments along the coast, which is scale-dependent in space and time. It can be monitored through the observation of annual changes of landscape units such as foreshore, beach and foredune. The landscape units are defined based upon water lines. The foreshore is the area above the closure depth and beneath the low water line, beach is the area above the low water line and beneath the dune foot, the foredune is the first row of the dunes inland from dune foot. Based on height observation, it is possible to derive a measure of foreshore, beach and duneness. Height observations have been made by laser scanning of the beach and dune area and by echo sounding on the foreshore. These data have been interpolated to form a full height raster of the test area. In the following analysis, the error of the height raster, which was used as the original fine resolution DEM, is ignored.

The data set we used covers part of the island. The DEMs of six consecutive years (from 1989-1995) is displayed in Figure 2. It is hard to identify the outliers in the images displayed in Figure 2. The purpose of our experiment is to use the multiscale approach to detect the outliers in these six year DEMs. 3.2 Implementation details We applied the four steps discussed in the previous section. First, we classified the DEMs into three landscape classes. The classification function was built based upon Dutch geomorphologists. For example, the area with height between -6 ~ -1.1 to be the foreshore; the area with height between - 1.1~2 to be the beach; and the area with height between 2~25 to be the foredune. The classification results are shown in Figure 3. Then, we changed the spatial scale of the DEMs by averaging the height value by a 3*3 window. We classified them again into three landscape classes, according to the class definition in the previous step. The aggregated results are shown in Figure 4. Later, we compared the images in Figure 3 and Figure 4 in the same year and found regions that were available in Figure 3 and disappeared in Figure 4. These regions are potential STOs (which are circled in Figure 5). For verification, we compared the height values of these potential STOs in the consecutive years. If the change of height is continuous then the potential STO is not a STO. For example, the STO appeared in 1991 (in upper-left corner) became part of a big dark area in 1991. It means the change is continuous and this is not an outlier in temporal perspective. Finally, we identified the STOs, which are circled in concreted line in Figure 6. For those circled in dashed lines in Figure 6, they are not STOs. The first author wishes to thank The Hong Kong Polytechnic University for the Postdoctoral Research Fellowship (no. G- YW92). REFERENCES Adam, N. R., Janeja, V. P. and Atluri, V., 2004, Neighbourhood based detection of anomalies in high dimension spatio-temporal sensor datasets, 2004 ACM symposium on Applied Computing, March 14 17, Nicosia, Cyprus, P. 576 583. Barnett, V. and Lewis, T., 1994, Outliers in Statistical Data, John Wiley. Miller, H., 2003, Geographic data mining and knowledge discovery, in J. P. Wilson and A. S. Fotheringham (eds.) Handbook of Geographic Information Science, in press. Ng, R., 2001, Detecting outliers from large datasets, in H. J. Miller and J. Han (eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor and Francis, 218-235. Johnson, R., 1992, Applied Multilevel Statistical Analysis, Prentice Hall. Shekhar, S., Lu, C. T. and Zhang, P., 2003, A unified approach to detection spatial outliers, GeoInformatica, 7, 139-166. Yao, X., 2003, Research issues in spatio-temporal data mining, a white paper submitted to the University Consortium for Geographic Information Science (UCGIS) workshop on Geospatial Visualization and Knowledge Discovery, Lansdowne, Virginia, Nov. 18-20. 4. CONCLUSIONS AND FUTURE WORK In this paper we discussed spatial-temporal outlier detection. We defined a spatial-temporal outlier (STO) to be a spatialtemporal referenced object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial or/and temporal neighborhood. We propose a multiscale approach to detect the STOs by evaluating the change between consecutive spatial and temporal scales. As for further research, the effect of granulites of spatial and temporal scales should be investigated. Further, quantitative calibration of the difference between two consecutive spatial and temporal scales should also be established. ACKNOWLEDGEMENTS

Figure 2: DEMs of Ameland in six consecutive years. Figure 3: Step 1 Clustering of the Data. Figure 4: Step 2 Aggregation: change in spatial scale.

Figure 5: Step 3 Comparison and potential STOs. Figure 6. Step 4 -- Verified STOs.