Uncertainty in Spatial Data Mining and Its Propagation

Similar documents
Quality Assessment and Uncertainty Handling in Uncertainty-Based Spatial Data Mining Framework

RESEARCH ON SPATIAL DATA MINING BASED ON UNCERTAINTY IN GOVERNMENT GIS

UNCERTAINTY EVALUATION OF MILITARY TERRAIN ANALYSIS RESULTS BY SIMULATION AND VISUALIZATION

A CARTOGRAPHIC DATA MODEL FOR BETTER GEOGRAPHICAL VISUALIZATION BASED ON KNOWLEDGE

Cell-based Model For GIS Generalization

A MULTISCALE APPROACH TO DETECT SPATIAL-TEMPORAL OUTLIERS

Digital Chart Cartography: Error and Quality control

32 1 Vol. 32, No ACTA GEODAETICA et CARTO GRAPHICA SIN ICA Feb.,2003. The Model of Error Entropy of Area Unit in GIS

Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, China

STUDY ON SCALE TRANSFORMATION IN SPATIAL DATA MINING

THE QUALITY CONTROL OF VECTOR MAP DATA

A bottom-up strategy for uncertainty quantification in complex geo-computational models

Abstract 1. The challenges facing Cartography in the new era.

Quality Assessment of Geospatial Data

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Data Mining in Chemometrics Sub-structures Learning Via Peak Combinations Searching in Mass Spectra

Visualizing Spatial Uncertainty of Multinomial Classes in Area-class Mapping

EFFICIENT MODELLING OF HEIGHT DATUM BASED ON GIS

Uncertainty of Spatial Metric Relations in GIs

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data

Event-based Spatio-temporal Database Design

MECHANISM AND METHODS OF FUZZY GEOGRAPHICAL OBJECT MODELING

A STUDY ON INCREMENTAL OBJECT-ORIENTED MODEL AND ITS SUPPORTING ENVIRONMENT FOR CARTOGRAPHIC GENERALIZATION IN MULTI-SCALE SPATIAL DATABASE

Overview on the Using Rough Set Theory on GIS Spatial Relationships Constraint

A General Framework for Conflation

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

PRELIMINARY STUDIES ON CONTOUR TREE-BASED TOPOGRAPHIC DATA MINING

Joint International Mechanical, Electronic and Information Technology Conference (JIMET 2015)

8/28/2011. Contents. Lecture 1: Introduction to GIS. Dr. Bo Wu Learning Outcomes. Map A Geographic Language.

Research on Object-Oriented Geographical Data Model in GIS

Spatial Clustering and Outlier Analysis for the Regionalization of Maize Cultivation in China

Lecture 1: Geospatial Data Models

Treatment of Error in Experimental Measurements

43400 Serdang Selangor, Malaysia Serdang Selangor, Malaysia 4

An Improved Quantum Evolutionary Algorithm with 2-Crossovers

Spatial Webs. Michael F. Goodchild University of California Santa Barbara

SPATIAL INFORMATION GRID AND ITS APPLICATION IN GEOLOGICAL SURVEY

GOVERNMENT GIS BUILDING BASED ON THE THEORY OF INFORMATION ARCHITECTURE

Application of matter element analysis in weather forecasting

Error Propagation and Model Uncertainties of Cellular Automata in Urban Simulation with GIS

Modeling Temporal Uncertainty of Events: a. Descriptive Perspective

Cadcorp Introductory Paper I

3D Map Symbol and It s Modeling ABSTRACT

UPDATING AND REFINEMENT OF NATIONAL 1: DEM. National Geomatics Center of China, Beijing

3. DIFFERENT MODEL TYPES

Clustering Analysis of London Police Foot Patrol Behaviour from Raw Trajectories

Classification of High Spatial Resolution Remote Sensing Images Based on Decision Fusion

OPTIMIZED RESOURCE IN SATELLITE NETWORK BASED ON GENETIC ALGORITHM. Received June 2011; revised December 2011

UNIVERSAL ERROR PROPAGATION LAW

Different Models of the Curriculum for the Higher Education of Surveying & Mapping in China

A Method to Improve the Accuracy of Remote Sensing Data Classification by Exploiting the Multi-Scale Properties in the Scene

Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries

MODELING DEM UNCERTAINTY IN GEOMORPHOMETRIC APPLICATIONS WITH MONTE CARLO-SIMULATION

Effect of Data Processing on Data Quality

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

A methodology on natural occurring lines segmentation and generalization

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM

Learning Static Parameters in Stochastic Processes

DIGITAL CHART CARTOGRAPHY: ERROR AND QUALITY CONTROL

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

A new Approach to Drawing Conclusions from Data A Rough Set Perspective

DATA SOURCES AND INPUT IN GIS. By Prof. A. Balasubramanian Centre for Advanced Studies in Earth Science, University of Mysore, Mysore

GEO-INFORMATION (LAKE DATA) SERVICE BASED ON ONTOLOGY

Outlier Detection and Correction for the Deviations of Tooth Profiles of Gears

Syntactic Patterns of Spatial Relations in Text

Areal Differentiation Using Fuzzy Set Membership and Support Logic

ARPN Journal of Science and Technology All rights reserved.

D-S Evidence Theory Applied to Fault 1 Diagnosis of Generator Based on Embedded Sensors

The Development of Research on Automated Geographical Informational Generalization in China

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

UNCERTAINTY IN THE POPULATION GEOGRAPHIC INFORMATION SYSTEM

Uncertainty modeling for robust verifiable design. Arnold Neumaier University of Vienna Vienna, Austria

An Approach to Classification Based on Fuzzy Association Rules

LABELING RESIDENTIAL COMMUNITY CHARACTERISTICS FROM COLLECTIVE ACTIVITY PATTERNS USING TAXI TRIP DATA

Construction of Emergency Services Platform for Coal Mining Accidents by Integrating Multi Source Datasets

AN OBJECT-ORIENTED DATA MODEL FOR DIGITAL CARTOGRAPHIC OBJECT

A Generalized Decision Logic in Interval-set-valued Information Tables

Twenty Years of Progress: GIScience in Michael F. Goodchild University of California Santa Barbara

Multi-period medical diagnosis method using a single valued. neutrosophic similarity measure based on tangent function

The Seismic-Geological Comprehensive Prediction Method of the Low Permeability Calcareous Sandstone Reservoir

Fuzzy Limits of Functions

Land Use/Cover Change Detection Using Feature Database Based on Vector-Image Data Conflation

Naive Bayesian classifiers for multinomial features: a theoretical analysis

Liquid-in-glass thermometer

ESTIMATION OF GEOGRAPHICAL DATABASES CAPTURE SCALE BASED ON INTER-VERTICES DISTANCES EXPLORATION

Development of a Data Mining Methodology using Robust Design

MODELLING AND UNDERSTANDING MULTI-TEMPORAL LAND USE CHANGES

Uncertain Logic with Multiple Predicates

THE DESIGN AND IMPLEMENTATION OF A WEB SERVICES-BASED APPLICATION FRAMEWORK FOR SEA SURFACE TEMPERATURE INFORMATION

Article Scale Effect and Anisotropy Analyzed for Neutrosophic Numbers of Rock Joint Roughness Coefficient Based on Neutrosophic Statistics

Representing and Querying Correlated Tuples in Probabilistic Databases

Introduction to Machine Learning

Uncertain Second-order Logic

Taxonomies of Building Objects towards Topographic and Thematic Geo-Ontologies

Distributed Clustering and Local Regression for Knowledge Discovery in Multiple Spatial Databases

Urban Geo-Informatics John W Z Shi

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Stable Implicit Scheme for TM Transient Scattering from 2D Conducting Objects Using TD-EFIE

Runge-Kutta Method for Solving Uncertain Differential Equations

Transcription:

Uncertainty in Mining and Its Propagation Abstract Binbin He 1,2, Tao Fang 2, Dazhi Guo 1 and Jiang Yun 1 1 School of Environment & Informatics, China University of Mining and Technology, Xuzhou, JiangSu, China, 221008 binb_he @163.com guodazhi @ pub.xz.jsinfo.net 2 Institute of Image Processing & Pattern Recognition, Shanghai Jiao-Tong University, No.1954, Huashan Road, Shanghai, China, 200030 tfang @ sjtu.edu.cn mining is to extract the hidden, implicit, valid, novel and interesting spatial or non-spatial patterns or rules from large-amount, incomplete, noisy, fuzzy, random, and practical spatial bases. In which an important issue but remains underdeveloped is to reveal and handle the uncertainties of patterns and features in the course of spatial mining and knowledge discovery. In this paper, the types and origins of uncertainty, their models of measurement and propagation are analyzed firstly. Then, some uncertainty factors in operation of spatial mining are discussed. We think the process of spatial mining can be regarded as a complex system, a linear serial processing system in engineering control systems. An uncertainty propagation model of spatial mining fuzzy logic uncertainty propagation model with Creditability factor is developed. Moreover, several key problems about uncertainty handling and propagation in spatial mining are put forward. Key Words Uncertainty propagation mining Complex system Creditability Factor (CF) Fuzzy logic 1. Introduction mining is to extract the hidden, implicit, valid, novel and interesting spatial or non-spatial patterns or rules from large-amount, incomplete, noisy, fuzzy, random, and practical spatial bases Di, 2000.With the uninterrupted development in automatic obtaining techniques and methods of spatial, the amount of in spatial bases has been exponentially increased. But the deficiency in analytical functions of geographic information system (GIS) makes a vast amount of and useful knowledge to be stored lose their role, in other words, it means the spatial explode but knowledge is poor (Li, 2002). In recent years, the study on spatial mining is becoming an important field in GIS and computer technology, and a great number of research results have been obtained. Up to now, the investigation on spatial mining focuses in mining principles and methods themselves, such as mining algorithm of magnanimous, the mining of multi-source remote sensing and the mining of distributed spatial. However, another important issue, the uncertainty in spatial mining and its propagation have not been paid much attention to. In fact, spatial themselves contain several forms of uncertainties, and the many kinds of new uncertainties in the process of spatial mining will be evolved. Meanwhile, these uncertainties will be propagated and accumulated 1

in spatial mining process, and then induce some uncertain knowledge. These circumstances and characteristics had not been considered and the knowledge discovered was regarded as fully useful and certain knowledge in traditional spatial mining. Therefore, the research on characteristics and patterns of uncertainty and its propagation in spatial mining is very important. In this paper, the types, origins, uncertainty measurement and propagation of spatial are introduced in section 2. In section 3, some uncertainty factors in operation of spatial mining are discussed. Especially, we proposed that the process of spatial mining could be regarded as a complex system, a linear serial processing system in engineering control systems in section 4. An uncertainty propagation model of spatial mining fuzzy logic uncertainty propagation model with Creditability factor is developed. Moreover, several key problems about uncertainty handling and propagation in spatial mining are put forward in section 5. 2. A Brief Review of Uncertainties in 2.1 The types and origins of uncertainty in spatial It is said that the uncertainty within spatial is the major components and forms for the evaluation of spatial quality. quality includes lineage, accuracy, completeness, logical consistency, semantic accuracy and currency (FGDC, 1998). All types of spatial are subjected to uncertainty, since it is possible to create a perfect representation of the infinitely complex real world (Goodchild, 2003). Error refers to the discrepancy between observation results and true value, which has statistic characteristics. Uncertainty is more broadly-defined error concept continuation, measuring the discrepancy degree of the surveying objects knowledge. Uncertainties in spatial can be classified: error, vagueness, ambiguity and discord (Fisher, 2003). The techniques and process of spatial capture can deal with surveying measurement, interpretation, input, processing and representation. The uncertainties of spatial stem from two parts. On the one hand, they stem from instability of natural phenomena and incompleteness of men s cognition. On the other hand, the process of spatial capture and handling bring a lot of uncertainty. In addition, these uncertainties can be propagated from the former phase into the latter one, and accumulated in different laws. 2.2 Uncertainty measurement and propagation of spatial At present, a great deal of research have been developed in some areas, such as positional uncertainty and its propagation, especially the uncertainty of points, lines, polygons or areas. Therein, the uncertainty of points and lines is the basis of polygons and areas. Some uncertainty models have been constructed, including standard ellipse model (Mikhail and Ackerman, 1976) and circle normal model (Goodchild, 1991) of point position, Epsilon-band model (Chrisman, 1982) and error band model (Dutton, 1992) of line position. For last years, the study of spatial quality control put emphasis on the spatial positional uncertainty, but little on attribute uncertainty. In recent years, some scholars studied the attribute uncertainty of GIS (Liu, 1999; Ehlschlager, 2000; Shi, 2002). Usually, positional uncertainty and attribute uncertainty were studied respectively. Shi (2000) constructed S-band model that combine positional uncertainty with attribute uncertainty. Zhang (1999) constructed field model that position uncertainty and attribute uncertainty are described in uniform. The spatial uncertainty propagation problem can be formulated mathematically as follows: 2

Y ( ) = Opt( D1 ( ), L, Dm( )) 1 Let Y ( ) be the output of GIS operation Opt( ) on the m spatial sets. The operation Opt( ) may be one of the various operations in GIS such as intersection of sets. The principle of the spatial uncertainty propagation analysis is to determine the spatial uncertainty in the output Y ( ) given the operation Opt( ) and spatial uncertainties in the sets. The spatial uncertainty propagation is relatively easy when the operation Opt( ) is a linear function, which can be performed by error propagation law. However, few of operation Opt( ) were linear or could be solved by simple calculation. However, in general, rigorous methods and functions will be very troublesome. The Monte Carlo method (Openshaw, 1989) uses an entirely different approach to determine the uncertainty of geospatial objects. In this method, the results of Equation (1) are computed repeatedly, with input value D = [ D, 2, L, Dm ] that are randomly sampled from their 1 D joint distribution. The outputs of the equation construct random samples, in which their distribution parameters, such as mean value and variance, can be estimated. The Monte Carlo method may be a general method for uncertainty handling, and can be applied to the computation processing of spatial or attribute. An outstanding advantage of Monte Carlo method is able to provide the entire distribution of output at an arbitrary level of accuracy. The other advantages of this method are easy implementation and more general application. However, this method is more intensive computationally. 3 Uncertainties in Mining The uncertainties in spatial mining may occur in the process of spatial capture, spatial processing, mining and knowledge representing. The study achievements on the uncertainties of spatial are not only very important, but also starting point for further research in spatial mining. The firsthand for spatial mining usually stem from uncertain spatial base or uncertain spatial sets. Moreover, uncertainties in spatial may directly or indirectly affect the quality of spatial mining (Han and Kamber, 2001). The process of spatial mining may be divided into four phases: selection, preprocessing, mining, pattern evaluation and knowledge representation. In addition, a lot of uncertainties of spatial will happen in spatial mining, and these uncertainties will be propagated and accumulated in spatial mining process (Figure 1). Each phase of uncertainties will be analyzed as follows: 3

Uncertain Selection Object Preprocessing Mining Pattern Evaluation and Representation Uncertain Knowledge Uncertainty propagation Figure 1. Uncertainties and its propagation in the process of spatial mining At the phase of selection, the uncertainties mainly stem from selection of object according to the task of spatial mining through operator subjectivity, for example, what should be collected, and how much is enough. preprocessing mainly include cleaning, transformation and reduction. cleaning routines attempt to fill in missing values, smooth output noise while identifying outliers, and correct inconsistencies of. During the stage of transformation, the spatial are transformed or integrated into some appropriate forms for the implementations of mining. They may deal with some operations and processing, such as the smoothing, aggregation, generalization, normalization, attribute construction and discretization, and concept hierarchy generation. In general, discretization and concept hierarchy generation are the main origins of uncertainties in this phase, and many uncertainties may be hidden by discretization. At this phase, it is said that a lot of uncertainties could be eliminated by uncertainty handling methods. However, some uncertainties can never be eliminated completely, even some new uncertainties would occur in handling process, perhaps caused by use of some unsuitable models or processing methods. The limitation of mathematical model or mining algorithm may further be evolved and developed, even enlarge spatial uncertainty during the mining process. Moreover, there exist many uncertainties for spatial knowledge representation, including randomness, fuzziness and incompleteness of knowledge. It is well known that a sort of knowledge information may be represented in different ways. Most of spatial knowledge discovered by spatial mining is qualitative knowledge, and the best way to represent them is to use the natural languages. 4

4 An Uncertainty Propagation Model of Mining The analysis and handling of uncertainties may be carried out by means of different technologies. Unfortunately, the problem of uncertainty propagation law in spatial mining has not been solved. We think that the process of spatial mining can be regarded as a complex system, a linear serial processing system in engineering control systems, in which have several inputs and outputs, and they coupled and associated in some means. Thus the implicit or explicit propagation function of every process may be given. In this case, the complicated problem could be simplified and solved step by step. On the basis of this principle, an uncertainty propagation model in spatial mining, fuzzy logic uncertainty propagation model with Creditability factor, is constructed. In order to describe conveniently, we only consider the propagation among several key links: uncertain spatial spatial operation discritization and concept hierarchy generation mining knowledge representation uncertain reasoning. Randomness (error) and fuzziness are the two main uncertainties of spatial and spatial mining. However, the current method is limited to handle one kind of uncertainties. In fact, spatial and the process of spatial mining exist in both randomness and fuzziness. The fuzzy logic uncertainty propagation model with creditability factor can represent and handle two uncertainties simultaneously. In this model, uncertainty arisen from randomness may be represent by Creditability Factor (CF), and uncertainty arisen from fuzziness may be represent by fuzzy set. The figure.2 shows the pattern of uncertainty propagation in spatial mining. The entire model of uncertainty propagation may be recognized as a reasoning network. The main purpose of Creditability Factor (CF) is for describing propagation of uncertainty by reasoning network. The reasoning network is relatively complicated. The Creditability Factors (CF) may be D1(CFD1) D2(CFD2) CF11 CF12 S O Object OD1 Object OD2 1 2 D C GD1 GD2 CF31 CF32 D M Rule1 CF1 Rule1 CF41 CF42 U R Knowledge CF Dn(CFDn) CF1m Object ODm L GDL CF3K Rule1 CFK CF4K SO Operation DC Discritization and Concept hierarchy generation DM Mining UR Uncertain Resoning Figure 2. Uncertainty propagation model in spatial mining process combined in parallel / sequential ways. The spatial with Creditability Factor (CF) is recognized as initial evidence, which can be given uncertainty measurement model according to different types of spatial. Creditability Factor (CF) is represented by fuzzy language value. 5

Let µ ( ) and µ ( ) are the subjective functions of fuzzy language of CF 1 and CF 2, then CF1 x CF 2 y the operations between them are as follows: CF + CF µ ( z) = ( µ ( x) ( )) 1 2 : CF 1+ CF1 µ y CF1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) x+ y x y CF1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) CF 1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) x y x y In which, denotes the maximum value and denotes the minimum value. Furthermore, several algorithms (such as evidence combination, uncertainty propagation and result combination) that their models involved are referenced to the relative algorithms of the Creditability method in Artificial Intelligence (AI). The model of Creditability Factor (CF) should consisted in the processing model of this phase in theory, such as the Creditability Factor (CF) model with probability interpretation, although which is a difficult operation in this model. 5 Conclusion & Discussion We expect to achieve two objectives by studying the uncertainties of spatial and spatial mining. Firstly, the quality of spatial mining can be improved by analyzing the uncertainties and its characteristics in each phase of spatial mining, and finding the proper approaches to reduce the uncertainties. Secondly, although uncertainties of spatial mining could not be fully eliminated, we may construct the measurement model of uncertainty propagation by analyzing the propagation law of uncertainties in spatial mining in order to evaluate them approximately and to make use of the knowledge discovered in spatial mining. The origin, measurement and propagation of spatial mining are analyzed in this paper. Several theories and methods are applied as follows: (1) The comprehensive measurement model of spatial, the measurement model of uncertainty currently is based on the random error of spatial. In fact, the spatial only involving single uncertainty is not existed. Constructing a comprehensive measurement model of spatial is the basis of the research; (2) The transformation model of uncertainty in discritization and concept hierarchy generation are more difficult operations, but very important. For many uncertainties may be hidden by discritization and concept hierarchy generation. The quantitative and qualitative concept can be transformed each other, and the uncertainties of map relations can be reflected in the models. (3) The usual uncertainty-based mining algorithm and the published techniques for spatial mining can not very well handle the spatial information in spatial, and they go short of evaluation to discovered knowledge; (4) The representation methods of uncertain spatial knowledge and traditional knowledge representation ways (such as Predicate logic, Producing type rule) are not suitable for representing the uncertainty of spatial knowledge. Acknowledgements The work described in this paper was supported by the funds from National Natural Science Foundation of China (Project No.60275021). 6

References Chrisman N. R., 1982, A theory of cartography error and its measurement in digital base In Proceedings of Auto-Carto5 Di, K.C., 2000, Mining and Knowledge Discovery (Wuhan, China Wuhan University Press) Dutton G., 1992, Handling positional uncertainty in spatial base In Proceedings of 5 th International Symposium on Mining, 460~469 Ehlschlager C.R., 2000, Representing uncertainty of area class maps with a correlated inter-map cell swapping heuristic Computers, Enviroment and Urban Systems, 24, 451~469 FGDC(Federal Geographic Committee),1998, Content standard for digital geospatial meta FGDC-STD-001-1998, National Technical Information Services, Computer Products Office, Springfield, Virginia, USA Fisher P.F., 2003, quality and uncertainty ships passing in the light In Proceedings of The 2 nd International Symposium on Quality 2003 Edited by Shi W.Z., Goodchild M.F., Fisher P.F Hong Kong The Hong Kong Polytechnic University Goodchild M. F., 1991, Issuees of quality and uncertainty In Advances In Cartography by Muller J.C Goodchild M.F, 2003, Models for uncertainty in area-class maps In Proceedings of The 2 nd International Symposium on Quality 2003 Edited by Shi W.Z., Goodchild M.F., Fisher P.F Hong Kong The Hong Kong Polytechnic University Han J.W., Kamber M., 2001, Mining: Concepts and Techniques San Francisco California. Morgan Kaufmann Publishers Li D.R., Wang S.L., Li D.Y. and Wang X.Z., 2002, Theories and technologies of spatial knowledge discovery Geomatics and Information Science of Wuhan University, 27 3 221~233 Liu W.B., Den M. and Xia Z.G., 1999, Analysis of uncertainties of attributes in vector GIS Acta Geodaetica et Cartographica Sinica, 29 1 76~81 Mikhail E. M. and Ackerman F., 1976, Observations and Least Squares New York IEP-A Dun-Donnelley Publisher Openshaw, S., 1989, Learning to live with errors in spatial base, In Accuracy of base, Taylor & Francis Ltd, New York, 263-276 Shi W.Z., Wang S.L., 2002, Further development of theories and methods on attribute uncertainty in GIS Journal of Remote Sensing, 6(5) 393~400 Shi W.Z., 2000, Theory and Methods for Handling Errors in (Beijing: Science Press) Zhang J.X., Du D.S., 1999, Field based models for positional and attribute uncertainty Acta Geodaetica et Cartographica Sinica, 28(3), 244~249 7