Uncertainty in Mining and Its Propagation Abstract Binbin He 1,2, Tao Fang 2, Dazhi Guo 1 and Jiang Yun 1 1 School of Environment & Informatics, China University of Mining and Technology, Xuzhou, JiangSu, China, 221008 binb_he @163.com guodazhi @ pub.xz.jsinfo.net 2 Institute of Image Processing & Pattern Recognition, Shanghai Jiao-Tong University, No.1954, Huashan Road, Shanghai, China, 200030 tfang @ sjtu.edu.cn mining is to extract the hidden, implicit, valid, novel and interesting spatial or non-spatial patterns or rules from large-amount, incomplete, noisy, fuzzy, random, and practical spatial bases. In which an important issue but remains underdeveloped is to reveal and handle the uncertainties of patterns and features in the course of spatial mining and knowledge discovery. In this paper, the types and origins of uncertainty, their models of measurement and propagation are analyzed firstly. Then, some uncertainty factors in operation of spatial mining are discussed. We think the process of spatial mining can be regarded as a complex system, a linear serial processing system in engineering control systems. An uncertainty propagation model of spatial mining fuzzy logic uncertainty propagation model with Creditability factor is developed. Moreover, several key problems about uncertainty handling and propagation in spatial mining are put forward. Key Words Uncertainty propagation mining Complex system Creditability Factor (CF) Fuzzy logic 1. Introduction mining is to extract the hidden, implicit, valid, novel and interesting spatial or non-spatial patterns or rules from large-amount, incomplete, noisy, fuzzy, random, and practical spatial bases Di, 2000.With the uninterrupted development in automatic obtaining techniques and methods of spatial, the amount of in spatial bases has been exponentially increased. But the deficiency in analytical functions of geographic information system (GIS) makes a vast amount of and useful knowledge to be stored lose their role, in other words, it means the spatial explode but knowledge is poor (Li, 2002). In recent years, the study on spatial mining is becoming an important field in GIS and computer technology, and a great number of research results have been obtained. Up to now, the investigation on spatial mining focuses in mining principles and methods themselves, such as mining algorithm of magnanimous, the mining of multi-source remote sensing and the mining of distributed spatial. However, another important issue, the uncertainty in spatial mining and its propagation have not been paid much attention to. In fact, spatial themselves contain several forms of uncertainties, and the many kinds of new uncertainties in the process of spatial mining will be evolved. Meanwhile, these uncertainties will be propagated and accumulated 1
in spatial mining process, and then induce some uncertain knowledge. These circumstances and characteristics had not been considered and the knowledge discovered was regarded as fully useful and certain knowledge in traditional spatial mining. Therefore, the research on characteristics and patterns of uncertainty and its propagation in spatial mining is very important. In this paper, the types, origins, uncertainty measurement and propagation of spatial are introduced in section 2. In section 3, some uncertainty factors in operation of spatial mining are discussed. Especially, we proposed that the process of spatial mining could be regarded as a complex system, a linear serial processing system in engineering control systems in section 4. An uncertainty propagation model of spatial mining fuzzy logic uncertainty propagation model with Creditability factor is developed. Moreover, several key problems about uncertainty handling and propagation in spatial mining are put forward in section 5. 2. A Brief Review of Uncertainties in 2.1 The types and origins of uncertainty in spatial It is said that the uncertainty within spatial is the major components and forms for the evaluation of spatial quality. quality includes lineage, accuracy, completeness, logical consistency, semantic accuracy and currency (FGDC, 1998). All types of spatial are subjected to uncertainty, since it is possible to create a perfect representation of the infinitely complex real world (Goodchild, 2003). Error refers to the discrepancy between observation results and true value, which has statistic characteristics. Uncertainty is more broadly-defined error concept continuation, measuring the discrepancy degree of the surveying objects knowledge. Uncertainties in spatial can be classified: error, vagueness, ambiguity and discord (Fisher, 2003). The techniques and process of spatial capture can deal with surveying measurement, interpretation, input, processing and representation. The uncertainties of spatial stem from two parts. On the one hand, they stem from instability of natural phenomena and incompleteness of men s cognition. On the other hand, the process of spatial capture and handling bring a lot of uncertainty. In addition, these uncertainties can be propagated from the former phase into the latter one, and accumulated in different laws. 2.2 Uncertainty measurement and propagation of spatial At present, a great deal of research have been developed in some areas, such as positional uncertainty and its propagation, especially the uncertainty of points, lines, polygons or areas. Therein, the uncertainty of points and lines is the basis of polygons and areas. Some uncertainty models have been constructed, including standard ellipse model (Mikhail and Ackerman, 1976) and circle normal model (Goodchild, 1991) of point position, Epsilon-band model (Chrisman, 1982) and error band model (Dutton, 1992) of line position. For last years, the study of spatial quality control put emphasis on the spatial positional uncertainty, but little on attribute uncertainty. In recent years, some scholars studied the attribute uncertainty of GIS (Liu, 1999; Ehlschlager, 2000; Shi, 2002). Usually, positional uncertainty and attribute uncertainty were studied respectively. Shi (2000) constructed S-band model that combine positional uncertainty with attribute uncertainty. Zhang (1999) constructed field model that position uncertainty and attribute uncertainty are described in uniform. The spatial uncertainty propagation problem can be formulated mathematically as follows: 2
Y ( ) = Opt( D1 ( ), L, Dm( )) 1 Let Y ( ) be the output of GIS operation Opt( ) on the m spatial sets. The operation Opt( ) may be one of the various operations in GIS such as intersection of sets. The principle of the spatial uncertainty propagation analysis is to determine the spatial uncertainty in the output Y ( ) given the operation Opt( ) and spatial uncertainties in the sets. The spatial uncertainty propagation is relatively easy when the operation Opt( ) is a linear function, which can be performed by error propagation law. However, few of operation Opt( ) were linear or could be solved by simple calculation. However, in general, rigorous methods and functions will be very troublesome. The Monte Carlo method (Openshaw, 1989) uses an entirely different approach to determine the uncertainty of geospatial objects. In this method, the results of Equation (1) are computed repeatedly, with input value D = [ D, 2, L, Dm ] that are randomly sampled from their 1 D joint distribution. The outputs of the equation construct random samples, in which their distribution parameters, such as mean value and variance, can be estimated. The Monte Carlo method may be a general method for uncertainty handling, and can be applied to the computation processing of spatial or attribute. An outstanding advantage of Monte Carlo method is able to provide the entire distribution of output at an arbitrary level of accuracy. The other advantages of this method are easy implementation and more general application. However, this method is more intensive computationally. 3 Uncertainties in Mining The uncertainties in spatial mining may occur in the process of spatial capture, spatial processing, mining and knowledge representing. The study achievements on the uncertainties of spatial are not only very important, but also starting point for further research in spatial mining. The firsthand for spatial mining usually stem from uncertain spatial base or uncertain spatial sets. Moreover, uncertainties in spatial may directly or indirectly affect the quality of spatial mining (Han and Kamber, 2001). The process of spatial mining may be divided into four phases: selection, preprocessing, mining, pattern evaluation and knowledge representation. In addition, a lot of uncertainties of spatial will happen in spatial mining, and these uncertainties will be propagated and accumulated in spatial mining process (Figure 1). Each phase of uncertainties will be analyzed as follows: 3
Uncertain Selection Object Preprocessing Mining Pattern Evaluation and Representation Uncertain Knowledge Uncertainty propagation Figure 1. Uncertainties and its propagation in the process of spatial mining At the phase of selection, the uncertainties mainly stem from selection of object according to the task of spatial mining through operator subjectivity, for example, what should be collected, and how much is enough. preprocessing mainly include cleaning, transformation and reduction. cleaning routines attempt to fill in missing values, smooth output noise while identifying outliers, and correct inconsistencies of. During the stage of transformation, the spatial are transformed or integrated into some appropriate forms for the implementations of mining. They may deal with some operations and processing, such as the smoothing, aggregation, generalization, normalization, attribute construction and discretization, and concept hierarchy generation. In general, discretization and concept hierarchy generation are the main origins of uncertainties in this phase, and many uncertainties may be hidden by discretization. At this phase, it is said that a lot of uncertainties could be eliminated by uncertainty handling methods. However, some uncertainties can never be eliminated completely, even some new uncertainties would occur in handling process, perhaps caused by use of some unsuitable models or processing methods. The limitation of mathematical model or mining algorithm may further be evolved and developed, even enlarge spatial uncertainty during the mining process. Moreover, there exist many uncertainties for spatial knowledge representation, including randomness, fuzziness and incompleteness of knowledge. It is well known that a sort of knowledge information may be represented in different ways. Most of spatial knowledge discovered by spatial mining is qualitative knowledge, and the best way to represent them is to use the natural languages. 4
4 An Uncertainty Propagation Model of Mining The analysis and handling of uncertainties may be carried out by means of different technologies. Unfortunately, the problem of uncertainty propagation law in spatial mining has not been solved. We think that the process of spatial mining can be regarded as a complex system, a linear serial processing system in engineering control systems, in which have several inputs and outputs, and they coupled and associated in some means. Thus the implicit or explicit propagation function of every process may be given. In this case, the complicated problem could be simplified and solved step by step. On the basis of this principle, an uncertainty propagation model in spatial mining, fuzzy logic uncertainty propagation model with Creditability factor, is constructed. In order to describe conveniently, we only consider the propagation among several key links: uncertain spatial spatial operation discritization and concept hierarchy generation mining knowledge representation uncertain reasoning. Randomness (error) and fuzziness are the two main uncertainties of spatial and spatial mining. However, the current method is limited to handle one kind of uncertainties. In fact, spatial and the process of spatial mining exist in both randomness and fuzziness. The fuzzy logic uncertainty propagation model with creditability factor can represent and handle two uncertainties simultaneously. In this model, uncertainty arisen from randomness may be represent by Creditability Factor (CF), and uncertainty arisen from fuzziness may be represent by fuzzy set. The figure.2 shows the pattern of uncertainty propagation in spatial mining. The entire model of uncertainty propagation may be recognized as a reasoning network. The main purpose of Creditability Factor (CF) is for describing propagation of uncertainty by reasoning network. The reasoning network is relatively complicated. The Creditability Factors (CF) may be D1(CFD1) D2(CFD2) CF11 CF12 S O Object OD1 Object OD2 1 2 D C GD1 GD2 CF31 CF32 D M Rule1 CF1 Rule1 CF41 CF42 U R Knowledge CF Dn(CFDn) CF1m Object ODm L GDL CF3K Rule1 CFK CF4K SO Operation DC Discritization and Concept hierarchy generation DM Mining UR Uncertain Resoning Figure 2. Uncertainty propagation model in spatial mining process combined in parallel / sequential ways. The spatial with Creditability Factor (CF) is recognized as initial evidence, which can be given uncertainty measurement model according to different types of spatial. Creditability Factor (CF) is represented by fuzzy language value. 5
Let µ ( ) and µ ( ) are the subjective functions of fuzzy language of CF 1 and CF 2, then CF1 x CF 2 y the operations between them are as follows: CF + CF µ ( z) = ( µ ( x) ( )) 1 2 : CF 1+ CF1 µ y CF1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) x+ y x y CF1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) CF 1 : µ CF 1 ( z) = ( µ CF1 ( x) µ ( y)) x y x y In which, denotes the maximum value and denotes the minimum value. Furthermore, several algorithms (such as evidence combination, uncertainty propagation and result combination) that their models involved are referenced to the relative algorithms of the Creditability method in Artificial Intelligence (AI). The model of Creditability Factor (CF) should consisted in the processing model of this phase in theory, such as the Creditability Factor (CF) model with probability interpretation, although which is a difficult operation in this model. 5 Conclusion & Discussion We expect to achieve two objectives by studying the uncertainties of spatial and spatial mining. Firstly, the quality of spatial mining can be improved by analyzing the uncertainties and its characteristics in each phase of spatial mining, and finding the proper approaches to reduce the uncertainties. Secondly, although uncertainties of spatial mining could not be fully eliminated, we may construct the measurement model of uncertainty propagation by analyzing the propagation law of uncertainties in spatial mining in order to evaluate them approximately and to make use of the knowledge discovered in spatial mining. The origin, measurement and propagation of spatial mining are analyzed in this paper. Several theories and methods are applied as follows: (1) The comprehensive measurement model of spatial, the measurement model of uncertainty currently is based on the random error of spatial. In fact, the spatial only involving single uncertainty is not existed. Constructing a comprehensive measurement model of spatial is the basis of the research; (2) The transformation model of uncertainty in discritization and concept hierarchy generation are more difficult operations, but very important. For many uncertainties may be hidden by discritization and concept hierarchy generation. The quantitative and qualitative concept can be transformed each other, and the uncertainties of map relations can be reflected in the models. (3) The usual uncertainty-based mining algorithm and the published techniques for spatial mining can not very well handle the spatial information in spatial, and they go short of evaluation to discovered knowledge; (4) The representation methods of uncertain spatial knowledge and traditional knowledge representation ways (such as Predicate logic, Producing type rule) are not suitable for representing the uncertainty of spatial knowledge. Acknowledgements The work described in this paper was supported by the funds from National Natural Science Foundation of China (Project No.60275021). 6
References Chrisman N. R., 1982, A theory of cartography error and its measurement in digital base In Proceedings of Auto-Carto5 Di, K.C., 2000, Mining and Knowledge Discovery (Wuhan, China Wuhan University Press) Dutton G., 1992, Handling positional uncertainty in spatial base In Proceedings of 5 th International Symposium on Mining, 460~469 Ehlschlager C.R., 2000, Representing uncertainty of area class maps with a correlated inter-map cell swapping heuristic Computers, Enviroment and Urban Systems, 24, 451~469 FGDC(Federal Geographic Committee),1998, Content standard for digital geospatial meta FGDC-STD-001-1998, National Technical Information Services, Computer Products Office, Springfield, Virginia, USA Fisher P.F., 2003, quality and uncertainty ships passing in the light In Proceedings of The 2 nd International Symposium on Quality 2003 Edited by Shi W.Z., Goodchild M.F., Fisher P.F Hong Kong The Hong Kong Polytechnic University Goodchild M. F., 1991, Issuees of quality and uncertainty In Advances In Cartography by Muller J.C Goodchild M.F, 2003, Models for uncertainty in area-class maps In Proceedings of The 2 nd International Symposium on Quality 2003 Edited by Shi W.Z., Goodchild M.F., Fisher P.F Hong Kong The Hong Kong Polytechnic University Han J.W., Kamber M., 2001, Mining: Concepts and Techniques San Francisco California. Morgan Kaufmann Publishers Li D.R., Wang S.L., Li D.Y. and Wang X.Z., 2002, Theories and technologies of spatial knowledge discovery Geomatics and Information Science of Wuhan University, 27 3 221~233 Liu W.B., Den M. and Xia Z.G., 1999, Analysis of uncertainties of attributes in vector GIS Acta Geodaetica et Cartographica Sinica, 29 1 76~81 Mikhail E. M. and Ackerman F., 1976, Observations and Least Squares New York IEP-A Dun-Donnelley Publisher Openshaw, S., 1989, Learning to live with errors in spatial base, In Accuracy of base, Taylor & Francis Ltd, New York, 263-276 Shi W.Z., Wang S.L., 2002, Further development of theories and methods on attribute uncertainty in GIS Journal of Remote Sensing, 6(5) 393~400 Shi W.Z., 2000, Theory and Methods for Handling Errors in (Beijing: Science Press) Zhang J.X., Du D.S., 1999, Field based models for positional and attribute uncertainty Acta Geodaetica et Cartographica Sinica, 28(3), 244~249 7