Granular Computing: Granular Classifiers and Missing Values


Lech Polkowski (1, 2) and Piotr Artiemjew (2)

(1) Polish-Japanese Institute of Information Technology, Koszykowa str. 86, 02-008 Warsaw, Poland
(2) Department of Mathematics and Computer Science, University of Warmia and Mazury, Zolnierska 14, 10-560 Olsztyn, Poland
email: polkow@pjwstk.edu.pl; artem@matman.uwm.edu.pl

Abstract: Granular Computing is a paradigm devoted to studying how to compute with granules of knowledge, i.e., with collective objects formed from individual objects by means of a similarity measure. The idea of granulation was put forth by Lotfi Zadeh: granulation is inculcated in fuzzy set theory by the very definition of a fuzzy set, and inverse values of fuzzy membership functions are elementary forms of granules. Similarly, rough sets admit granules defined naturally as classes of indiscernibility relations; the search for more flexible granules has led to granules based on blocks (Grzymala-Busse), templates (H. S. Nguyen), rough inclusions (Polkowski, Skowron), and tolerance or similarity relations and, more generally, binary relations (T. Y. Lin, Y. Y. Yao). Rough inclusions establish a form of similarity relation that is reflexive but not necessarily symmetric; in the applications presented in this work, we restrict ourselves to symmetric rough inclusions based on the set DIS(u, v) = {a ∈ A : a(u) ≠ a(v)} of attributes discerning between given objects u, v, without any additional parameters. Our rough inclusions are induced in their basic forms in a unified framework of continuous t-norms; in this work we apply the rough inclusion µ_L induced from the Łukasiewicz t-norm L(x, y) = max{0, x + y - 1} by means of the formula g(|DIS(u, v)|/|A|) = |IND(u, v)|/|A|, where g is the function that occurs in the functional representation of L and IND(u, v) = A \ DIS(u, v). Granules of knowledge induced by rough inclusions are formed as neighborhoods of given radii of objects by means of the class operator of mereology (see below). L. Polkowski, in his feature talks at the IEEE GrC 2005 and 2006 conferences, put forth the hypothesis that similarity of objects in a granule should lead to closeness of sufficiently many attribute values on objects in the granule; thus, averaging, in a sense, the values of attributes on objects in a granule should lead to a new data set, the granular one, which should preserve the information encoded in the original data set to a satisfactory degree. This hypothesis is borne out in this work by tests on real data sets. We also address the problem of missing values in data sets; this problem has been addressed within rough set theory by many authors, e.g., Grzymala-Busse, Kryszkiewicz, Rybinski. We propose a novel approach to this problem: an object with missing values is absorbed in a granule and takes part in determining a granular object; then, at classification stage, objects with missing values are matched against the closest granular objects. We present details of this approach along with tests on real data sets. This paper is a companion to [19], where theoretical principles of granule formation are emphasized.

Index Terms: Granulation of knowledge, Rough sets, Rough inclusions, Granular decision systems, Missing values.

I. INTRODUCTION: ROUGH SETS

Knowledge is represented in this work along the lines of rough set theory [13], i.e., the basic object is an information system (U, A), where U is a set of objects and A is a set of attributes; each attribute a ∈ A is a mapping a : U → V_a from U into the value set V_a of a.
Knowledge is encoded in this setting in the family of indiscernibility relations IND = {ind(a) : a ∈ A}, where ind(a) = {(u, v) : a(u) = a(v)}. For any set B ⊆ A, the B-indiscernibility relation is ind(B) = ⋂{ind(a) : a ∈ B}; classes [u]_B = {v ∈ U : (u, v) ∈ ind(B)} of B-indiscernibility form B-elementary granules of knowledge; their unions are B-granules of knowledge. A decision system is a triple (U, A, d), where the decision d : U → V_d is not in A; reasoning about d is carried out by means of descriptors. A descriptor is a formula of the form (a = v), where v ∈ V_a. From descriptors, formulas are formed by means of the sentential connectives ∨, ∧, ¬, →. The meaning of a descriptor (a = v) is [a = v] = {u ∈ U : a(u) = v}, and the meaning is extended by recursion: [α ∨ β] = [α] ∪ [β], [α ∧ β] = [α] ∩ [β], [¬α] = U \ [α].
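To fix these notions operationally, the following minimal Python sketch (ours, purely illustrative; the toy table and the function names are not part of the paper's formalism) computes B-indiscernibility classes and descriptor meanings:

```python
from collections import defaultdict

def ind_classes(U, B):
    """Partition U into B-indiscernibility classes: objects fall into
    the same class iff they agree on every attribute in B."""
    classes = defaultdict(list)
    for name, row in U.items():
        classes[tuple(row[a] for a in sorted(B))].append(name)
    return list(classes.values())

def meaning(U, a, v):
    """The meaning [a = v] of a descriptor: all objects u with a(u) = v."""
    return {name for name, row in U.items() if row[a] == v}

# A toy decision system (U, A, d) with A = {a, b} and decision d.
U = {"u1": {"a": 0, "b": 1, "d": "yes"},
     "u2": {"a": 0, "b": 1, "d": "yes"},
     "u3": {"a": 1, "b": 1, "d": "no"}}
print(ind_classes(U, {"a", "b"}))  # [['u1', 'u2'], ['u3']]
print(meaning(U, "d", "yes"))      # {'u1', 'u2'} (a set; order immaterial)
```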

A decision rule is a descriptor formula of the form ⋀_{a ∈ B} (a = v_a) → (d = v); it is true when [⋀_{a ∈ B} (a = v_a)] ⊆ [d = v]; otherwise it is partially true; see, e.g., [14] for a deeper discussion. A set of decision rules is a decision algorithm; when applied to classification of new objects, it is also called a classifier. Inducing classifiers of a satisfactory quality is a problem studied intensively in rough set theory; see, e.g., [31], where three main kinds of classifiers are distinguished: minimal, i.e., consisting of the minimum possible number of descriptors describing decision classes in the universe; exhaustive, i.e., consisting of all possible rules; and satisfactory, i.e., containing rules tailored to a specific use. Classifiers are evaluated globally with respect to their ability to properly classify objects, usually by error, i.e., the ratio of the number of incorrectly classified objects to the number of test objects; total accuracy, i.e., the ratio of the number of correctly classified cases to the number of recognized cases; and total coverage, i.e., the ratio of the number of recognized test cases to the number of test cases. Minimum-size algorithms include the LEM2 algorithm by Grzymala-Busse, see, e.g., [5], [3], and the covering algorithm in the RSES package [28]; exhaustive algorithms include, e.g., the LERS system due to Grzymala-Busse [4] and systems based on discernibility matrices and Boolean reasoning according to Skowron, see, e.g., [26], [27], [1], [25], implemented in the RSES package [28]. Minimal consistent sets of rules were introduced in [29]. Further developments include dynamic rules, approximate rules, local rules, and relevant rules [1]. Rough set based classification algorithms, especially those implemented in the RSES system [28], were discussed extensively in [2].

An important class of methods for classifier induction are those based on similarity or analogy reasoning; most generally, this method of reasoning assigns to an object u the value of an attribute a from the knowledge of the values of a on a set N(u) of objects whose elements are selected on the basis of a similarity relation, usually, but not always, based on an appropriate metric. An extensive and deep study of algorithms based on similarity relations is [25]. A realization of the analogy based reasoning idea is, e.g., the k-nearest neighbors (k-nn) method, see, e.g., [7], in which, for a fixed number k and a given test object u, the value a(u) is assigned from the values of a at the k objects in the training set nearest to u. Finding nearest objects is based on some similarity measure among objects, which in practice is a metric. An extensive study of this topic is given in [34].

II. GRANULES OF KNOWLEDGE

In addition to traditional granules based on indiscernibility of objects, new forms of granules have been searched for; in many works, see, e.g., [11], [24], [30], [35], [36], granulation based on similarity relations, and in general on binary relations, was studied along with applications to concept approximation. Our approach is based on the method proposed and studied in [21], [15], [16], [17], [18], which employs ideas of mereology and uses rough inclusions as its main tool. We briefly indicate the main ideas and facts relevant to this approach.

A. Rough inclusions

A rough inclusion is a relation µ ⊆ U × U × [0, 1] which satisfies the following requirements, relative to a given irreflexive and transitive part relation π on a set U:

1. µ(x, y, 1) ⇔ (x π y or x = y);
2. µ(x, y, 1) ⇒ [µ(z, x, r) ⇒ µ(z, y, r)];
3. µ(x, y, r) and s < r ⇒ µ(x, y, s).
Condition 1 states that on U an exact decomposition into parts π is given and that µ extends this exact scheme into an approximate one; the exact scheme is a skeleton along which approximate reasoning is carried out. We apply here the rough inclusion µ_L induced by the t-norm

L(x, y) = max{0, x + y - 1},   (1)

due to Jan Łukasiewicz; see, e.g., [6] for a discussion of t-norms. This t-norm admits a functional characterization,

L(x, y) = g_L(f_L(x) + f_L(y)),   (2)

where f_L(x) = 1 - x = g_L(x); see, e.g., [14]. In order to define µ_L, we let

µ_L(u, v, r) ⇔ g_L(|DIS(u, v)|/|A|) ≥ r,   (3)

where DIS(u, v) = {a ∈ A : a(u) ≠ a(v)} and its complement is IND(u, v) = A \ DIS(u, v). For the Łukasiewicz t-norm, µ_L is of the form

µ_L(u, v, r) ⇔ |DIS(u, v)|/|A| ≤ 1 - r,   (4)

or, dually,

µ_L(u, v, r) ⇔ |IND(u, v)|/|A| ≥ r.   (5)
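Read operationally, formulas (3)-(5) reduce checking µ_L(u, v, r) to counting discerning attributes. A minimal sketch (ours; the encoding of objects as dicts keyed by attribute name is an assumption of this illustration):

```python
def dis(u, v, A):
    """DIS(u, v): the attributes from A discerning the objects u and v."""
    return {a for a in A if u[a] != v[a]}

def mu_L(u, v, r, A):
    """mu_L(u, v, r) via formula (5): the fraction |IND(u, v)|/|A| of
    attributes on which u and v agree is at least r."""
    return (len(A) - len(dis(u, v, A))) / len(A) >= r

# u and v agree on 2 of 3 attributes, so mu_L holds exactly for r <= 2/3.
u, v = {"a": 1, "b": 0, "c": 5}, {"a": 1, "b": 0, "c": 7}
assert mu_L(u, v, 0.5, A=["a", "b", "c"])
assert not mu_L(u, v, 0.7, A=["a", "b", "c"])
```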

B. Granules based on rough inclusions

The class operator Cls, informally, forms, for a property F of objects, the collection of all objects in U that have the property F; a formal description is given in the companion paper [19]. For an object u and a real number r ∈ [0, 1], we define the granule g_µ(u, r) about u of the radius r, relative to µ, by letting

g_µ(u, r) = Cls F(u, r),   (6)

where the property F(u, r) is satisfied by an object v if and only if µ(v, u, r) holds. It was shown, see, e.g., [15], Theorem 4, that in the case of µ_L,

v ing g_{µL}(u, r) ⇔ µ_L(v, u, r).   (7)

Property (7) allows for representing the granule g_{µL}(u, r) as the list of those v for which µ_L(v, u, r) holds. Granules in this work are defined with respect to the rough inclusion µ_L; thus the granule g_{µL}(u, r) consists of the objects v for which |IND(u, v)|/|A| ≥ r, i.e., at least the fraction r of attributes agree on u and v; one may regard this measure as the reduced Hamming distance on objects in U.

III. GRANULAR DECISION SYSTEMS

The idea of a granular decision system was posed in [17]. For a given information system (U, A), a rough inclusion µ, and r ∈ [0, 1], the new universe U^G_{r,µ} is given: we apply a strategy G to choose a covering Cov^G_{r,µ} of the universe U by granules from U^G_{r,µ}, and we apply a strategy S in order to assign the value a*(g) of each attribute a ∈ A to each granule g ∈ Cov^G_{r,µ}: a*(g) = S({a(u) : u ∈ g}). The granular counterpart to the information system (U, A) is a tuple (U^G_{r,µ}, G, S, {a* : a ∈ A}); analogously, we define granular counterparts to decision systems by adding the factored decision d*. The heuristic principle that objects similar with respect to conditional attributes in the set A should also reveal similar (i.e., close) decision values, and that therefore granular counterparts to decision systems should lead to classifiers satisfactorily close in quality to those induced from the original decision systems, was stated in [17].

IV. SELECTED TEST RESULTS

For space considerations, only a few tests can be included here. We have chosen the simplest possible rough inclusion µ_L, without any enhancements. For any granule g and any attribute b in the set A ∪ {d} of attributes, the reduced attribute value b*(g) has been estimated by means of the majority voting strategy, and ties have been resolved at random; we select coverings by random choice of granules. As well-established algorithms for classifier induction, we select the RSES exhaustive algorithm, see [28], and the LEM2 algorithm with p = .5, see [5], [28].

TABLE I
HEART DATA SET: R = GRANULE RADIUS, TST = TEST SAMPLE SIZE, TRN = TRAINING SAMPLE SIZE, RULEX = NUMBER OF RULES WITH EXHAUSTIVE ALGORITHM, RULLEM = NUMBER OF RULES WITH LEM2, AEX = TOTAL ACCURACY WITH EXHAUSTIVE ALGORITHM, CEX = TOTAL COVERAGE WITH EXHAUSTIVE ALGORITHM, ALEM = TOTAL ACCURACY WITH LEM2, CLEM = TOTAL COVERAGE WITH LEM2

r          tst  trn  rulex  rullem  aex    cex    alem   clem
nil        270  270  5352   42      1.0    1.0    1.0    0.504
0.0        270  1    0      0       0.0    0.0    0.0    0.0
0.0769231  270  1    0      0       0.0    0.0    0.0    0.0
0.153846   270  1    0      0       0.0    0.0    0.0    0.0
0.230769   270  4    0      1       0.0    0.0    1.0    0.307
0.307692   270  6    28     2       0.727  0.963  0.4    0.019
0.384615   270  14   75     2       0.770  0.996  0.855  0.204
0.461538   270  23   131    3       0.778  1.0    0.848  0.341
0.538462   270  132  2231   9       0.896  1.0    0.896  0.496
0.615385   270  132  2114   12      0.885  1.0    0.885  0.485
0.692308   270  215  4389   29      0.970  1.0    0.945  0.541
0.769231   270  262  5220   41      1.0    1.0    1.0    0.500
0.846154   270  270  5352   42      1.0    1.0    1.0    0.504
0.923077   270  270  5352   42      1.0    1.0    1.0    0.504
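The construction of Section III can be summarized in a short sketch. The following Python code is ours and reflects a simplified reading: strategy G picks granule centers at random among still-uncovered objects (the paper says only "random choice of granules"), and strategy S is majority voting, with Counter's arbitrary tie-breaking standing in for random tie resolution.

```python
import random
from collections import Counter

def granule(c, U, r, A):
    """Indices of objects agreeing with object U[c] on at least a
    fraction r of the attributes in A (property (7))."""
    return [i for i, u in enumerate(U)
            if sum(u[a] == U[c][a] for a in A) >= r * len(A)]

def granular_system(U, A, d, r):
    """Granular counterpart of the decision system (U, A, d):
    a random covering by r-granules, each factored by majority voting
    over the conditional attributes A and the decision d."""
    uncovered, covering = set(range(len(U))), []
    while uncovered:
        # Strategy G: a random still-uncovered object becomes a center;
        # the center lies in its own granule, so the loop terminates.
        g = granule(random.choice(sorted(uncovered)), U, r, A)
        covering.append(g)
        uncovered -= set(g)
    # Strategy S: a*(g) is the most frequent value of a on g.
    return [{a: Counter(U[i][a] for i in g).most_common(1)[0][0]
             for a in list(A) + [d]} for g in covering]
```

Classifiers are then induced from the resulting granular objects exactly as from an ordinary decision system.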
A. Train and test 1:1 with Heart Disease data set (Cleveland)

B. Heart disease data set

Table I gives the results of experiments with the Heart disease data set (Cleveland). The procedure applied has been train and test in the ratio 1:1, i.e., the rules have been trained on 50 percent of the data and tested on the remaining 50 percent. The training set has been granulated at all distinct radii, and granular systems have been formed by means of random choice of coverings. Rules induced by either the exhaustive or the LEM2 algorithm on the granulated training set have been tested on the test set, and the results have been compared with the classification results given on the test set by rule sets induced from the non-granulated training set. The strategy S applied in determining attribute values on granules has been majority voting with random resolution of ties.

1) Conclusions for Heart data sets: In the case of the exhaustive algorithm, accuracy falls within 0.27 (27 percent) of the value for the original data set, and coverage within 0.037 of the value for the original data set, at the radius of 0.307692, where the object set size reduction is 97.8 percent and the rule set size reduction is 99.5 percent. Accuracy falls within an error of 11.5 percent of the original value from the radius of 0.538462 on, where the reduction in object set size is 51.2 percent and the reduction in rule set size is 58.3 percent; the accuracy error is less than 3 percent from r = 0.692 on, with maximal coverage of 1.0, when the reduction in object number is 20.4 percent and in rule set size 18 percent. The LEM2 algorithm achieves, with granular systems, an error in accuracy less than 0.115 (11.5 percent) and an error in coverage less than 0.02 from the radius of 0.538462 on, with a reduction in object set size of 51 percent and a reduction in rule set size of 78 percent.

C. 10-fold cross validation with Pima Indians Diabetes data set

A parallel study has been performed on the Pima Indians Diabetes data set [33], and the test has been carried out with 10-fold cross-validation [7]. Results are reported in Tables II and III.

TABLE II
10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.6864  0.9987  7629    692
0.125  0.0618  0.0895  5.9     22.5
0.250  0.6627  0.9948  450.1   120.6
0.375  0.6536  0.9987  3593.6  358.7
0.500  0.6645  1.0     6517.6  579.4
0.625  0.6877  0.9987  7583.6  683.1
0.750  0.6864  0.9987  7629.2  692
0.875  0.6864  0.9987  7629.2  692

TABLE III
10-FOLD CV; PIMA; LEM2 ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.7054  0.1644  227.0   692
0.125  0.900   0.2172  1.0     22.5
0.250  0.7001  0.1250  12.0    120.6
0.375  0.6884  0.2935  74.7    358.7
0.500  0.7334  0.1856  176.1   579.4
0.625  0.7093  0.1711  223.1   683.1
0.750  0.7071  0.1671  225.9   692
0.875  0.7213  0.1712  227.8   692

1) Conclusions for CV-10 on Pima Indians Diabetes data set: For the exhaustive algorithm, accuracy in the granular case is 95.4 percent of the accuracy in the non-granular case from the radius of .25 on, with a reduction in size of the training set of 48 percent; from the radius of .5 on, the difference is less than 3 percent. The difference in coverage is less than .4 percent from r = .25 on, where the reduction in training set size is 82.5 percent. For LEM2, accuracy in the two cases differs by less than 1 percent from r = .25 on, and it is better in the granular case from r = .125 on; coverage is better in the granular case from r = .375 on.

V. MISSING VALUES

An information/decision system is incomplete in case some values of conditional attributes from A are not known. Analysis of systems with missing values requires a decision on how to treat them; Grzymala-Busse in his work [5] analyzes nine such methods, among them: 4. assigning all possible values to the missing location; 9. treating the unknown value as a new valid value; etc. Results in [5] indicate that methods 4 and 9 perform very well among all nine methods. In this work we consider and adopt these two methods, i.e., 4 and 9. We will use the symbol ∗, commonly used for denoting the missing value. The two methods for treating ∗ amount to the following: either ∗ is a "don't care" symbol, meaning that any value of the respective attribute can be substituted for ∗, thus ∗ = v for each value v of the attribute; or ∗ is a new value on its own, i.e., if ∗ = v then v can only be ∗. Our procedure for treating missing values is based on the granular structure (U^G_{r,µ}, G, S, {a* : a ∈ A}); the strategy S is majority voting, i.e., for each attribute a, the value a*(g) is the most frequent of the values in {a(u) : u ∈ g}. The strategy G consists in random selection of granules for a covering.
For an object u with the value ∗ at an attribute a, and a granule g = g(v, r) ∈ U^G_{r,µ}, the question whether u is included in g is resolved according to the adopted strategy of treating ∗: in the case ∗ = don't care, the value ∗ is regarded as identical with any value of a, hence |IND(u, v)| is automatically increased by 1, which increases the granule; in the case ∗ = ∗, the granule size is decreased. Assuming that ∗ is sparse in the data, majority voting on g would produce values of a distinct from ∗ in most cases; nevertheless, the value ∗ may appear in new granular objects g, and then, in the process of classification, such a value is repaired by means of the granule closest to g with respect to the rough inclusion µ_L, in accordance with the chosen method for treating ∗. In plain words, objects with missing values are, in a sense, absorbed by granules close to them, and missing values are replaced with the most frequent values on the objects collected in the granule; in this way method 4 or 9 of [5] is combined with the idea of a most frequent value, in a novel way. We have thus four possible strategies:

Strategy A: in building granules, ∗ = don't care; in repairing values of ∗, ∗ = don't care;
Strategy B: in building granules, ∗ = don't care; in repairing values of ∗, ∗ = ∗;
Strategy C: in building granules, ∗ = ∗; in repairing values of ∗, ∗ = don't care;
Strategy D: in building granules, ∗ = ∗; in repairing values of ∗, ∗ = ∗.
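How ∗ enters the agreement count that drives both granule membership and closest-granule repair can be sketched as follows. This is our illustration; the string encoding of ∗, the strategy table, and the function names are assumptions of the sketch, not the authors' implementation.

```python
MISSING = "*"

def agreeing(u, v, A, star_dont_care):
    """Number of attributes of A on which u and v agree, with * treated
    either as 'don't care' (method 4: * matches any value) or as a new
    value on its own (method 9: * matches only *)."""
    if star_dont_care:
        return sum(MISSING in (u[a], v[a]) or u[a] == v[a] for a in A)
    return sum(u[a] == v[a] for a in A)

# (building, repairing) treatments of * for strategies A-D;
# True stands for 'don't care', False for '* is a value of its own'.
STRATEGIES = {"A": (True, True), "B": (True, False),
              "C": (False, True), "D": (False, False)}

def repair(g, granular_objects, A, star_dont_care):
    """Replace * values in a granular object g from the granular object
    closest to g, i.e., the one maximizing the agreement count (the
    mu_L view of closeness), under the repairing treatment of *."""
    closest = max((h for h in granular_objects if h is not g),
                  key=lambda h: agreeing(g, h, A, star_dont_care))
    return {a: closest[a] if g[a] == MISSING else g[a] for a in A}
```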

A. Results of tests with the perturbed data set

We record in Tables IV-VII the test results for the Pima Indians Diabetes data set [33] in which 10 percent of the attribute values, chosen at random, have been replaced with the value ∗. The exhaustive algorithm of the RSES system [28] has been used as the rule-inducing algorithm; 10-fold cross validation (CV-10), see, e.g., [7], has been applied in testing.

TABLE IV
STRATEGY A FOR MISSING VALUES. 10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF GRANULAR TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.6864  0.9987  7629.2  692.0
0.125  0.0     0.0     0.0     1.7
0.250  0.0     0.0     0.0     4.7
0.375  0.0     0.0     0.0     21.5
0.500  0.3179  0.4777  115.8   64.7
0.625  0.6692  0.9987  1654.7  220.2
0.750  0.6697  1.0     5519.3  527.0
0.875  0.6678  0.9987  7078.8  663.8

TABLE V
STRATEGY B FOR MISSING VALUES. 10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF GRANULAR TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.6864  0.9987  7629.2  692.0
0.125  0.0     0.0     0.0     1.9
0.250  0.0     0.0     0.0     6.1
0.375  0.0     0.0     0.0     13.7
0.500  0.5772  0.8883  210.7   68.1
0.625  0.6467  0.9987  1785.8  229.4
0.750  0.6587  0.9987  5350.4  508.5
0.875  0.6547  0.9987  6982.7  663.4

TABLE VI
STRATEGY C FOR MISSING VALUES. 10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF GRANULAR TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.6864  0.9987  7629.2  692.0
0.125  0.0     0.0     0.0     21.2
0.250  0.6297  0.9948  388.9   116.9
0.375  0.6556  0.9974  3328.5  356.5
0.500  0.6433  1.0     6396.7  587.2
0.625  0.6621  1.0     7213.2  681.9
0.750  0.6640  0.9987  7306.3  691.9
0.875  0.6615  0.9987  7232.1  692.0

TABLE VII
STRATEGY D FOR MISSING VALUES. 10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRULES = MEAN RULE NUMBER, MTRN = MEAN SIZE OF GRANULAR TRAINING SET

r      macc    mcov    mrules  mtrn
nil    0.6864  0.9987  7629.2  692.0
0.125  0.1471  0.1750  12.0    17.3
0.250  0.6572  0.9974  382.1   114.9
0.375  0.6491  0.9974  3400.3  355.0
0.500  0.6370  0.9974  6300.2  588.7
0.625  0.6747  0.9987  7181.2  682.3
0.750  0.6724  1.0     7231.3  691.9
0.875  0.6618  1.0     7253.6  692.0

TABLE VIII
AVERAGE NUMBER OF ∗ VALUES IN GRANULAR SYSTEMS. 10-FOLD CV; PIMA; EXHAUSTIVE ALGORITHM. R = RADIUS, MA = MEAN VALUE FOR A, MB = MEAN VALUE FOR B, MC = MEAN VALUE FOR C, MD = MEAN VALUE FOR D

r      ma   mb   mc   md
0.375  0.0  0.0  135  132
0.500  0.0  0.0  412  412
0.625  3    4    538  539
0.750  167  167  554  554
0.875  435  435  554  554

1) Conclusions on test results: In the case of the perturbed Pima Indians Diabetes data set, Strategy A attains an accuracy value better than 97 percent of, and a coverage value greater than or equal to, the values in the non-perturbed case from the radius of .625 on. With Strategy B, accuracy is within 94 percent of, and coverage not smaller than, the values in the non-perturbed case from the radius of .625 on. Strategy C yields accuracy within 96.3 percent of the accuracy in the non-perturbed case from the radius of .625, and within 95 percent from the radius of .250; coverage is within 99.79 percent from the radius of .250. Strategy D gives results slightly better than C at the same radii. Results for C and D are better than results for A or B.
We conclude that what is essential for the results of classification is the strategy of treating the missing value as ∗ = ∗ in building granules, common to strategies C and D; the repairing strategy has almost no effect: C and D differ with respect to this strategy, yet the results for accuracy and coverage in cases C and D differ only very slightly. Let us notice that strategies C and D cope with a larger number of ∗ values to be repaired, as Table VIII shows.

B. Results of tests with the real data set Hepatitis with missing values

We record here the results of tests with the Hepatitis data set [33], with 155 objects, 20 attributes, and 167 missing values. We apply the exhaustive algorithm of the RSES system [28] and 5-fold cross validation (CV-5). Below we give averaged results for strategies A, B, C, and D. As before, the radius nil indicates the non-granulated case. First, we record the numbers of missing values that have fallen into the training and test sets, respectively, in Table IX. Next, we record in Table X the average number of ∗ values that fall into the granulated data set (i.e., the number of ∗ values to be repaired), depending on the strategy applied. Finally, we record in Tables XI-XIV the results of classification for Hepatitis with the exhaustive algorithm and CV-5 cross validation for strategies A, B, C, and D.

TABLE IX
AVERAGE NUMBER OF ∗ VALUES IN TRAINING AS WELL AS TEST SET. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. FN = FOLD NO., TST-NIL = NO. OF ∗ IN TEST SET, TRN-NIL = NO. OF ∗ IN TRAINING SET

fn  tst-nil  trn-nil
1   37       130
2   33       134
3   42       125
4   27       140
5   28       139

TABLE X
AVERAGE NUMBER OF ∗ VALUES IN GRANULAR SYSTEMS. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. R = RADIUS, MA = MEAN VALUE FOR A, MB = MEAN VALUE FOR B, MC = MEAN VALUE FOR C, MD = MEAN VALUE FOR D

r          ma     mb     mc     md
0.0        0.0    0.0    0.0    0.0
0.0526316  0.0    0.0    0.0    0.0
0.105263   0.0    0.0    0.0    0.0
0.157895   0.0    0.0    0.0    0.0
0.210526   0.0    0.0    0.0    0.0
0.263158   0.0    0.0    0.0    0.0
0.315789   0.0    0.0    0.0    0.0
0.368421   0.0    0.0    0.0    0.0
0.421053   0.0    0.0    0.0    0.0
0.473684   0.0    0.0    6.2    2.6
0.526316   0.0    0.0    5.2    9.4
0.578947   0.0    0.0    27.2   27.8
0.631579   0.2    0.0    40.4   41.4
0.684211   0.2    0.2    58.2   58.4
0.736842   1.6    3.0    82.2   83.0
0.789474   13.0   15.4   106.4  104.8
0.842105   26.6   42.2   124.0  123.8
0.894737   51.8   51.4   133.0  133.0
0.947368   93.2   99.6   133.6  133.6
1.0        125.2  125.2  133.6  133.6

nil = 133.6

TABLE XI
STRATEGY A. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRUL = MEAN NUMBER OF RULES, MTRN = MEAN TRAINING GRANULAR SAMPLE SIZE

r          macc    mcov    mrul    mtrn
0.0526316  0.0     0.0     0.0     1.0
0.105263   0.0     0.0     0.0     1.0
0.157895   0.0     0.0     0.0     1.0
0.210526   0.0     0.0     0.0     1.0
0.263158   0.0     0.0     0.0     1.4
0.315789   0.0     0.0     0.0     2.0
0.368421   0.0     0.0     0.0     2.4
0.421053   0.0     0.0     0.0     3.8
0.473684   0.2012  0.3548  6.4     3.4
0.526316   0.5934  1.0     29.6    7.4
0.578947   0.4992  0.7872  33.8    7.6
0.631579   0.5694  0.9872  176.2   20.0
0.684211   0.5852  0.9936  167.6   17.8
0.736842   0.6102  0.9936  263.0   22.8
0.789474   0.6130  1.0     911.0   49.4
0.842105   0.6258  1.0     989.6   46.8
0.894737   0.6386  1.0     1899.0  77.0
0.947368   0.6774  1.0     2836.2  105.8
1.0        0.6710  1.0     3286.4  123.4

TABLE XII
STRATEGY B. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRUL = MEAN NUMBER OF RULES, MTRN = MEAN TRAINING GRANULAR SAMPLE SIZE

r          macc    mcov    mrul    mtrn
0.0526316  0.0     0.0     0.0     1.0
0.105263   0.0     0.0     0.0     1.0
0.157895   0.0     0.0     0.0     1.0
0.210526   0.0     0.0     0.0     1.2
0.263158   0.0     0.0     0.0     1.2
0.315789   0.0     0.0     0.0     1.6
0.368421   0.1104  0.1870  1.0     2.6
0.421053   0.0904  0.2000  1.6     3.4
0.473684   0.3938  0.5806  7.2     4.4
0.526316   0.4234  0.7936  26.2    7.6
0.578947   0.6302  0.9936  59.4    10.8
0.631579   0.6708  1.0     126.4   15.4
0.684211   0.6038  0.9742  253.4   24.4
0.736842   0.6292  0.9936  367.6   35.2
0.789474   0.6166  0.9936  947.0   52.2
0.842105   0.6324  1.0     1417.2  71.8
0.894737   0.6386  1.0     1797.0  79.6
0.947368   0.6450  1.0     3081.8  113.4
1.0        0.6646  1.0     3354.2  123.4

TABLE XIII
STRATEGY C. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRUL = MEAN NUMBER OF RULES, MTRN = MEAN TRAINING GRANULAR SAMPLE SIZE

r          macc     mcov    mrul    mtrn
0.0526316  0.0      0.0     0.0     1.0
0.105263   0.0      0.0     0.0     1.2
0.157895   0.0      0.0     0.0     1.2
0.210526   0.0      0.0     0.0     1.8
0.263158   0.0      0.0     0.0     2.0
0.315789   0.2560   0.3936  2.4     4.0
0.368421   0.4486   0.6838  7.4     5.6
0.421053   0.4766   0.7870  19.2    7.8
0.473684   0.5806   1.0     58.4    10.6
0.526316   0.6580   1.0     136.6   17.4
0.578947   0.64902  0.9936  332.4   32.0
0.631579   0.6568   0.9936  991.6   47.4
0.684211   0.6646   1.0     1751.6  70.2
0.736842   0.6902   1.0     2648.8  93.2
0.789474   0.6322   1.0     3208.8  112.6
0.842105   0.6776   1.0     3297.8  120.2
0.894737   0.6710   1.0     3297.4  123.4
0.947368   0.6838   1.0     3305.4  124.0
1.0        0.6774   1.0     3327.2  124.0

TABLE XIV
STRATEGY D. CV-5; HEPATITIS; EXHAUSTIVE ALGORITHM. R = RADIUS, MACC = MEAN ACCURACY, MCOV = MEAN COVERAGE, MRUL = MEAN NUMBER OF RULES, MTRN = MEAN TRAINING GRANULAR SAMPLE SIZE

r          macc    mcov    mrul    mtrn
0.0526316  0.0     0.0     0.0     1.0
0.105263   0.0     0.0     0.0     1.0
0.157895   0.0     0.0     0.0     1.4
0.210526   0.0     0.0     0.0     1.6
0.263158   0.0     0.0     0.0     2.6
0.315789   0.3886  0.5162  6.0     3.8
0.368421   0.5730  0.9032  16.6    4.8
0.421053   0.6328  0.9418  23.8    6.8
0.473684   0.5740  0.9740  60.6    10.6
0.526316   0.6170  0.9936  120.6   16.8
0.578947   0.6888  0.9936  354.0   30.6
0.631579   0.6388  1.0     922.0   47.4
0.684211   0.6646  1.0     1828.6  70.8
0.736842   0.6450  1.0     2648.2  93.4
0.789474   0.6516  1.0     3182.0  112.4
0.842105   0.6710  1.0     3299.2  120.4
0.894737   0.6710  1.0     3333.8  123.4
0.947368   0.6646  1.0     3327.2  124.0
1.0        0.6710  1.0     3338.6  124.0

1) Conclusions for Hepatitis data set: Results for the particular strategies, compared radius by radius, show that the ranking of the strategies is C > D > B > A; thus strategy C is the most effective, with D giving slightly worse results. As with the perturbed Pima Indians Diabetes data set, strategies C and D cope with a larger number of ∗ values in the test set. In [3], the Hepatitis data set was studied with the naive LERS algorithm and with the new LERS algorithm augmented with parameters like strength and specificity of rules; the results in the case of method 9 were an accuracy of 0.8065 for new LERS and 0.6516 for naive LERS; the best result obtained by our approach with strategy C, implementing method 9, is 0.6838, i.e., it falls almost in the middle between the two results for LERS.

VI. CONCLUSION

The results of tests reported in this work bear out the hypothesis that granulated data sets preserve information to a degree allowing for satisfactory classification. The novel approach to the problem of data with missing values has also proved to be very effective. Further studies will lead to novel algorithms for rule induction based on granules of knowledge.

VII. ACKNOWLEDGEMENT

The authors acknowledge the service rendered to the rough set community by Professors Skowron and Grzymala-Busse in sharing their algorithms, used in this work.

REFERENCES

[1] J. G. Bazan, A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: Rough Sets in Knowledge Discovery 1, L. Polkowski and A. Skowron, Eds., Physica Verlag: Heidelberg, 1998, 321-365.
[2] J. G. Bazan, Hung Son Nguyen, Sinh Hoa Nguyen, P. Synak, and J. Wróblewski, Rough set algorithms in classification problems. In: Rough Set Methods and Applications, L. Polkowski, S. Tsumoto, and T. Y. Lin, Eds., Physica Verlag: Heidelberg, 2000, 49-88.
[3] J. W. Grzymala-Busse and Ming Hu, A comparison of several approaches to missing attribute values in Data Mining. In: Proceedings RSCTC 2000, LNAI 2005, Springer Verlag: Berlin, 2000, 378-385.
[4] J. W. Grzymala-Busse, LERS: a system for learning from examples based on rough sets. In: Intelligent Decision Support: Handbook of Advances and Applications of the Rough Sets Theory, R. Słowiński, Ed., Kluwer: Dordrecht, 1992, 3-18.
[5] J. W. Grzymala-Busse, Data with missing attribute values: Generalization of indiscernibility relation and rule induction. Transactions on Rough Sets I, Springer Verlag: Berlin, 2004, 78-95.
[6] P. Hájek, Metamathematics of Fuzzy Logic, Kluwer: Dordrecht, 1998.
[7] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Verlag: New York, 2003.
[8] M. Kryszkiewicz, Rules in incomplete information systems. Information Sciences 113, 1999, 271-292.
[9] M. Kryszkiewicz and H. Rybiński, Data mining in incomplete information systems from rough set perspective. In: Rough Set Methods and Applications, L. Polkowski, S. Tsumoto, and T. Y. Lin, Eds., Physica Verlag: Heidelberg, 2000, 568-580.
[10] S. Leśniewski, On the foundations of mathematics. Topoi 2, 1982, 7-52.
[11] T. Y. Lin, Granular computing: Examples, intuitions, and modeling. In: [22], 40-44.
[12] Rough Neural Computing: Techniques for Computing with Words, S. K. Pal, L. Polkowski, and A. Skowron, Eds., Springer Verlag: Berlin, 2004.
[13] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer: Dordrecht, 1991.
[14] L. Polkowski, Rough Sets: Mathematical Foundations, Physica Verlag: Heidelberg, 2002.
[15] L. Polkowski, Toward rough set foundations. Mereological approach (a plenary lecture). In: Proceedings RSCTC 2004, Uppsala, Sweden, LNAI 3066, Springer Verlag: Berlin, 2004, 8-25.
[16] L. Polkowski, Rough-fuzzy-neurocomputing based on rough mereological calculus of granules. International Journal of Hybrid Intelligent Systems 2, 2005, 91-108.
[17] L. Polkowski, Formal granular calculi based on rough inclusions (a feature talk). In: [22], 57-62.
[18] L. Polkowski, A model of granular computing with applications (a feature talk). In: [23], 9-16.
[19] L. Polkowski, The paradigm of granular computing: Foundations and applications, in these Proceedings.
[20] L. Polkowski and A. Skowron, Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning 15(4), 1997, 333-365.
[21] L. Polkowski and A. Skowron, Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning 15(4), 1997, 333-365.
[22] Proceedings of the IEEE 2005 Conference on Granular Computing, GrC05, Beijing, China, July 2005, IEEE Press, 2005.
[23] Proceedings of the IEEE 2006 Conference on Granular Computing, GrC06, Atlanta, USA, May 2006, IEEE Press, 2006.
[24] Qing Liu and Hui Sun, Theoretical study of granular computing. In: Proceedings RSKT 2006, Chongqing, China, LNAI 4062, Springer Verlag: Berlin, 2006, 92-102.
[25] Sinh Hoa Nguyen, Regularity analysis and its applications in Data Mining. In: Rough Set Methods and Applications, L. Polkowski, S. Tsumoto, and T. Y. Lin, Eds., Physica Verlag: Heidelberg, 2000, 289-378.
[26] A. Skowron, Boolean reasoning for decision rules generation. In: Methodologies for Intelligent Systems, J. Komorowski and Z. Ras, Eds., LNAI 689, Springer Verlag: Berlin, 1993, 295-305.
[27] A. Skowron, Extracting laws from decision tables. Computational Intelligence: An International Journal 11(2), 1995, 371-388.
[28] A. Skowron et al., RSES: A system for data analysis; available at http://logic.mimuw.edu.pl/~rses/
[29] A. Skowron and C. Rauszer, The discernibility matrices and functions in decision systems. In: Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Słowiński, Ed., Kluwer: Dordrecht, 1992, 311-362.
[30] A. Skowron and J. Stepaniuk, Information granules and rough neural computing. In: [12], 43-84.
[31] J. Stefanowski, On rough set based approaches to induction of decision rules. In: Rough Sets in Knowledge Discovery 1, L. Polkowski and A. Skowron, Eds., Physica Verlag: Heidelberg, 1998, 500-529.
[32] J. Stefanowski and A. Tsoukias, Incomplete information tables and rough classification. Computational Intelligence 17, 2001, 545-566.
[33] UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/databases/
[34] A. Wojna, Analogy-based reasoning in classifier construction. Transactions on Rough Sets IV, LNCS 3700, Springer Verlag: Berlin, 2005, 277-374.
[35] Y. Y. Yao, Information granulation and approximation in a decision-theoretic model of rough sets. In: [12], 491-516.
[36] Y. Y. Yao, Perspectives of granular computing. In: [22], 85-90.