
Andrzej Skowron, Zbigniew Suraj (Eds.)

ROUGH SETS AND INTELLIGENT SYSTEMS
To the Memory of Professor Zdzisław Pawlak
Vol. 1

Springer Berlin Heidelberg New York

Chapter 3
Rough Sets: From Rudiments to Challenges

Hung Son Nguyen and Andrzej Skowron

Abstract In the development of rough set theory and applications, one can distinguish three main stages. At the beginning, researchers concentrated on descriptive properties such as reducts of information systems preserving indiscernibility relations or the description of concepts and classifications. Next, they moved to applications of rough sets in machine learning, pattern recognition, and data mining. After gaining some experience, they developed foundations for inductive reasoning leading, e.g., to the induction of classifiers. While the first period was based on the assumption that objects are perceived by means of partial information represented by attributes, in the second period it was also assumed that information about the approximated concepts is partial. Approximation spaces and searching strategies for relevant approximation spaces were recognized as the basic tools for rough sets. Important achievements both in theory and applications were obtained using Boolean reasoning and approximate Boolean reasoning applied, e.g., in searching for relevant features, discretization, symbolic value grouping, or, more generally, in searching for relevant approximation spaces. Nowadays, we observe that a new period is emerging in which two new important topics are investigated: (i) strategies for discovering relevant (complex) contexts of analyzed objects or granules, which is strongly related to the information granulation process and granular computing, and (ii) interactive computations on granules. Both directions aim at developing tools for the approximation of complex vague concepts, such as behavioral patterns or adaptive strategies, making it possible to achieve a satisfactory quality of the realized interactive computations. This chapter presents this development from the rudiments of rough sets to challenges related, e.g., to ontology approximation, process mining, context inducing, or Perception Based Computing (PBC). The approach is based on Interactive Rough Granular Computing (IRGC).

Keywords: vague concept, indiscernibility, reduct, approximation space, rough sets, decision rule, dependency, (approximate) Boolean reasoning and rough sets, concept approximation, ontology approximation, scalability in data mining, (interactive rough) granular computing, context inducing, process mining, perception based computing.

Hung Son Nguyen, Andrzej Skowron
Institute of Mathematics, Warsaw University, Banacha 2, Warsaw, Poland, skowron@mimuw.edu.pl

3.1 Introduction

Rough set theory, proposed by Professor Zdzisław Pawlak in 1982 [209, 211, 212, 215], can be seen as a new mathematical approach to solving problems dealing with imperfect data and knowledge, in particular with vague concepts. The rough set philosophy is founded on the assumption that with every object of the universe of discourse we associate some information (data, knowledge). For example, if objects are patients suffering from a certain disease, symptoms of the disease form information about the patients. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. This understanding of indiscernibility is related to the idea of Gottfried Wilhelm Leibniz that objects are indiscernible if and only if all available functionals take on identical values (Leibniz's Law of Indiscernibility: The Identity of Indiscernibles) [5, 127]. However, in the rough set approach, indiscernibility is defined relative to a given set of functionals (attributes). Any set of all indiscernible (similar) objects is called an elementary set and forms a basic granule (atom) of knowledge about the universe. Any union of elementary sets is referred to as a crisp (precise) set.¹ A set which is not crisp is called rough (imprecise, vague). Consequently, each rough set has boundary region cases, i.e., objects which cannot with certainty be classified either as members of the set or of its complement. Obviously, crisp sets have no boundary region elements at all. This means that boundary region cases cannot be properly classified by employing the available knowledge. Thus, the assumption that objects can be seen only through the information available about them leads to the view that knowledge has a granular structure. Due to the granularity of knowledge, some objects of interest cannot be discerned and appear as the same (or similar). As a consequence, vague concepts, in contrast to precise concepts, cannot be characterized in terms of information about their elements. Therefore, in the proposed approach, we assume that any vague concept is replaced by a pair of precise concepts called the lower and the upper approximation of the vague concept. The lower approximation consists of all objects which surely belong to the concept, and the upper approximation contains all objects which possibly belong to the concept. The difference between the upper and the lower approximation constitutes the boundary region of the vague concept. These approximations are two basic operations in rough set theory. Note that the boundary region is defined relative to a subjective knowledge given by a set of attributes and/or a sample of objects. Such a boundary region is crisp. However, when some attributes are deleted, new attributes are added, or a given sample is updated, the boundary region changes. One could ask about a boundary region independent of such subjective knowledge, but then, in the discussed framework, we do not have the possibility to define such a region as a crisp set.

¹ This approach is generalized when one considers inductive extensions of approximations from samples of objects (see, e.g., [303]).

This property is related to the higher order vagueness discussed in philosophy. Hence, rough set theory expresses vagueness not by means of membership, but by employing the boundary region of a set. If the boundary region of a set is empty, it means that the set is crisp; otherwise the set is rough (inexact). A nonempty boundary region of a set means that our knowledge about the set is not sufficient to define the set precisely.

Rough set theory is not an alternative to classical set theory but rather is embedded in it. Rough set theory can be viewed as a specific implementation of Frege's idea of vagueness [59], i.e., imprecision is expressed in this approach by the boundary region of a set.

Rough set theory has attracted the worldwide attention of many researchers and practitioners, who have contributed essentially to its development and applications. Rough set theory overlaps with many other theories. Despite this, rough set theory may be considered as an independent discipline in its own right. The rough set approach seems to be of fundamental importance in artificial intelligence and cognitive sciences, especially in research areas such as machine learning, intelligent systems, inductive reasoning, pattern recognition, mereology, image processing, signal analysis, knowledge discovery, decision analysis, and expert systems.

The main advantage of rough set theory in data analysis is that it does not need any preliminary or additional information about data, like probability distributions in statistics, basic probability assignments in Dempster–Shafer theory, or a grade of membership or the value of possibility in fuzzy set theory (see, e.g., [55], where some combinations of rough sets with non-parametric statistics are studied). One can observe the following about the rough set approach:

- introduction of efficient algorithms for finding hidden patterns in data,
- determination of optimal sets of data (data reduction),
- evaluation of the significance of data,
- generation of sets of decision rules from data,
- easy-to-understand formulation,
- straightforward interpretation of obtained results,
- suitability of many of its algorithms for parallel processing.

The basic ideas of rough set theory and its extensions, as well as many interesting applications, can be found in a number of books (see, e.g., [43, 46, 52, 55, 83, 104, 118, 119, 137, 189, 199, 200, 215, 245, 249, 252, 253, 277, 324, 387, 109, 156, 38, 96, 223, 45, 29, 110, 198, 159, 248, 144, 224]), issues of the Transactions on Rough Sets [233, 231, 226, 227, 228, 232, 235, 229, 236, 241, 230, 238, 11, 237], special issues of other journals (see, e.g., [37, 133, 225, 197, 291, 326, 391, 392, 105, 191, 44, 292]), proceedings of international conferences (see, e.g., [2, 99, 136, 251, 290, 310, 320, 321, 353, 355, 356, 367, 390, 394, 72, 368, 4, 377, 123, 39, 366, 370, 272, 350, 384, 124, 378]), and tutorials (see, e.g., [117, 222, 221, 220]). For more information on the bibliography on rough sets one can also visit the relevant web pages.

In this chapter, we begin with a short discussion on vague concepts (see Section 3.2). Next, we recall the basic concepts of rough set theory (see Section 3.3). Some extensions of the rough set approach are outlined in Section 3.4. In Section 3.5, we discuss the relationship of the rough set approach with inductive reasoning. In particular, we present the rough set approach to inducing rough set based classifiers and relevant approximation spaces. We also briefly discuss the relationship of the rough set approach with higher order vagueness. Section 3.6 includes some remarks on the relationships of information granulation and rough sets. In Section 3.7, we outline the rough set approach to ontology approximation. The rough set approach based on the combination of rough sets and Boolean reasoning, with applications in pattern recognition, machine learning, and data mining, is presented in Section 3.8. In Section 3.9, we discuss some scalability issues of the rough set approach. Some comments on the relationships of rough sets and logic are included in Section 3.10. Finally, we discuss some challenging issues for rough sets (see Section 3.11). We propose Interactive (Rough) Granular Computing (IRGC) as a framework making it possible to search for solutions of problems related to the inducing of relevant contexts, process mining, and Perception Based Computing (PBC). This chapter is an extended version of our paper presented in the book Three Approaches to Data Analysis. Test Theory, Rough Sets and Logical Analysis of Data [40].

3.2 Vague Concepts

Mathematics requires that all mathematical notions (including set) must be exact, otherwise precise reasoning would be impossible. However, philosophers [111, 112, 266, 271] and recently computer scientists [145, 184, 186, 287], as well as other researchers, have become interested in vague (imprecise) concepts. In classical set theory, a set is uniquely determined by its elements. In other words, every element must be uniquely classified as belonging to the set or not. That is to say, the notion of a set is a crisp (precise) one. For example, the set of odd numbers is crisp because every number is either odd or even. In contrast to odd numbers, the notion of a beautiful painting is vague, because we are unable to classify uniquely all paintings into two classes: beautiful and not beautiful. For some paintings it cannot be decided whether they are beautiful or not, and thus they remain in the doubtful area. Thus, beauty is not a precise but a vague concept. Almost all concepts we use in natural language are vague. Therefore, common sense reasoning based on natural language must be based on vague concepts and not on classical logic. An interesting discussion of this issue can be found in [266]. The idea of vagueness can be traced back to the ancient Greek philosopher Eubulides of Megara (ca. 400 BC), who first formulated the so-called sorites (heap) and falakros (bald man) paradoxes (see, e.g., [111, 112]). The bald man paradox goes as follows: suppose a man has 100,000 hairs on his head. Removing one hair from his head surely cannot make him bald.

Repeating this step, we arrive at the conclusion that a man without any hair is not bald. Similar reasoning can be applied to a heap of stones.

Vagueness is usually associated with the boundary region approach (i.e., the existence of objects which cannot be uniquely classified relative to a set or its complement), which was first formulated in 1893 by the father of modern logic, the German logician Gottlob Frege (1848–1925) (see [59]). According to Frege, the concept must have a sharp boundary. To the concept without a sharp boundary there would correspond an area that would not have any sharp boundary line all around. It means that mathematics must use crisp, not vague concepts, otherwise it would be impossible to reason precisely. Summing up, vagueness is:

- not allowed in mathematics;
- interesting for philosophy;
- a nettlesome problem for natural language, cognitive science, artificial intelligence, machine learning, philosophy, and computer science.

3.3 Rudiments of Rough Sets

This section briefly delineates the basic concepts of rough set theory.

3.3.1 Indiscernibility and Approximation

The starting point of rough set theory is the indiscernibility relation, which is generated by information about objects of interest. The indiscernibility relation expresses the fact that, due to a lack of information (or knowledge), we are unable to discern some objects employing the available information (or knowledge). This means that, in general, we are unable to deal with each particular object but have to consider granules (clusters) of indiscernible objects as a fundamental basis for our theory.

From a practical point of view, it is better to define the basic concepts of this theory in terms of data. Therefore, we start our considerations from a data set called an information system. An information system is a data table containing rows labeled by objects of interest, columns labeled by attributes, and table entries which are attribute values. For example, a data table can describe a set of patients in a hospital. The patients can be characterized by some attributes, like age, sex, blood pressure, body temperature, etc. With every attribute a set of its values is associated; e.g., the values of the attribute age can be young, middle, and old. Attribute values can also be numerical. In data analysis, the basic problem we are interested in is finding patterns in data, i.e., finding relationships between some sets of attributes; e.g., we might be interested in whether blood pressure depends on age and sex.

Suppose we are given a pair A = (U, A) of non-empty, finite sets U and A, where U is the universe of objects and A is a set of attributes, i.e., functions a : U → V_a, where V_a is the set of values of attribute a, called the domain of a. The pair A = (U, A) is called an information system (see, e.g., [210]). Any information system can be represented by a data table with rows labeled by objects and columns labeled by attributes.³ Any pair (x, a), where x ∈ U and a ∈ A, defines the table entry consisting of the value a(x).

Any subset B of A determines a binary relation IND_B on U, called an indiscernibility relation, defined by

(3.1) x IND_B y if and only if a(x) = a(y) for every a ∈ B,

where a(x) denotes the value of attribute a for object x. Obviously, IND_B is an equivalence relation. The family of all equivalence classes of IND_B, i.e., the partition determined by B, will be denoted by U/IND_B, or simply U/B; an equivalence class of IND_B, i.e., a block of the partition U/B, containing x will be denoted by B(x) (other notation used: [x]_B or, more precisely, [x]_{IND_B}). Thus, in view of the data, we are unable, in general, to observe individual objects but are forced to reason only about the accessible granules of knowledge (see, e.g., [199, 215, 255]).

If (x, y) ∈ IND_B, we say that x and y are B-indiscernible. Equivalence classes of the relation IND_B (or blocks of the partition U/B) are referred to as B-elementary sets or B-elementary granules. In the rough set approach, the elementary sets are the basic building blocks (concepts) of our knowledge about reality. The unions of B-elementary sets are called B-definable sets.⁴

For B ⊆ A we denote by Inf_B(x) the B-signature of x ∈ U, i.e., the set {(a, a(x)) : a ∈ B}. Let INF(B) = {Inf_B(x) : x ∈ U}. Then for any objects x, y ∈ U the following equivalence holds: x IND_B y if and only if Inf_B(x) = Inf_B(y).

The indiscernibility relation will be further used to define the basic concepts of rough set theory. Let us now define the following two operations on sets X ⊆ U:

(3.2) LOW_B(X) = {x ∈ U : B(x) ⊆ X},
(3.3) UPP_B(X) = {x ∈ U : B(x) ∩ X ≠ ∅},

assigning to every subset X of the universe U two sets LOW_B(X) and UPP_B(X), called the B-lower and the B-upper approximation of X, respectively. The set

(3.4) BN_B(X) = UPP_B(X) − LOW_B(X)

will be referred to as the B-boundary region of X. From the definition we obtain the following interpretation:

³ Note that in statistics or machine learning such a data table is called a sample [97].
⁴ One can compare data tables corresponding to information systems with relations in relational databases [63].

- The lower approximation of a set X with respect to B is the set of all objects which can be for certain classified as objects in X using B (i.e., are certainly in X in view of B).
- The upper approximation of a set X with respect to B is the set of all objects which can possibly be classified as objects in X using B (i.e., are possibly in X in view of B).
- The boundary region of a set X with respect to B is the set of all objects which can be classified neither as in X nor as in U − X using B.

In other words, due to the granularity of knowledge, rough sets cannot be characterized by using the available knowledge. Therefore, with every rough set we associate two crisp sets, called its lower and upper approximation. Intuitively, the lower approximation of a set consists of all elements that surely belong to the set, the upper approximation of the set consists of all elements that possibly belong to the set, and the boundary region of the set consists of all elements that cannot be classified uniquely to the set or its complement by employing the available knowledge. The definition of the approximations is depicted in Figure 3.1.

Fig. 3.1: A rough set.
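To make the constructions (3.1)-(3.4) concrete, the sketch below computes B-elementary granules and the lower and upper approximations for a small, hypothetical information system; the table, attribute names, and helper names are our illustration, not taken from the chapter:

```python
# A minimal sketch of (3.1)-(3.4): B-signatures, B-elementary granules,
# and the B-lower/B-upper approximations of a concept X given as a set.
# The toy patient table below is hypothetical.
U = ["x1", "x2", "x3", "x4", "x5", "x6"]
A = {
    "age": {"x1": "young", "x2": "young", "x3": "old",
            "x4": "old",   "x5": "old",   "x6": "young"},
    "flu": {"x1": "yes", "x2": "yes", "x3": "no",
            "x4": "yes", "x5": "no",  "x6": "no"},
}

def ind_classes(U, A, B):
    """Partition U into B-elementary granules (equivalence classes of IND_B)."""
    classes = {}
    for x in U:
        sig = tuple(A[a][x] for a in sorted(B))   # the B-signature Inf_B(x)
        classes.setdefault(sig, set()).add(x)
    return list(classes.values())

def lower(U, A, B, X):
    """LOW_B(X): union of the granules fully contained in X."""
    return set().union(*([g for g in ind_classes(U, A, B) if g <= X] or [set()]))

def upper(U, A, B, X):
    """UPP_B(X): union of the granules intersecting X."""
    return set().union(*([g for g in ind_classes(U, A, B) if g & X] or [set()]))

X, B = {"x1", "x3", "x4"}, {"age", "flu"}
print(lower(U, A, B, X))                      # {'x4'}: certainly in X
print(upper(U, A, B, X))                      # possibly in X
print(upper(U, A, B, X) - lower(U, A, B, X))  # the boundary region BN_B(X)
```

Here X has a non-empty boundary region, so it is rough with respect to B in the sense defined above.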

The approximations have the following properties:

(3.5)
LOW_B(X) ⊆ X ⊆ UPP_B(X),
LOW_B(∅) = UPP_B(∅) = ∅, LOW_B(U) = UPP_B(U) = U,
UPP_B(X ∪ Y) = UPP_B(X) ∪ UPP_B(Y),
LOW_B(X ∩ Y) = LOW_B(X) ∩ LOW_B(Y),
X ⊆ Y implies LOW_B(X) ⊆ LOW_B(Y) and UPP_B(X) ⊆ UPP_B(Y),
LOW_B(X ∪ Y) ⊇ LOW_B(X) ∪ LOW_B(Y),
UPP_B(X ∩ Y) ⊆ UPP_B(X) ∩ UPP_B(Y),
LOW_B(U − X) = U − UPP_B(X),
UPP_B(U − X) = U − LOW_B(X),
LOW_B(LOW_B(X)) = UPP_B(LOW_B(X)) = LOW_B(X),
UPP_B(UPP_B(X)) = LOW_B(UPP_B(X)) = UPP_B(X).

Let us note that the inclusions in (3.5) cannot, in general, be replaced by equalities. This has some important algorithmic and logical consequences.

Now we are ready to give the definition of rough sets. If the boundary region of X is the empty set, i.e., BN_B(X) = ∅, then the set X is crisp (exact) with respect to B; in the opposite case, i.e., if BN_B(X) ≠ ∅, the set X is referred to as rough (inexact) with respect to B. Thus any rough set, in contrast to a crisp set, has a non-empty boundary region.

One can define the following four basic classes of rough sets, i.e., four categories of vagueness:

(3.6)
X is roughly B-definable iff LOW_B(X) ≠ ∅ and UPP_B(X) ≠ U,
X is internally B-indefinable iff LOW_B(X) = ∅ and UPP_B(X) ≠ U,
X is externally B-indefinable iff LOW_B(X) ≠ ∅ and UPP_B(X) = U,
X is totally B-indefinable iff LOW_B(X) = ∅ and UPP_B(X) = U.

The intuitive meaning of this classification is the following. If X is roughly B-definable, this means that we are able to decide for some elements of U that they belong to X and for some elements of U that they belong to U − X, using B. If X is internally B-indefinable, this means that we are able to decide for some elements of U that they belong to U − X, but we are unable to decide for any element of U that it belongs to X, using B. If X is externally B-indefinable, this means that we are able to decide for some elements of U that they belong to X, but we are unable to decide for any element of U that it belongs to U − X, using B. If X is totally B-indefinable, we are unable to decide for any element of U whether it belongs to X or U − X, using B.

Thus a set is rough (imprecise) if it has a nonempty boundary region; otherwise the set is crisp (precise). This is exactly the idea of vagueness proposed by Frege. Let us observe that the definition of rough sets refers to data (knowledge) and is subjective, in contrast to the definition of classical sets, which is in some sense an objective one.

A rough set can also be characterized numerically by the following coefficient,

(3.7) α_B(X) = |LOW_B(X)| / |UPP_B(X)|,

called the accuracy of approximation, where X is a nonempty set and |S| denotes the cardinality of the set S.⁵ Obviously, 0 ≤ α_B(X) ≤ 1. If α_B(X) = 1, then X is crisp with respect to B (X is precise with respect to B); otherwise, if α_B(X) < 1, then X is rough with respect to B (X is vague with respect to B). The accuracy of approximation can be used to measure the quality of approximation of decision classes on the universe U. One can use another measure of accuracy defined by 1 − α_B(X) or by 1 − |BN_B(X)|/|U|. Some other measures of approximation accuracy are also used, e.g., measures based on entropy or on some more specific properties of boundary regions (see, e.g., [64, 288, 317]). The choice of a relevant accuracy of approximation depends on the particular data set. Observe that the accuracy of approximation of X can be tuned by B. Another approach to accuracy of approximation can be based on the Variable Precision Rough Set Model (VPRSM) [389].

In the next section, we discuss decision rules (constructed over a selected set B of features or a family of sets of features) which are used in inducing classification algorithms (classifiers), making it possible to classify unseen objects to decision classes. The parameters tuned in searching for a classifier of high quality are its description size (defined using decision rules) and its quality of classification (measured by the number of misclassified objects on a given set of objects). By selecting a proper balance between the accuracy of classification and the description size, we expect to find a classifier of high quality also on unseen objects. This approach is based on the minimum description length principle [267, 268, 318].

3.3.2 Decision Systems and Decision Rules

Sometimes we distinguish in an information system A = (U, A) a partition of A into two disjoint classes C, D ⊆ A of attributes, called condition and decision (action) attributes, respectively. The tuple A = (U, C, D) is called a decision system (or decision table⁶).

⁵ The cardinality of a set S is also denoted by card(S) instead of |S|.
⁶ More precisely, decision tables are representations of decision systems.
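As a small numeric illustration of (3.7), the accuracy can be computed directly from the lower and upper helpers sketched after Figure 3.1; this is a hedged continuation of that hypothetical example, not code from the chapter:

```python
# Accuracy of approximation (3.7): alpha_B(X) = |LOW_B(X)| / |UPP_B(X)|.
# Reuses U, A, lower and upper from the previous sketch.
def accuracy(U, A, B, X):
    low, upp = lower(U, A, B, X), upper(U, A, B, X)
    return len(low) / len(upp)  # (3.7) assumes X, and hence UPP_B(X), nonempty

print(accuracy(U, A, {"age", "flu"}, {"x1", "x3", "x4"}))  # 0.2: X is rough
```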

Let V = ⋃{V_a : a ∈ C} ∪ ⋃{V_d : d ∈ D}. Atomic formulae over B ⊆ C ∪ D and V are expressions of the form a = v, called descriptors (selectors) over B and V, where a ∈ B and v ∈ V_a. The set F(B, V) of formulae over B and V is the least set containing all atomic formulae over B and V and closed under the propositional connectives ∧ (conjunction), ∨ (disjunction) and ¬ (negation).

By ‖ϕ‖_A we denote the meaning of ϕ ∈ F(B, V) in the decision system A, i.e., the set of all objects in U with the property ϕ. These sets are defined as follows: ‖a = v‖_A = {x ∈ U : a(x) = v}; ‖ϕ ∧ ϕ′‖_A = ‖ϕ‖_A ∩ ‖ϕ′‖_A; ‖ϕ ∨ ϕ′‖_A = ‖ϕ‖_A ∪ ‖ϕ′‖_A; ‖¬ϕ‖_A = U − ‖ϕ‖_A. The formulae from F(C, V) and F(D, V) are called condition formulae of A and decision formulae of A, respectively.

Any object x ∈ U belongs to the decision class ‖⋀_{d∈D} d = d(x)‖_A of A. All decision classes of A create a partition U/D of the universe U.

A decision rule for A is any expression of the form ϕ → ψ, where ϕ ∈ F(C, V), ψ ∈ F(D, V), and ‖ϕ‖_A ≠ ∅. The formulae ϕ and ψ are referred to as the predecessor and the successor of the decision rule ϕ → ψ. Decision rules are often called IF ... THEN ... rules. Such rules are used in machine learning (see, e.g., [97]).

A decision rule ϕ → ψ is true in A if and only if ‖ϕ‖_A ⊆ ‖ψ‖_A. Otherwise, one can measure its truth degree by introducing some inclusion measure of ‖ϕ‖_A in ‖ψ‖_A. Let us denote by |ϕ| the number of objects from U that satisfy the formula ϕ, i.e., the cardinality of ‖ϕ‖_A. According to Łukasiewicz [142], one can assign to the formula ϕ the value |ϕ|/|U|, and to the implication ϕ → ψ the fractional value |ϕ ∧ ψ|/|ϕ|, under the assumption that ‖ϕ‖_A ≠ ∅. This fractional value, proposed by Łukasiewicz, was much later adapted by the machine learning and data mining literature, e.g., in the definitions of the accuracy of decision rules or the confidence of association rules.

Each object x of a decision system determines a decision rule

(3.8) ⋀_{a∈C} a = a(x) → ⋀_{d∈D} d = d(x).

For any decision system A = (U, C, D), one can consider a generalized decision function ∂_A : U → P(INF(D)) defined by

(3.9) ∂_A(x) = {i ∈ INF(D) : ∃x′ ∈ U [(x′, x) ∈ IND_C and Inf_D(x′) = i]},

where A = C ∪ D and P(INF(D)) is the powerset of the set INF(D) of all possible decision signatures. The decision system A is called consistent (deterministic) if |∂_A(x)| = 1 for any x ∈ U. Otherwise, A is said to be inconsistent (non-deterministic). Hence, a decision system is inconsistent if it contains some objects with different decisions that are indiscernible with respect to the condition attributes. Any set consisting of all objects with the same generalized decision value is called a generalized decision class.

Now, one can consider certain (possible) rules (see, e.g., [84, 89]) for decision classes defined by the lower (upper) approximations of such generalized decision classes of A. This approach can be extended, using the relationships of rough sets with the Dempster–Shafer theory (see, e.g., [278, 288]), by considering rules relative to decision classes defined by the lower approximations of unions of decision classes of A.
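The Łukasiewicz-style measures recalled above are easy to state operationally; the sketch below, built on a hypothetical decision table and a helper of our own, computes the value of a formula and the confidence of a rule ϕ → ψ:

```python
# Łukasiewicz-style truth values for a decision rule phi -> psi over a
# decision table: value(phi) = |‖phi‖| / |U|, confidence = |‖phi ∧ psi‖| / |‖phi‖|.
rows = [  # a hypothetical decision table (conditions: age, flu; decision: d)
    {"age": "young", "flu": "yes", "d": 1},
    {"age": "young", "flu": "yes", "d": 1},
    {"age": "old",   "flu": "no",  "d": 0},
    {"age": "old",   "flu": "yes", "d": 1},
    {"age": "old",   "flu": "no",  "d": 1},
]

def meaning(rows, descriptors):
    """‖formula‖: objects satisfying every descriptor a = v of the formula."""
    return [r for r in rows if all(r[a] == v for a, v in descriptors.items())]

phi, psi = {"age": "old", "flu": "no"}, {"d": 0}
match = meaning(rows, phi)
support = len(meaning(rows, {**phi, **psi}))
print(support / len(rows))   # Łukasiewicz value of phi ∧ psi: 0.2
print(support / len(match))  # confidence of the rule phi -> psi: 0.5
```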

Numerous methods have been developed for the generation of different types of decision rules; the reader can find them in the literature on rough sets. Usually, one searches for decision rules that are (semi-)optimal with respect to some optimization criteria describing the quality of decision rules in concept approximations.

In the case of searching for concept approximations in an extension of a given universe of objects (sample), the following steps are typical. When a set of rules has been induced from a decision system containing a set of training examples, they can be inspected to see if they reveal any novel relationships between attributes that are worth pursuing for further research. Furthermore, the rules can be applied to a set of unseen cases in order to estimate their classification power. For a systematic overview of rule application methods, the reader is referred to the literature (see, e.g., [16, 151]).

3.3.3 Dependency of Attributes

Another important issue in data analysis is discovering dependencies between attributes in a given decision system A = (U, C, D). Intuitively, a set of attributes D depends totally on a set of attributes C, denoted C ⇒ D, if the values of attributes from C uniquely determine the values of attributes from D. In other words, D depends totally on C if there exists a functional dependency between the values of C and D. Hence, C ⇒ D if and only if the rule (3.8) is true in A for any x ∈ U.

In general, D can depend partially on C. Formally, such a partial dependency can be defined in the following way. We say that D depends on C to a degree k (0 ≤ k ≤ 1), denoted C ⇒_k D, if

(3.10) k = γ(C, D) = |POS_C(D)| / |U|,

where

(3.11) POS_C(D) = ⋃_{X ∈ U/D} LOW_C(X),

called the positive region of the partition U/D with respect to C, is the set of all elements of U that can be uniquely classified to blocks of the partition U/D by means of C.

If k = 1, we say that D depends totally on C, and if k < 1, we say that D depends partially (to degree k) on C. If k = 0, then the positive region of the partition U/D with respect to C is empty.
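A direct reading of (3.10)-(3.11), reusing the ind_classes and lower helpers from the earlier sketch; treating one of the toy attributes as the decision is our illustrative choice:

```python
# Positive region (3.11) and degree of dependency (3.10).
# Reuses U, A, ind_classes and lower from the earlier sketch.
def positive_region(U, A, C, D):
    pos = set()
    for X in ind_classes(U, A, D):  # the decision classes, i.e., U/D
        pos |= lower(U, A, C, X)    # LOW_C(X)
    return pos

def gamma(U, A, C, D):
    return len(positive_region(U, A, C, D)) / len(U)

print(gamma(U, A, {"age"}, {"flu"}))         # 0.0: flu does not depend on age
print(gamma(U, A, {"age", "flu"}, {"flu"}))  # 1.0: total (trivial) dependency
```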

The coefficient k expresses the ratio of all elements of the universe which can be properly classified to blocks of the partition U/D employing the attributes C; it will be called the degree of the dependency. It can easily be seen that if D depends totally on C, then IND_C ⊆ IND_D. This means that the partition generated by C is finer than the partition generated by D. Notice that the concept of dependency discussed above corresponds to that considered in relational databases.

Summing up: D is totally (partially) dependent on C if all (some) elements of the universe U can be uniquely classified to blocks of the partition U/D, employing C. Observe that (3.10) defines only one of the possible measures of dependency between attributes (see, e.g., [316]). One can also compare the dependency discussed in this section with the dependencies considered in databases [63].

3.3.4 Reduction of Attributes

We often face the question whether we can remove some data from a data table preserving its basic properties, that is, whether a table contains some superfluous data. Let us express this idea more precisely. Let C, D ⊆ A be sets of condition and decision attributes, respectively. We say that C′ ⊆ C is a D-reduct (reduct with respect to D) of C if C′ is a minimal subset of C such that

(3.12) γ(C, D) = γ(C′, D).

The intersection of all D-reducts is called a D-core (core with respect to D). Because the core is the intersection of all reducts, it is included in every reduct, i.e., each element of the core belongs to every reduct. Thus, in a sense, the core is the most important subset of attributes, since none of its elements can be removed without affecting the classification power of attributes. Certainly, the geometry of reducts can be more complex; for example, the core can be empty, but there can exist a partition of the reducts into a few sets with non-empty intersections.

Many other kinds of reducts and their approximations are discussed in the literature (see, e.g., [20, 172, 175, 279, 314, 317, 318, 122, 169, 333]). For example, if one changes the condition (3.12) to ∂_A(x) = ∂_B(x) for any x ∈ U (where A = C ∪ D and B = C′ ∪ D), then the defined reducts preserve the generalized decision. Other kinds of reducts preserve, e.g.: (i) the distance between attribute value vectors for any two objects, if this distance is greater than a given threshold [279]; (ii) the distance between entropy distributions between any two objects, if this distance exceeds a given threshold [314, 317]; or (iii) the so-called reducts relative to objects, used for the generation of decision rules [20]. There are some relationships between different kinds of reducts. If B is a reduct preserving the generalized decision, then B includes a reduct preserving the positive region. For the above-mentioned reducts based on distances and thresholds, one can find analogous dependencies between reducts relative to different thresholds.
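The sketch below shows one naive way to look for a D-reduct by greedy backward elimination under the γ-preserving condition (3.12); it is only an illustration of the definition, a helper of our own building on the gamma function above, since greedy removal need not find all (or the smallest) reducts, and real implementations rely on Boolean reasoning (Section 3.3.5):

```python
# Greedy backward elimination: drop attributes while gamma(C', D) stays
# equal to gamma(C, D), cf. condition (3.12). The result is a gamma-
# preserving subset that depends on the elimination order; it is not
# guaranteed to be a smallest reduct.
def greedy_reduct(U, A, C, D):
    reduct, target = set(C), gamma(U, A, C, D)
    for a in sorted(C):
        if len(reduct) > 1 and gamma(U, A, reduct - {a}, D) == target:
            reduct -= {a}  # attribute a is superfluous with respect to D
    return reduct

print(greedy_reduct(U, A, {"age", "flu"}, {"flu"}))  # {'flu'}
```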

By choosing different kinds of reducts, we select different degrees to which the information encoded in data is preserved. Reducts are used for building data models. Choosing a particular reduct or a set of reducts has an impact on the model size as well as on its quality in describing a given data set. The model size together with the model quality are two basic components tuned in selecting relevant data models. This is known as the minimum length principle (see, e.g., [267, 268, 317, 318]). Selection of relevant kinds of reducts is an important step in building data models. It turns out that the different kinds of reducts can be efficiently computed using heuristics based, e.g., on the Boolean reasoning approach [31, 32, 30, 36].

3.3.5 Discernibility and Boolean Reasoning

Methodologies devoted to data mining, knowledge discovery, decision support, pattern classification, and approximate reasoning require tools for discovering templates (patterns) in data and classifying them into certain decision classes. Templates are, in many cases, most frequent sequences of events, most probable events, regular configurations of objects, decision rules of high quality, or standard reasoning schemes. Tools for discovering and classifying templates are based on reasoning schemes rooted in various paradigms [51]. Such patterns can be extracted from data by means of methods based, e.g., on Boolean reasoning and discernibility.

The discernibility relations are closely related to indiscernibility and belong to the most important relations considered in rough set theory. The ability to discern between perceived objects is important for constructing many entities like reducts, decision rules or decision algorithms. In the standard approach, the discernibility relation DIS_B ⊆ U × U is defined by x DIS_B y if and only if non(x IND_B y), i.e., B(x) ∩ B(y) = ∅. However, this is, in general, not the case for generalized approximation spaces. For example, in the case of some such spaces, any object x may be assigned a family F(x) with more than one elementary granule (neighborhood) such that x ∈ I(x) for any I(x) ∈ F(x). Then one can define objects x, y to be discernible if and only if I(x) ∩ I(y) = ∅ for some I(x) ∈ F(x) and I(y) ∈ F(y), and indiscernibility may not be the negation of this condition; e.g., objects x, y may be defined as indiscernible if and only if I(x) ∩ I(y) ≠ ∅ for some I(x) ∈ F(x) and I(y) ∈ F(y).

The idea of Boolean reasoning is based on the construction, for a given problem P, of a corresponding Boolean function f_P with the following property: the solutions for the problem P can be decoded from the prime implicants of the Boolean function f_P. Let us mention that to solve real-life problems it is necessary to deal with Boolean functions having a large number of variables.

A successful methodology based on the discernibility of objects and Boolean reasoning has been developed for computing many important ingredients for applications. These applications include the generation of reducts and their approximations, decision rules, association rules, the discretization of real-valued attributes, symbolic value grouping, searching for new features defined by oblique hyperplanes or higher order surfaces, and pattern extraction from data, as well as conflict resolution and negotiation (see, e.g., [20, 172, 175, 279, 314, 317, 318, 169]).
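As a toy illustration of this discernibility-based formulation (not the chapter's algorithms, and deliberately brute-force despite the NP-hardness noted below), indiscernibility-preserving reducts can be read off as minimal attribute sets hitting every non-empty cell of the discernibility matrix, i.e., the prime implicants of the discernibility function f_P; the helpers reuse the toy table U, A from Section 3.3.1's sketch:

```python
# Discernibility matrix and a brute-force reading of its prime implicants:
# every reduct is a minimal attribute set that hits all non-empty matrix
# cells (a hitting-set view of the Boolean discernibility function f_P).
from itertools import combinations

def discernibility_matrix(U, A, B):
    cells = []
    for i, x in enumerate(U):
        for y in U[i + 1:]:
            cell = {a for a in B if A[a][x] != A[a][y]}
            if cell:                       # empty cells impose no constraint
                cells.append(cell)
    return cells

def reducts(U, A, B):
    cells = discernibility_matrix(U, A, B)
    hits = [set(c) for r in range(1, len(B) + 1)
            for c in combinations(sorted(B), r)
            if all(set(c) & cell for cell in cells)]
    return [h for h in hits if not any(g < h for g in hits)]  # minimal only

print(reducts(U, A, {"age", "flu"}))  # [{'age', 'flu'}]: both attributes needed
```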

Most of the problems related to the generation of the above-mentioned entities are NP-complete or NP-hard. However, it has been possible to develop efficient heuristics returning suboptimal solutions to these problems. The results of experiments on many data sets are very promising. They show the very good quality of the solutions generated by the heuristics in comparison with other methods reported in the literature (e.g., with respect to the classification quality on unseen objects). Moreover, they are very efficient from the point of view of the time necessary for computing a solution. Many of these methods are based on discernibility matrices. Note that it is possible to compute the necessary information about these matrices directly⁷ from information or decision systems (e.g., sorted in preprocessing [16, 168, 178, 373]), which significantly improves the efficiency of the algorithms.

It is important to note that the methodology makes it possible to construct heuristics having a very important approximation property, which can be formulated as follows: expressions generated by the heuristics, called approximate implicants, that are close to prime implicants define approximate solutions for the problem.

3.3.6 Rough Membership

Let us observe that rough sets can also be defined employing the rough membership function (see Eq. (3.13)) instead of approximations [219]. That is, consider the function µ^B_X : U → [0, 1] defined by

(3.13) µ^B_X(x) = |B(x) ∩ X| / |B(x)|,

where x ∈ U and X ⊆ U. The value µ^B_X(x) can be interpreted as the degree to which x belongs to X in view of the knowledge about x expressed by B, or the degree to which the elementary granule B(x) is included in the set X. This means that the definition reflects a subjective knowledge about the elements of the universe, in contrast to the classical definition of a set. The rough membership function can also be interpreted as the conditional probability that x belongs to X given B. This interpretation was used by several researchers in the rough set community (see, e.g., [87, 317, 357, 372, 393, 382, 389]).

Note also that the ratio on the right-hand side of equation (3.13) is known as the confidence coefficient in data mining [97, 115]. It is worthwhile to mention that set inclusion to a degree has been considered by Łukasiewicz [142] in studies on assigning fractional truth values to logical formulas.

⁷ I.e., without the necessity of generating and storing the discernibility matrices.
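Computationally, (3.13) is just the fraction of the granule B(x) that falls inside X; a small sketch with helper names of our own, reusing the toy table U, A from the earlier sketches, makes the properties listed next easy to check:

```python
# Rough membership (3.13): mu_B^X(x) = |B(x) ∩ X| / |B(x)|, the degree to
# which the elementary granule B(x) is included in X.
def granule(U, A, B, x):
    """B(x): the B-elementary granule containing x."""
    sig = tuple(A[a][x] for a in sorted(B))
    return {y for y in U if tuple(A[a][y] for a in sorted(B)) == sig}

def rough_membership(U, A, B, X, x):
    g = granule(U, A, B, x)
    return len(g & X) / len(g)

# mu = 1 on the lower approximation, 0 outside the upper approximation,
# and strictly between 0 and 1 exactly on the boundary region.
print(rough_membership(U, A, {"age", "flu"}, {"x1", "x3", "x4"}, "x3"))  # 0.5
```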

One can observe that the rough membership function has the following properties [219]:

1) µ^B_X(x) = 1 iff x ∈ LOW_B(X),
2) µ^B_X(x) = 0 iff x ∈ U − UPP_B(X),
3) 0 < µ^B_X(x) < 1 iff x ∈ BN_B(X),
4) µ^B_{U−X}(x) = 1 − µ^B_X(x) for any x ∈ U,
5) µ^B_{X∪Y}(x) ≥ max(µ^B_X(x), µ^B_Y(x)) for any x ∈ U,
6) µ^B_{X∩Y}(x) ≤ min(µ^B_X(x), µ^B_Y(x)) for any x ∈ U.

From these properties it follows that the rough membership differs essentially from the fuzzy membership [385], for properties 5) and 6) show that the membership for the union and intersection of sets, in general, cannot be computed from their constituents' memberships, as in the case of fuzzy sets. Thus, formally, the rough membership is different from the fuzzy membership. Moreover, the rough membership function depends on the available knowledge (represented by the attributes from B). Besides, the rough membership function, in contrast to the fuzzy membership function, has a probabilistic flavor.

Let us also mention that rough set theory, in contrast to fuzzy set theory, clearly distinguishes two very important concepts, vagueness and uncertainty, very often confused in the AI literature. Vagueness is a property of concepts. Vague concepts can be approximated using the rough set approach [287]. Uncertainty is a property of elements of a set or of a set itself (e.g., only examples and/or counterexamples of elements of a considered set are given). Uncertainty of elements of a set can be expressed by the rough membership function. Both fuzzy and rough set theory represent two different approaches to vagueness. Fuzzy set theory addresses gradualness of knowledge, expressed by the fuzzy membership, whereas rough set theory addresses granularity of knowledge, expressed by the indiscernibility relation. A nice illustration of this difference has been given by Didier Dubois and Henri Prade [49] in the following example: in image processing, fuzzy set theory refers to the gradualness of gray levels, whereas rough set theory is about the size of pixels. Consequently, the two theories are not competing but are rather complementary. In particular, the rough set approach provides tools for the approximate construction of fuzzy membership functions. The rough-fuzzy hybridization approach has proved to be successful in many applications (see, e.g., [196, 200]). An interesting discussion of fuzzy and rough set theory in the approach to vagueness can be found in [266]. Let us also observe that fuzzy set and rough set theory are not a remedy for the difficulties of classical set theory.

One of the consequences of perceiving objects by information about them is that for some objects one cannot decide whether they belong to a given set or not. However, one can estimate the degree to which objects belong to sets. This is a crucial observation in building foundations for approximate reasoning. Dealing with imperfect knowledge implies that one can only characterize the satisfiability of relations between objects to a degree, not precisely.

One of the fundamental relations on objects is the rough inclusion relation, describing the fact that objects are parts of other objects to a degree. The rough mereological approach [248, 199, 249, 250, 252] based on such a relation is an extension of the Leśniewski mereology [128].

3.4 Generalizations of Approximation Spaces

The rough set concept can be defined quite generally by means of topological operations, interior and closure, called approximations [245]. It was observed in [212] that the key to the presented approach is provided by the exact mathematical formulation of the concept of approximative (rough) equality of sets in a given approximation space. In [215], an approximation space is represented by the pair (U, R), where U is a universe of objects and R ⊆ U × U is an indiscernibility relation defined by an attribute set (i.e., R = IND_A for some attribute set A). In this case, R is an equivalence relation. Let [x]_R denote the equivalence class of an element x ∈ U under the indiscernibility relation R, where [x]_R = {y ∈ U : x R y}.

In this context, R-approximations of any set X ⊆ U are based on the exact (crisp) containment of sets. The set approximations are then defined as follows:

- x ∈ U belongs with certainty to X ⊆ U (i.e., x belongs to the R-lower approximation of X) if [x]_R ⊆ X.
- x ∈ U possibly belongs to X ⊆ U (i.e., x belongs to the R-upper approximation of X) if [x]_R ∩ X ≠ ∅.
- x ∈ U belongs with certainty neither to X nor to U − X (i.e., x belongs to the R-boundary region of X) if [x]_R ∩ (U − X) ≠ ∅ and [x]_R ∩ X ≠ ∅.

Our knowledge about the approximated concepts is often partial and uncertain [83]. For example, concept approximations often have to be constructed from examples and counterexamples of objects for the concepts [97]. Hence, concept approximations constructed from a given sample of objects are extended, using inductive reasoning, to objects not yet observed. Several generalizations of the classical rough set approach based on approximation spaces defined as pairs of the form (U, R), where R is an equivalence relation (called the indiscernibility relation) on the set U, have been reported in the literature (see, e.g., [132, 134, 251, 284, 356, 379, 380, 381, 383, 308, 302, 331, 301, 303]).⁸ Let us mention two of them.

The concept approximations should be constructed under dynamically changing environments [287, 307]. This leads to a more complex situation, where the boundary regions are not crisp sets, which is consistent with the postulate of higher order vagueness considered by philosophers (see, e.g., [111]).

⁸ Among extensions not discussed in this chapter is the rough set approach to multicriteria decision making (see, e.g., [75, 76, 77, 78, 82, 231, 243, 325, 79, 120, 80, 81, 74, 323], and also the chapter "jMAF - Dominance-based Rough Set Data Analysis Framework" by J. Błaszczyński, S. Greco, B. Matarazzo, R. Słowiński, and M. Szeląg in this book (Chapter 5)).

Different aspects of vagueness in the rough set framework are discussed, e.g., in [145, 184, 186, 266, 287]. It is worthwhile to mention that a rough set approach to the approximation of compound concepts has been developed. For such concepts, it is hardly possible to expect that they can be approximated with high quality by the traditional methods [35, 364]. The approach is based on hierarchical learning and ontology approximation [13, 24, 177, 199, 294]. Approximation of concepts in distributed environments is discussed in [285]. A survey of algorithmic methods for concept approximation based on rough sets and Boolean reasoning is presented, e.g., in [16, 281, 169, 13].

A generalized approximation space⁹ can be defined by a tuple AS = (U, I, ν), where I is an uncertainty function defined on U with values in the powerset P(U) of U (I(x) is the neighborhood of x) and ν is an inclusion function defined on the Cartesian product P(U) × P(U) with values in the interval [0, 1], measuring the degree of inclusion of sets [297]. The lower and upper approximation operations can be defined in AS by

(3.14) LOW_AS(X) = {x ∈ U : ν(I(x), X) = 1},
(3.15) UPP_AS(X) = {x ∈ U : ν(I(x), X) > 0}.

In the standard case, I(x) is equal to the equivalence class B(x) of the indiscernibility relation IND_B; in the case of a tolerance (similarity) relation T ⊆ U × U [256], we take I(x) = [x]_T = {y ∈ U : x T y}, i.e., I(x) is equal to the tolerance class of T defined by x. The standard rough inclusion relation ν_SRI is defined for X, Y ⊆ U by

(3.16) ν_SRI(X, Y) = |X ∩ Y| / |X| if X is non-empty, and 1 otherwise.

For applications it is important to have some constructive definitions of I and ν.

One can consider another way to define I(x). Usually, together with AS we consider some set F of formulae describing sets of objects in the universe U of AS, defined by the semantics ‖·‖_AS, i.e., ‖α‖_AS ⊆ U for any α ∈ F.¹⁰ Now, one can take the set

(3.17) N_F(x) = {α ∈ F : x ∈ ‖α‖_AS},

and I′(x) = {‖α‖_AS : α ∈ N_F(x)}. Hence, more general uncertainty functions having values in P(P(U)) can be defined and, in consequence, different definitions of approximations are considered. For example, one can consider the following definitions of the approximation operations in the approximation space AS′ = (U, I′, ν):

(3.18) LOW_AS′(X) = {x ∈ U : ν(Y, X) = 1 for some Y ∈ I′(x)},
(3.19) UPP_AS′(X) = {x ∈ U : ν(Y, X) > 0 for any Y ∈ I′(x)}.

⁹ More general cases are considered, e.g., in [301, 303].
¹⁰ If AS = (U, I, ν), then we will also write ‖α‖_U instead of ‖α‖_AS.
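A sketch of a generalized approximation space in the sense of (3.14)-(3.16), in which the uncertainty function I is induced by a hypothetical tolerance relation on one numeric attribute; the value table, the ε threshold, and all names are our illustration, and the snippet is standalone rather than a continuation of the earlier toy table:

```python
# A generalized approximation space AS = (U, I, nu): I(x) is a tolerance
# class on a numeric attribute and nu is the standard rough inclusion
# nu_SRI of (3.16).
vals = {"x1": 1.0, "x2": 1.2, "x3": 3.0, "x4": 3.1, "x5": 5.0}
U = list(vals)

def I(x, eps=0.5):
    """Tolerance class: objects whose value differs from x's by at most eps."""
    return {y for y in U if abs(vals[x] - vals[y]) <= eps}

def nu_sri(X, Y):
    return len(X & Y) / len(X) if X else 1.0

X = {"x1", "x2", "x3"}
low = {x for x in U if nu_sri(I(x), X) == 1}  # LOW_AS(X), cf. (3.14)
upp = {x for x in U if nu_sri(I(x), X) > 0}   # UPP_AS(X), cf. (3.15)
print(low, upp)  # {'x1','x2'} and {'x1','x2','x3','x4'}
```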

There are also different forms of rough inclusion functions. Let us consider two examples.

In the first example of a rough inclusion function, a threshold t ∈ (0, 0.5) is used to relax the degree of inclusion of sets. The rough inclusion function ν_t is defined by

(3.20) ν_t(X, Y) = 1 if ν_SRI(X, Y) ≥ 1 − t; (ν_SRI(X, Y) − t) / (1 − 2t) if t ≤ ν_SRI(X, Y) < 1 − t; 0 if ν_SRI(X, Y) ≤ t.

This is an interesting rough-fuzzy example because we put the standard rough membership function as an argument into the formula often used for fuzzy membership functions. One can obtain the approximations considered in the variable precision rough set approach (VPRSM) [389] by substituting in (3.14)-(3.15) the rough inclusion function ν_t defined by (3.20) instead of ν, assuming that Y is a decision class and I(x) = B(x) for any object x, where B is a given set of attributes.

Another example of the application of the standard inclusion was developed using probabilistic decision functions. For more details, the reader is referred to [303, 349, 316, 317].

The rough inclusion relation can also be used for function approximation [308, 301, 303] and relation approximation [329]. In the case of function approximation, the inclusion function ν for subsets X, Y ⊆ U × U, where U ⊆ ℝ and ℝ is the set of real numbers, is defined by

(3.21) ν(X, Y) = |π₁(X ∩ Y)| / |π₁(X)| if π₁(X) ≠ ∅, and 1 if π₁(X) = ∅,

where π₁ is the projection operation on the first coordinate. Assume now that X is a cube and Y is the graph G(f) of a function f : ℝ → ℝ. Then, e.g., X is in the lower approximation of f if the projection on the first coordinate of the intersection X ∩ G(f) is equal to the projection of X on the first coordinate. This means that the part of the graph G(f) is well included in the box X, i.e., for all arguments that belong to the projection of the box X on the first coordinate, the value of f is included in the projection of the box X on the second coordinate. This approach was extended in several papers (see, e.g., [349, 303]).

The approach based on inclusion functions has been generalized to the rough mereological approach [199, 250, 249, 252]. The inclusion relation x µ_r y, with the intended meaning "x is a part of y to a degree at least r", has been taken as the basic notion of rough mereology, which is a generalization of the Leśniewski mereology [128, 129]. Research on rough mereology has shown the importance of another notion, namely the closeness of compound objects (e.g., concepts). This can be defined by x cl_{r,r′} y if and only if x µ_r y and y µ_{r′} x.
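The thresholded inclusion (3.20) can be written down directly; the sketch below assumes the standard ν_SRI as input and a user-chosen t ∈ (0, 0.5), with names of our own:

```python
# The thresholded rough inclusion nu_t of (3.20), t in (0, 0.5): it relaxes
# nu_SRI so that "almost full" inclusion already counts as full inclusion.
def nu_t(X, Y, t=0.4):
    s = len(X & Y) / len(X) if X else 1.0  # nu_SRI(X, Y)
    if s >= 1 - t:
        return 1.0
    if s <= t:
        return 0.0
    return (s - t) / (1 - 2 * t)

# With t = 0.4 a granule that is two-thirds included in Y is treated as
# fully included, as in the variable precision rough set model (VPRSM).
print(nu_t({"a", "b", "c"}, {"a", "b"}))  # nu_SRI = 2/3 >= 0.6, hence 1.0
```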

Rough mereology offers a methodology for the synthesis and analysis of objects in a distributed environment of intelligent agents, in particular for the synthesis of objects satisfying a given specification to a satisfactory degree, or for control in such a complex environment. Moreover, rough mereology has been used for developing the foundations of the information granule calculi, aiming at the formalization of the Computing with Words paradigm formulated by Lotfi Zadeh [386]. More complex information granules are defined recursively using already defined information granules and their measures of inclusion and closeness. Information granules can have complex structures, like classifiers or approximation spaces. Computations on information granules are performed to discover relevant information granules, e.g., patterns or approximation spaces for compound concept approximations. Usually, families of approximation spaces labeled by some parameters are considered. By tuning such parameters according to chosen criteria (e.g., minimal description length), one can search for the optimal approximation space for a concept description (see, e.g., [16, 169, 13]).

3.5 Rough Sets and Induction

Granular formulas are constructed from atomic formulas corresponding to the considered attributes (see, e.g., [222, 221, 301, 303]). Consequently, the satisfiability of such formulas is defined if the satisfiability of the atomic formulas is given as the result of sensor measurement. Let us consider an information system A = (U, C) and its extension A∗ = (U∗, C) having the same set of attributes C (more precisely, the set of attributes in A is obtained by restricting to U the attributes from A∗ defined on U∗ ⊇ U). Hence, for any formula α constructed over the atomic formulas, one can consider its semantics ‖α‖_A ⊆ U over U as well as the semantics ‖α‖_{A∗} ⊆ U∗ over U∗ (see Figure 3.2).

The difference between these two cases is the following. In the case of U, one can compute ‖α‖_A, but in the case of U∗, for any object from U∗ − U, there is no information about its membership relative to ‖α‖_{A∗} − ‖α‖_A. One can estimate the satisfiability of α for objects u ∈ U∗ − U only after some relevant sensory measurements on u are performed. In particular, one can use some methods for the estimation of relationships among the semantics of formulas over U∗ using the relationships among the semantics of these formulas over U; for example, one can apply statistical methods. This step is crucial in the investigation of extensions of approximation spaces relevant for inducing classifiers from data.

Fig. 3.2: Two semantics of α: over U and over U∗.

3.5.1 Rough Sets and Classifiers

In this section, we consider the problem of approximation of concepts over a universe U∗ (concepts that are subsets of U∗). We assume that the concepts are perceived only through some subsets of U∗, called samples. This is a typical situation in machine learning, pattern recognition, and data mining [97, 115, 135]. We explain the rough set approach to the induction of concept approximations using generalized approximation spaces of the form AS = (U, I, ν) defined in Section 3.4.

Let U ⊆ U∗ be a finite sample. By Π_U we denote a perception function from P(U∗) into P(U), defined by Π_U(C) = C ∩ U for any concept C ⊆ U∗. Let us first consider an illustrative example. We assume that an information system A = (U, A) is given, and that for some C ⊆ U∗ the set Π_U(C) = C ∩ U is given. In this way we obtain a decision system AS_d = (U, A, d), where d(x) = 1 if x ∈ C ∩ U and d(x) = 0 otherwise. We would like to illustrate how, from the decision function d, a decision function d∗ defined over U∗ may be induced, which can be treated as an approximation of the characteristic function of C.

Let us assume that RULES(AS_d) is a set of decision rules induced by some rule generation method from AS_d. For any object x ∈ U∗, let MatchRules(AS_d, x) be the set of rules from RULES(AS_d) supported by x (see, e.g., [20]). Let C¹ = C and C⁰ = U∗ \ C. Now, for k = 1, 0, one can define the rough membership functions µ_k : U∗ → [0, 1] in the following way:

1. Let R_k(x) be the set of all decision rules from MatchRules(AS_d, x) for C^k, i.e., the decision rules from MatchRules(AS_d, x) with right-hand side d = k.

2. We define real values w_k(x), where w_1(x) is called the weight "for" and w_0(x) the weight "against" membership of the object x in C, respectively, by w_k(x) = Σ_{r ∈ R_k(x)} strength(r), where strength(r) is a normalized function depending on the length, support, and confidence of the decision rule r and on some global information about the decision table AS_d, such as the table size or the class distribution (see [20]).

3. Finally, one can define the value of µ_k(x) by

µ_k(x) =
  undefined, if max(w_k(x), w_{1−k}(x)) < ω;
  0, if w_{1−k}(x) − w_k(x) ≥ θ and w_{1−k}(x) > ω;
  1, if w_k(x) − w_{1−k}(x) ≥ θ and w_k(x) > ω;
  (θ + (w_k(x) − w_{1−k}(x))) / 2θ, in other cases;

where ω, θ are parameters set by the user.

Now, for computing the value d∗(x) for x ∈ U∗, the user should select a strategy resolving conflicts between the values µ_1(x) and µ_0(x), representing, in a sense, the votes "for" and "against" membership of x in C, respectively. Note that for some cases x, due to small differences between these values, the selected strategy may not produce a definite answer, and these cases will create the boundary region.

Let us consider a generalized approximation space AS = (U, I, ν_SRI), where I(x) = A(x) for x ∈ U. We would like to check how this approximation space may be inductively extended so that in the induced approximation space we may define an approximation of the concept C or, in other words, an approximation of the decision function d∗. Hence, the problem we are considering is how to extend the approximations of Π_U(C) defined by AS to an approximation of C over U∗. We show that the problem can be described as searching for an extension AS_C = (U∗, I_C, ν_C) of the approximation space AS, relevant for the approximation of C. This requires showing how to extend the inclusion function ν from subsets of U to those subsets of U∗ that are relevant for the approximation of C. Observe that for the approximation of C it is enough to induce the necessary values of the inclusion function ν_C without knowing the exact value of I_C(x) ⊆ U∗ for x ∈ U∗.

Let AS be a given approximation space for Π_U(C), and let us consider a language L in which the neighborhood I(x) ⊆ U is expressible by a formula pat(x) for any x ∈ U. It means that I(x) = ‖pat(x)‖_U ⊆ U, where ‖pat(x)‖_U denotes the meaning of pat(x) restricted to the sample U. In the case of rule-based classifiers, patterns of the form pat(x) are defined by feature value vectors. We assume that for any new object x ∈ U∗ − U we can obtain (e.g., as a result of sensor measurement) a pattern pat(x) ∈ L with semantics ‖pat(x)‖_{U∗} ⊆ U∗. However, the relationships between information granules over U∗, like the sets ‖pat(x)‖_{U∗}

and ‖pat(y)‖_{U∗} for different x, y ∈ U, are, in general, known only if they can be expressed by relationships between the restrictions of these sets to the sample U, i.e., between the sets Π_U(‖pat(x)‖_{U∗}) and Π_U(‖pat(y)‖_{U∗}).

The set of patterns {pat(x) : x ∈ U} is usually not relevant for the approximation of the concept C ⊆ U∗. Such patterns are too specific or not general enough, and can be directly applied only to a very limited number of new objects. However, by using some generalization strategies, one can search, in a family of patterns definable from {pat(x) : x ∈ U} in L, for new patterns that are relevant for the approximation of concepts over U∗.

Let us consider a subset PATTERNS(AS, L, C) ⊆ L chosen as a set of pattern candidates for the relevant approximation of a given concept C. For example, in the case of a rule-based classifier, one can search for such candidate patterns among the sets definable by subsequences of feature value vectors corresponding to objects from the sample U. The set PATTERNS(AS, L, C) can be selected using some quality measures checked on the meanings (semantics) of its elements restricted to the sample U (like the number of examples from the concept Π_U(C) and its complement that support a given pattern). Then, on the basis of the properties of the sets definable by these patterns over U, we induce approximate values of the inclusion function ν_C on the subsets of U∗ definable by any such pattern and the concept C.

Next, we induce the value of ν_C on pairs (X, Y), where X ⊆ U∗ is definable by a pattern from {pat(x) : x ∈ U∗} and Y ⊆ U∗ is definable by a pattern from PATTERNS(AS, L, C). Finally, for any object x ∈ U∗ − U, we induce the approximation of the degree ν_C(‖pat(x)‖_{U∗}, C) by applying a conflict resolution strategy Conflict_res (a voting strategy, in the case of rule-based classifiers) to two families of degrees:

(3.22) {ν_C(‖pat(x)‖_{U∗}, ‖pat‖_{U∗}) : pat ∈ PATTERNS(AS, L, C)},
(3.23) {ν_C(‖pat‖_{U∗}, C) : pat ∈ PATTERNS(AS, L, C)}.

The values of the inclusion function for the remaining subsets of U∗ can be chosen in any way, as they have no impact on the approximations of C. Moreover, observe that for the approximation of C we do not need to know the exact values of the uncertainty function I_C; it is enough to induce the values of the inclusion function ν_C. Observe that the defined extension ν_C of ν to some subsets of U∗ makes it possible to define an approximation of the concept C in the new approximation space AS_C. One can also follow the principles of Bayesian reasoning and use the degrees of ν_C to approximate C (see, e.g., [217, 319, 322]).

Let us present yet another example of an (inductive) extension AS∗ of an approximation space AS in the case of rule-based classifiers. For details, the reader is referred, e.g., to [301, 303]. Let h : [0, 1] → {0, 1/2, 1} be a function defined by

Let $h : [0,1] \to \{0, 1/2, 1\}$ be the function defined by

(3.24) $h(t) = \begin{cases} 1 & \text{if } t > 1/2, \\ 1/2 & \text{if } t = 1/2, \\ 0 & \text{if } t < 1/2. \end{cases}$

We start with an extension of the uncertainty function and of the rough inclusion function from $U$ to $U^*$, where $U \subseteq U^*$:

(3.25) $I(x) = \{\|lh(r)\|_{U^*} : x \in \|lh(r)\|_{U^*} \text{ and } r \in Rule\_set\}$,

where $x \in U^*$, $lh(r)$ denotes the formula on the left-hand side of the rule $r$, and $Rule\_set$ is a set of decision rules induced from a given decision system $DT = (U, A, d)$. In this approach, the rough inclusion function is defined by

(3.26) $\nu_{U^*}(X, Z) = h\left(\dfrac{card(\{Y \in X : Y \cap U \subseteq Z\})}{card(\{Y \in X : Y \cap U \subseteq Z\}) + card(\{Y \in X : Y \cap U \subseteq U - Z\})}\right)$,

where $X \subseteq P(U^*)$, $X \neq \emptyset$, and $Z \subseteq U^*$. In the case $X = \emptyset$ we set $\nu_{U^*}(\emptyset, Z) = 0$.

The induced uncertainty and rough inclusion functions can now be used to define the lower approximation $LOW_{AS^*}(Z)$, the upper approximation $UPP_{AS^*}(Z)$, and the boundary region $BN_{AS^*}(Z)$ of $Z \subseteq U^*$ by:

(3.27) $LOW_{AS^*}(Z) = \{x \in U^* : \nu_{U^*}(I(x), Z) = 1\}$,

(3.28) $UPP_{AS^*}(Z) = \{x \in U^* : \nu_{U^*}(I(x), Z) > 0\}$,

and

(3.29) $BN_{AS^*}(Z) = UPP_{AS^*}(Z) - LOW_{AS^*}(Z)$.

In this example, we classify an object from $U^*$ into the lower approximation of $Z$ if the majority of the rules matching this object vote for $Z$, and into the upper approximation of $Z$ if at least half of the rules matching the object vote for $Z$. Certainly, one can follow many other voting schemes developed in machine learning, or introduce less crisp conditions in the definition of the boundary region. The defined approximations can be treated as estimations of the exact approximations of subsets of $U^*$, because they are induced on the basis of samples of such sets restricted to $U$ only. One can use standard quality measures developed in machine learning to calculate the quality of such approximations, assuming that after estimating the approximations on $U^*$, the full information about the membership of objects relative to the approximated subsets of $U^*$ is uncovered, analogously to the testing sets in machine learning. In an analogous way, one can describe other classes of classifiers used in machine learning and data mining, such as neural networks or k-NN classifiers.

In this way, the rough set approach to induction of concept approximations can be explained as a process of inducing a relevant approximation space.
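The voting mechanism behind (3.24)-(3.29) can be made concrete in a few lines of code. The sketch below is only an illustration, not the implementation used in the systems cited above: rules are assumed to be given as hypothetical (conditions, decision) pairs, and the inclusion degree is simplified to the fraction of matched rules voting for the target class, thresholded by $h$.

```python
# A minimal sketch of the rule-based extension (3.24)-(3.29); the rule
# format and the simplified vote counting are assumptions of this example.

def h(t):
    # The threshold function (3.24)
    return 1.0 if t > 0.5 else (0.5 if t == 0.5 else 0.0)

def matches(conditions, x):
    # conditions: dict attribute -> value required by lh(r); x: dict
    return all(x.get(a) == v for a, v in conditions.items())

def inclusion(rules, x, target):
    # Fraction of rules matching x that vote for the target decision,
    # mapped through h; degree 0 when no rule matches (I(x) is empty).
    matched = [d for conditions, d in rules if matches(conditions, x)]
    return h(matched.count(target) / len(matched)) if matched else 0.0

def approximations(rules, objects, target):
    lower = [x for x in objects if inclusion(rules, x, target) == 1.0]
    upper = [x for x in objects if inclusion(rules, x, target) > 0.0]
    return lower, upper

rules = [({"color": "red"}, "yes"), ({"size": "big"}, "no"),
         ({"color": "red", "size": "big"}, "yes")]
new_objects = [{"color": "red", "size": "small"},
               {"color": "red", "size": "big"}]
print(approximations(rules, new_objects, "yes"))
```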

In [303], the rough set approach to approximation of partially defined concepts is presented (see also, e.g., [22, 23, 26, 302, 177, 169, 13, 331, 311, 301, 312]). The problems discussed in this chapter are crucial for building computer systems that assist researchers in scientific discoveries in many areas.

Our considerations can be treated as a step toward foundations for modeling granular computations inside a system based on granules called approximation spaces. Approximation spaces are fundamental granules used in searching for relevant complex granules called data models, e.g., approximations of complex concepts, functions, or relations. The approach requires some generalizations of the approximation space concept introduced in [296, 297]. Examples of rough set-based strategies for the extension of approximation spaces from samples of objects onto the whole universe of objects are presented. This makes it possible to present foundations for inducing data models, such as approximations of concepts or classifications, analogous to the approaches for inducing different types of classifiers known in machine learning and data mining. Searching for relevant approximation spaces and data models is formulated as a complex optimization problem. This optimization is performed relative to measures which are versions of the minimum length principle (MLP) [267, 268].

Inducing Relevant Approximation Spaces

A key task in granular computing is the information granulation process that leads to the formation of information aggregates (with inherent patterns) from a set of available objects. A methodological and algorithmic issue is the formation of transparent (understandable) information granules, inasmuch as they should provide a clear and understandable description of the patterns present in sample objects [9, 223]. Such a fundamental property can be formalized by a set of constraints that must be satisfied during the information granulation process. For example, in the case of inducing granules such as classifiers, the constraints specify requirements for the quality of classifiers. Then, inducing classifiers can be understood as searching for relevant approximation spaces (which can be treated as a special type of granules) relative to some properly selected optimization measures. (Note that while there is a large literature on the covering-based rough set approach (see, e.g., [388, 91]), much more work is still needed on (scalable) algorithmic methods of searching for relevant approximation spaces in huge families of approximation spaces defined by many parameters determining neighborhoods, inclusion measures, and approximation operators.) The selection of these optimization measures is not an easy task, because they should guarantee that the (semi-)optimal approximation spaces selected relative to these criteria allow us to construct classifiers of high quality.

Let us consider some examples of optimization measures [170]. The quality of an approximation space can be measured, for example, by a function

(3.30) $Quality_1 : SAS(U) \times P(U) \to [0,1]$,

where $U$ is a non-empty set of objects and $SAS(U)$ is the set of possible approximation spaces with the universe $U$.

Example 3.1. If $UPP_{AS}(X) \neq \emptyset$ for $AS \in SAS(U)$ and $X \subseteq U$, then

(3.31) $Quality_1(AS, X) = \nu_{SRI}(UPP_{AS}(X), LOW_{AS}(X)) = \dfrac{card(LOW_{AS}(X))}{card(UPP_{AS}(X))}$.

The value $1 - Quality_1(AS, X)$ expresses the degree of completeness of our knowledge about $X$, given the approximation space $AS$.

Example 3.2. In applications, we usually use another quality measure, analogous to the minimum length principle [267, 268], where the description length of the approximation is also taken into account. Let us denote by $description(AS, X)$ the description length of the approximation of $X$ in $AS$. The description length may be measured, e.g., by the sum of the description lengths of the algorithms testing membership in the neighborhoods used in the construction of the lower approximation, the upper approximation, and the boundary region of the set $X$. Then the quality $Quality_2(AS, X)$ can be defined by

(3.32) $Quality_2(AS, X) = g(Quality_1(AS, X), description(AS, X))$,

where $g$ is a relevant function used for the fusion of the values $Quality_1(AS, X)$ and $description(AS, X)$. This function $g$ can reflect weights given by experts relative to both criteria.

One can consider different optimization problems relative to a given class $Set\_AS$ of approximation spaces. For example, for a given $X \subseteq U$ and a threshold $t \in [0,1]$, one can search for an approximation space $AS$ satisfying the constraint $Quality_2(AS, X) \geq t$. Another example is related to searching for an approximation space satisfying, additionally, the constraint $Cost(AS) < c$, where $Cost(AS)$ denotes the cost of the approximation space $AS$ (e.g., measured by the number of attributes used to define neighborhoods in $AS$) and $c$ is a given threshold.

In the following example, we also consider the cost of searching for relevant approximation spaces in a given family defined by a parameterized approximation space. Any parameterized approximation space $AS_{\#,\$} = (U, I_{\#}, \nu_{\$})$ is a family of approximation spaces. The cost of searching in such a family for an approximation space relevant for the approximation of a given concept $X$ can be treated as a factor of the quality measure of the approximation of $X$ in $AS_{\#,\$}$. Hence, such a quality measure can be defined by

(3.33) $Quality_3(AS_{\#,\$}, X) = h(Quality_2(AS, X), Cost\_Search(AS_{\#,\$}, X))$,

where $AS$ is the result of searching in $AS_{\#,\$}$, $Cost\_Search(AS_{\#,\$}, X)$ is the cost of searching in $AS_{\#,\$}$ for $AS$, and $h$ is a fusion function.
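To make the first of these measures concrete, the sketch below computes $Quality_1$ for the simplest kind of approximation space, where neighborhoods are indiscernibility classes over a chosen attribute set; the data layout (dicts of attribute values) is an assumption of the example, not a prescribed interface.

```python
# A small sketch computing Quality_1 (3.31) for an equivalence-based
# approximation space: Quality_1(AS, X) = |LOW(X)| / |UPP(X)|.

def blocks(universe, info, attrs):
    # Indiscernibility classes of the attribute set `attrs`
    classes = {}
    for x in universe:
        classes.setdefault(tuple(info[x][a] for a in attrs), set()).add(x)
    return classes.values()

def quality1(universe, info, attrs, X):
    X = set(X)
    low, upp = set(), set()
    for block in blocks(universe, info, attrs):
        if block <= X:
            low |= block          # block surely inside X
        if block & X:
            upp |= block          # block possibly inside X
    return len(low) / len(upp) if upp else 1.0

info = {1: {"a": 0}, 2: {"a": 0}, 3: {"a": 1}}
print(quality1([1, 2, 3], info, ["a"], {1, 3}))  # 1/3: LOW={3}, UPP={1,2,3}
```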

For example, assuming that the values of $Quality_2(AS, X)$ and $Cost\_Search(AS_{\#,\$}, X)$ are normalized to the interval $[0,1]$, the fusion function $h$ could be defined by a linear combination

(3.34) $\lambda \cdot Quality_2(AS, X) + (1 - \lambda) \cdot Cost\_Search(AS_{\#,\$}, X)$,

where $0 \leq \lambda \leq 1$ is a weight measuring the importance of quality and cost in their fusion. We assume that the fusion functions $g, h$ in the definitions of quality are monotonic relative to each argument.

Let $AS' \in Set\_AS$ be an approximation space relevant for the approximation of $X \subseteq U$, i.e., $AS'$ is optimal (or semi-optimal) relative to $Quality_2$. By $Granulation(AS_{\#,\$})$ we denote a new parameterized approximation space obtained by granulation of $AS_{\#,\$}$. For example, $Granulation(AS_{\#,\$})$ can be obtained by reducing the number of attributes or of inclusion degrees (i.e., of the possible values of the inclusion function). Let $AS''$ be an approximation space in $Granulation(AS_{\#,\$})$ obtained as the result of searching for an optimal (semi-optimal) approximation space in $Granulation(AS_{\#,\$})$ for the approximation of $X$.

We assume that three conditions are satisfied:

- after granulation of $AS_{\#,\$}$ to $Granulation(AS_{\#,\$})$, the cost

(3.35) $Cost\_Search(Granulation(AS_{\#,\$}), X)$

is much lower than the cost $Cost\_Search(AS_{\#,\$}, X)$;

- $description(AS'', X)$ is much shorter than $description(AS', X)$, i.e., the description length of $X$ in the approximation space $AS''$ is much shorter than the description length of $X$ in the approximation space $AS'$;

- $Quality_1(AS'', X)$ and $Quality_1(AS', X)$ are sufficiently close.

The last two conditions should guarantee that the values $Quality_2(AS'', X)$ and $Quality_2(AS', X)$ are comparable, and this, together with the first condition about the cost of searching, should assure that

(3.36) $Quality_3(Granulation(AS_{\#,\$}), X)$ is much better than $Quality_3(AS_{\#,\$}, X)$.

Certainly, the phrases used above, such as much lower, much shorter, and sufficiently close, should be further elaborated. The details will be discussed elsewhere.

Taking into account that parameterized approximation spaces are examples of parameterized granules, one can generalize the above example of parameterized approximation space granulation to the case of granulation of parameterized granules.

In the process of searching for (sub-)optimal approximation spaces, different strategies are used. Let us consider an example of such a strategy [309]. In the example, $DT = (U, A, d)$ denotes a decision system (a given sample of data), where $U$ is a set of objects, $A$ is a set of attributes, and $d$ is a decision.

We assume that for any object $x$ only partial information is accessible, equal to the $A$-signature of $x$ (object signature, for short), i.e., $Inf_A(x) = \{(a, a(x)) : a \in A\}$; analogously, for any concept only partial information about this concept is given, by a sample of objects, e.g., in the form of a decision system. One can use object signatures as new objects in a new relational structure $R$. In this relational structure $R$, some relations between object signatures are also modeled, e.g., relations defined by the similarities of these object signatures. Discovery of relevant relations on object signatures is an important step in the searching process for relevant approximation spaces. In this way, a class of relational structures representing the perception of objects and their parts is constructed.

In the next step, we select a language $L$ of formulas expressing properties over the defined relational structures, and we search for relevant formulas in $L$. The semantics of formulas (e.g., with one free variable) from $L$ are subsets of object signatures. Observe that each object signature defines a neighborhood of objects from a given sample (e.g., the decision system $DT$) and another set over the whole universe of objects, being an extension of $U$. In this way, each formula from $L$ defines a family of sets of objects over the sample, and also another family of sets over the universe of all objects. Such families can be used to define new neighborhoods of a new approximation space, e.g., by taking unions of the families described above. In the searching process for relevant neighborhoods, we use the information encoded in the given sample. More relevant neighborhoods make it possible to define relevant approximation spaces (from the point of view of the optimization criterion).

It is worth mentioning that this searching process is often even more compound. For example, one can discover several relational structures (not only one, e.g., $R$ as presented above) and formulas over such structures defining different families of neighborhoods from the original approximation space, and next fuse them to obtain one family of neighborhoods or one neighborhood in a new approximation space. This kind of modeling is typical for hierarchical modeling [13], e.g., when we search for a relevant approximation space for objects composed of parts for which some relevant approximation spaces have already been found.

Rough Sets and Higher Order Vagueness

In [111], it is stressed that vague concepts should have non-crisp boundaries. In the definition presented in this chapter, the notion of boundary region is defined as a crisp set $BN_B(X)$. However, let us observe that this definition is relative to the subjective knowledge expressed by the attributes from $B$. Different sources of information may use different sets of attributes for concept approximation. Hence, the boundary region can change when we consider these different views. Another aspect is discussed in [287, 307], where it is assumed that information about concepts is incomplete, e.g., the concepts are given only on samples (see, e.g., [97, 115, 151]).

From [287, 307] it follows that vague concepts cannot be approximated with satisfactory quality by static constructs such as induced membership and inclusion functions, approximations, or models derived, e.g., from a sample. Understanding of vague concepts can only be realized in a process in which the induced models adaptively match the concepts in a dynamically changing environment. This conclusion seems to have important consequences for the further development of rough set theory in combination with fuzzy sets and other soft computing paradigms for adaptive approximate reasoning.

3.6 Information Granulation

Information granulation can be viewed as a human way of achieving data compression, and it plays a key role in the implementation of the divide-and-conquer strategy in human problem-solving [386, 223]. Objects obtained as the result of granulation are information granules. Examples of elementary information granules are indiscernibility or tolerance (similarity) classes (see, e.g., [222]). In reasoning about data and knowledge under uncertainty and imprecision, many other, more compound information granules are used (see, e.g., [255, 254, 284, 298, 299]). Examples of such granules are decision rules, sets of decision rules, and classifiers. More compound information granules are defined by means of less compound ones. Note that inclusion or closeness measures between information granules should be considered rather than their strict equality. Such measures are also defined recursively for information granules.

Let us briefly discuss an example of information granulation in the process of modeling patterns for compound concept approximation (see, e.g., [21, 22, 24, 25, 28, 177, 310, 303, 313, 312, 332]). We start from a generalization of information systems. For any attribute $a \in A$ of an information system $(U, A)$ we consider, together with the value set $V_a$ of $a$, a relational structure $R_a$ over the universe $V_a$ (see, e.g., [309]). We also consider a language $L_a$ of formulas (of the same relational signature as $R_a$). Such formulas, interpreted over $R_a$, define subsets of Cartesian products of $V_a$. For example, any formula $\alpha$ with one free variable defines a subset $\|\alpha\|_{R_a}$ of $V_a$.

Let us observe that the relational structure $R_a$ (without functions) induces a relational structure over $U$. Indeed, for any $k$-ary relation $r$ from $R_a$ one can define a $k$-ary relation $g_a \subseteq U^k$ by $(x_1, \ldots, x_k) \in g_a$ if and only if $(a(x_1), \ldots, a(x_k)) \in r$, for any $(x_1, \ldots, x_k) \in U^k$. Hence, one can consider any formula from $L_a$ as a constructive method of defining a subset of the universe $U$ with a structure induced by $R_a$. Any such structure is a new information granule. On the next level of hierarchical modeling, i.e., in constructing new information systems, we use such structures as objects, and attributes are properties of such structures. Next, one can consider the similarity between the newly constructed objects; their similarity neighborhoods will then correspond to clusters of relational structures. This process is usually more complex, because instead of the relational structure $R_a$ we usually consider a fusion of relational structures corresponding to several attributes from $A$. The fusion makes it possible to describe constraints that should hold between parts obtained by composition from less compound parts.

Examples of relational structures can be defined by indiscernibility, similarity, intervals obtained in discretization or symbolic value grouping, and preference or spatio-temporal relations (see, e.g., [76, 115, 297]). One can see that the parameters to be tuned in searching for patterns relevant for target concept approximation over the new information systems are, among others, the relational structures over value sets, the language of formulas defining parts, and the constraints.

3.7 Ontological Framework for Approximation

In a number of papers (see, e.g., [13, 193, 300, 14]), the problem of ontology approximation has been discussed, together with possible applications to approximation of compound concepts or to knowledge transfer (see, e.g., [13, 17, 180, 276, 286, 300, 14]). For the RoughICE software supporting ontology approximation, the reader is referred to the system homepage.

In the ontology [328], (vague) concepts and local dependencies between them are specified. Global dependencies can be derived from local dependencies. Such derivations can be used as hints in searching for relevant compound patterns (information granules) in the approximation of more compound concepts from the ontology. The ontology approximation problem is one of the fundamental problems related to approximate reasoning in distributed environments. One should construct (in a given language, different from the ontology specification language) not only approximations of the concepts from the ontology, but also of the vague dependencies specified in the ontology. It is worthwhile to mention that an ontology approximation should be induced on the basis of incomplete information about the concepts and dependencies specified in the ontology. Information granule calculi based on rough sets have been proposed as tools making it possible to solve this problem.

Vague dependencies have vague concepts in premisses and conclusions. The approach to approximation of vague dependencies based only on the degrees of closeness of the concepts from the dependencies and their approximations (classifiers) is not satisfactory for approximate reasoning. Hence, a more advanced approach should be developed. An approximation of a vague dependency is a method which, for any object, allows computing the arguments for and against its membership in the dependency conclusion on the basis of the analogous arguments relative to the dependency premisses. Any argument is a compound information granule (compound pattern). Arguments are fused by local schemes (production rules) discovered from data. Further fusions are possible through the composition of local schemes, called approximate reasoning schemes (AR schemes) (see, e.g., [25, 199, 254]). To estimate the degree to which (at least) an object belongs to concepts from the ontology, the arguments for and against those concepts are collected, and next a conflict resolution strategy is applied to them to predict the degree.

3.8 Discernibility and Boolean Reasoning: Rough Set Methods for Machine Learning, Pattern Recognition, and Data Mining

Tasks collected under the labels of data mining, knowledge discovery, decision support, pattern classification, and approximate reasoning require tools aimed at discovering templates (patterns) in data and classifying them into certain decision classes. Templates are, in many cases, the most frequent sequences of events, the most probable events, regular configurations of objects, decision rules of high quality, or standard reasoning schemes. Tools for the discovery and classification of templates are based on reasoning schemes rooted in various paradigms [51]. Such patterns can be extracted from data by means of methods based, e.g., on Boolean reasoning and discernibility (see this section and [36]).

Discernibility relations belong to the most important relations considered in rough set theory. The ability to discern between perceived objects is important for constructing many entities, such as reducts, decision rules, and decision algorithms. In the classical rough set approach, the discernibility relation $DIS(B) \subseteq U \times U$, where $B \subseteq A$ is a subset of attributes of an information system $(U, A)$, is defined by $x \, DIS(B) \, y$ if and only if $non(x \, IND_B \, y)$, where $IND_B$ is the $B$-indiscernibility relation. However, this is, in general, not the case for generalized approximation spaces. In the more general case, one can define indiscernibility by $x \in I(y)$ and discernibility by $I(x) \cap I(y) = \emptyset$ for any objects $x, y$, where $I(x) = B(x)$ and $I(y) = B(y)$ in the case of the indiscernibility relation, while $I(x)$ and $I(y)$ are neighborhoods of objects, not necessarily defined by an equivalence relation, in the general case.

The idea of Boolean reasoning is based on the construction, for a given problem $P$, of a corresponding Boolean function $f_P$ with the following property: the solutions of the problem $P$ can be decoded from the prime implicants of the Boolean function $f_P$. Let us mention that to solve real-life problems it is necessary to deal with Boolean functions having a large number of variables.

A successful methodology based on the discernibility of objects and Boolean reasoning has been developed for computing many entities important for applications, such as reducts and their approximations, decision rules, association rules, discretization of real-valued attributes, symbolic value grouping, searching for new features defined by oblique hyperplanes or higher order surfaces, pattern extraction from data, as well as conflict resolution or negotiation. Most of the problems related to the generation of the above-mentioned entities are NP-complete or NP-hard. However, it has been possible to develop efficient heuristics returning suboptimal solutions of these problems. The results of experiments on many data sets are very promising. They show very good quality of the solutions generated by the heuristics in comparison with other methods reported in the literature (e.g., with respect to the classification quality of unseen objects). Moreover, these heuristics are very efficient with respect to the time necessary for computing a solution. Many of these methods are based on discernibility matrices.

Note that it is possible to compute the necessary information about these matrices directly from the information encoded in decision systems (e.g., sorted in preprocessing [16, 178, 373]), which significantly improves the efficiency of the algorithms.

It is important to note that the methodology makes it possible to construct heuristics with a very important approximation property, which can be formulated as follows: expressions generated by the heuristics, i.e., implicants close to prime implicants, define approximate solutions of the problem.

In this section, we discuss applications of methods based on rough sets and Boolean reasoning in machine learning, pattern recognition, and data mining.

In the supervised machine learning paradigm [97, 115, 143, 151], a learning algorithm is given a training data set, usually in the form of a decision system $A = (U, A, d)$ (for simplicity, we consider decision systems with one decision attribute), prepared by an expert. Every such decision system classifies elements from $U$ into decision classes. The purpose of the algorithm is to return a set of decision rules together with a matching procedure and a conflict resolution strategy, called a classifier, which makes it possible to classify unseen objects, i.e., objects that are not described in the original decision table. In this section, we provide a number of rough set methods that can be used in the construction of classifiers. For more information the reader is referred, e.g., to [3, 20, 41, 42, 47, 50, 55, 64, 65, 71, 73, 85, 86, 87, 88, 89, 90, 92, 93, 98, 100, 101, 102, 113, 114, 125, 126, 130, 131, 138, 139, 140, 152, 153, 163, 176, 8, 194, 195, 196, 197, 199, 200, 218, 242, 249, 252, 253, 257, 270, 274, 275, 281, 283, 291, 293, 330, 342, 343, 344, 345, 347, 357, 369, 371, 374], and for papers on hierarchical learning and ontology approximation, e.g., to [17, 21, 24, 25, 177, 180, 179, 285, 294, 299, 300].

Most of the techniques discussed below are based on computing prime implicants, used for computing different kinds of reducts. Unfortunately, these problems are computationally hard. However, many heuristics have been developed which have turned out to be very promising. The results of experiments on many data sets, reported in the literature, show a very good quality of classification of unseen objects using these heuristics. A variety of methods for computing reducts and their applications can be found in [16, 121, 135, 200, 252, 253, 281, 283, 293, 295, 315, 317, 374, 375]. The fact that the problem of finding a minimal reduct of a given information system is NP-hard was proved in [295]. As we mentioned, there exist a number of good heuristics that compute sufficiently many reducts in an acceptable time. Moreover, a successful methodology, based on different kinds of reducts, has been developed for the solution of many problems, such as attribute selection, decision rule generation, association rule generation, discretization of real-valued attributes, and symbolic value grouping. For further readings the reader is referred to [20, 280, 345] (attribute selection); [173, 164, 165, 171, 289] (discretization); [166, 167] (discretization of data stored in relational databases); and [172] (reduct approximation and association rules).

Many of these methods are based on the discernibility matrices defined in this section.

It is possible to compute the necessary information about these matrices directly from information or decision systems (e.g., sorted in preprocessing [16, 178]), that is, without generating and storing the discernibility matrices themselves, which significantly improves the efficiency of the algorithms.

The results presented in this section have been implemented, e.g., in the RSES software system (the Rough Set Explorer System; see also [15, 16, 19, 27, 116]). The sections that follow are based on a chapter of the book [48]. For links to other rough set software systems, the reader is referred to the RSDS (the Rough Set Database System).

Reducts in Information and Decision Systems

A crucial concept in the rough set approach to machine learning is that of a reduct. In fact, the term reduct corresponds to a wide class of concepts. What typifies all of them is that they are used to reduce information (decision) systems by removing redundant attributes. In this section, we consider three kinds of reducts, which will be used in the remainder of this chapter.

Given an information system $A = (U, A)$, a reduct is a minimal set (wrt inclusion) of attributes $B \subseteq A$ such that $IND_B = IND_A$, where $IND_B$ and $IND_A$ are the indiscernibility relations defined by $B$ and $A$, respectively [215]. The intersection of all reducts is called the core. Intuitively, a reduct is a minimal set of attributes from $A$ that preserves the original classification defined by $A$. Reducts are extremely valuable in applications. Unfortunately, the problem of finding a minimal reduct is NP-hard. One can also show that, for any $m$, there is an information system with $m$ attributes having an exponential (wrt $m$) number of reducts. Fortunately, there are reasonably good heuristics which allow one to compute sufficiently many reducts in an acceptable amount of time.

To provide a general method for computing reducts, we will use the following constructs. Let $A = (U, A)$ be an information system with $n$ objects. The discernibility matrix of $A$ is an $n \times n$ matrix whose element $c_{ij}$ consists of the set of attributes from $A$ on which the objects $x_i$ and $x_j$ differ, i.e.,

(3.37) $c_{ij} = \{a \in A : a(x_i) \neq a(x_j)\}$, for $i, j = 1, \ldots, n$.

A discernibility function $f_A$ for $A$ is a propositional formula of $m$ Boolean variables, $a^*_1, \ldots, a^*_m$, corresponding to the attributes $a_1, \ldots, a_m$, defined by

(3.38) $f_A(a^*_1, \ldots, a^*_m) = \bigwedge_{1 \leq j < i \leq n, \, c_{ij} \neq \emptyset} \; \bigvee_{a^* \in c^*_{ij}} a^*$, where $c^*_{ij} = \{a^* : a \in c_{ij}\}$.

In the sequel, we write $a_i$ instead of $a^*_i$, for simplicity.
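The constructions (3.37)-(3.38) translate directly into code. The following sketch (with its own ad hoc data layout; an illustration, not the RSES implementation) computes the discernibility matrix of an information system, using the car data of Table 3.1 below.

```python
# A sketch of the discernibility matrix (3.37): entry (i, j) holds the
# attributes on which objects x_i and x_j differ.

def discernibility_matrix(objects, info, attrs):
    matrix = {}
    for i, x in enumerate(objects):
        for j, y in enumerate(objects[:i]):
            matrix[(i, j)] = {a for a in attrs if info[x][a] != info[y][a]}
    return matrix

# The information system of Table 3.1 (s, c, h = Speed, Color, Humidity)
info = {"car1": {"s": "medium", "c": "green",  "h": "high"},
        "car2": {"s": "medium", "c": "yellow", "h": "low"},
        "car3": {"s": "high",   "c": "blue",   "h": "high"}}
print(discernibility_matrix(["car1", "car2", "car3"], info, ["s", "c", "h"]))
# {(1, 0): {'c', 'h'}, (2, 0): {'s', 'c'}, (2, 1): {'s', 'c', 'h'}}
```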

Table 3.1: The information table considered in Example 3.3

Object | Speed  | Color  | Humidity
car1   | medium | green  | high
car2   | medium | yellow | low
car3   | high   | blue   | high

Table 3.2: The discernibility matrix for the information table provided in Table 3.1

M(A) | car1 | car2    | car3
car1 |      | c, h    | s, c
car2 | c, h |         | s, c, h
car3 | s, c | s, c, h |

The discernibility function $f_A$ describes the constraints which must hold to preserve discernibility between all pairs of discernible objects from $A$. It requires keeping at least one attribute from each non-empty element of the discernibility matrix corresponding to a pair of discernible objects. It can be shown [295] that for any information system $A = (U, A)$, the set of all prime implicants of $f_A$ determines the set of all reducts of $A$.

Example 3.3. Consider the information system $A$ whose associated information table is provided in Table 3.1. The discernibility matrix for $A$ is presented in Table 3.2. (The letters $s$, $c$ and $h$ stand for Speed, Color and Humidity, respectively.) The discernibility function for the information system $A$ is then given by

$f_A(s, c, h) \equiv (c \vee h) \wedge (s \vee c) \wedge (s \vee c \vee h)$.

The prime implicants of $f_A(s, c, h)$ can be computed in order to derive the reducts of $A$:

$f_A(s, c, h) \equiv (c \vee h) \wedge (s \vee c) \wedge (s \vee c \vee h) \equiv (c \vee h) \wedge (s \vee c) \equiv c \vee (h \wedge s)$.

The prime implicants of $f_A(s, c, h)$ are $c$ and $h \wedge s$. Accordingly, there are two reducts of $A$, namely $\{Color\}$ and $\{Humidity, Speed\}$.
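Extracting reducts from the matrix amounts to finding the prime implicants of $f_A$, i.e., the minimal sets of attributes hitting every non-empty matrix entry. The brute-force sketch below (practical only for small systems; in practice, the heuristics cited above are used) reproduces the result of Example 3.3.

```python
from itertools import combinations

# A brute-force sketch of Boolean reasoning for reducts: prime implicants
# of the (monotone) discernibility function = minimal hitting sets of the
# non-empty discernibility-matrix entries.

def prime_implicants(clauses, variables):
    clauses = [set(c) for c in clauses if c]      # non-empty entries only
    minimal = []
    for k in range(1, len(variables) + 1):
        for cand in combinations(variables, k):
            s = set(cand)
            if any(m <= s for m in minimal):      # a smaller set already works
                continue
            if all(s & c for c in clauses):       # s hits every clause
                minimal.append(s)
    return minimal

# Entries of Table 3.2: (c or h), (s or c), (s or c or h)
print(prime_implicants([{"c", "h"}, {"s", "c"}, {"s", "c", "h"}],
                       ["s", "c", "h"]))
# [{'c'}, {'h', 's'}] -- the reducts {Color} and {Humidity, Speed}
```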

The second type of reduct used in this chapter is the decision-relative reduct for decision systems. In terms of decision tables, $\partial_A(x)$, called the generalized decision function, is the mapping on $U$ such that, for any object $x$, it specifies all rows in the table whose condition attribute values are the same as for $x$, and then collects the decision values from these rows. A decision-relative reduct of $A = (U, A, d)$ is a minimal (wrt inclusion) non-empty set of attributes $B \subseteq A$ such that $\partial_B = \partial_A$. Intuitively, the definition states that $B$ allows us to classify exactly the same objects as belonging to the equivalence classes of $U/\partial_A$ as $A$ does. In terms of decision tables, the columns associated with the attributes $A - B$ may be removed without affecting the classification power of the original table.

Table 3.3: The decision table considered in Example 3.4

Object | Speed  | Color  | Humidity | Danger
car1   | medium | green  | high     | no
car2   | medium | yellow | low      | no
car3   | high   | blue   | high     | yes

To compute decision-relative reducts, we extend the definitions of discernibility matrix and discernibility function in the following straightforward manner. Let $A = (U, A, d)$ be a consistent decision system (i.e., $\partial_A(x)$ consists of exactly one decision for any $x \in U$) and let $M(A) = [c_{ij}]$ be the discernibility matrix of the information system $(U, A)$. We construct a new matrix, $M'(A) = [c'_{ij}]$, where

$c'_{ij} = \begin{cases} \emptyset & \text{if } d(x_i) = d(x_j), \\ c_{ij} & \text{otherwise.} \end{cases}$

$M'(A)$ is called the decision-relative discernibility matrix of $A$. The decision-relative discernibility function $f^r_A$ for $A$ is constructed from the decision-relative discernibility matrix of $A$ in the same way as a discernibility function is constructed from a discernibility matrix. It can then be shown [295] that the set of all prime implicants of $f^r_A$ determines the set of all decision-relative reducts of the consistent decision system $A$.

Example 3.4. Consider the decision table associated with a decision system $A$ as represented in Table 3.3. The discernibility matrix for $A$ is the same as the one given in Table 3.2, and the decision-relative discernibility matrix for $A$ is provided in Table 3.4. Using the decision-relative discernibility matrix, we can compute the decision-relative discernibility function for $A$:

$f^r_A(s, c, h) \equiv (s \vee c) \wedge (s \vee c \vee h) \equiv s \vee c$.

The set of all prime implicants of $f^r_A(s, c, h)$ is $\{s, c\}$. Therefore, there are two decision-relative reducts of $A$, namely $\{Speed\}$ and $\{Color\}$.
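The only change with respect to the previous sketch is that matrix entries for pairs with equal decisions are emptied before the prime implicants are computed. The helper below reuses discernibility_matrix and prime_implicants from the earlier sketches and is, again, only an illustration.

```python
# A sketch of the decision-relative discernibility matrix: keep an entry
# only when the two objects carry different decisions.

def decision_relative_matrix(objects, info, attrs, d):
    matrix = discernibility_matrix(objects, info, attrs)
    return {(i, j): (c if d[objects[i]] != d[objects[j]] else set())
            for (i, j), c in matrix.items()}

d = {"car1": "no", "car2": "no", "car3": "yes"}
m = decision_relative_matrix(["car1", "car2", "car3"], info, ["s", "c", "h"], d)
print(prime_implicants(m.values(), ["s", "c", "h"]))
# [{'s'}, {'c'}] -- the decision-relative reducts {Speed} and {Color}
```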

Table 3.4: The decision-relative discernibility matrix corresponding to the decision system shown in Table 3.3

M'(A) | car1 | car2    | car3
car1  |      |         | s, c
car2  |      |         | s, c, h
car3  | s, c | s, c, h |

To each decision-relative reduct $B$ of a decision system $A$, we assign a new decision system, called the $B$-reduction of $A$. The details are as follows. Let $A = (U, A, d)$ be a consistent decision system and suppose that $B$ is a decision-relative reduct of $A$. A $B$-reduction of $A$ is a decision system $(V, B, d)$, where:

- $V = \{[x]_B : x \in U\}$ (recall that $[x]_B$, where $x \in U$, denotes the equivalence class of the relation $IND_B$ which contains $x$);
- $a([x]_B) = a(x)$, for each $a \in B$ and each $[x]_B \in V$;
- $d([x]_B) = d(x)$, for each $[x]_B \in V$.

Table 3.5: The $\{Speed\}$-reduction of the decision system $A$

Objects    | Speed  | Danger
car1, car2 | medium | no
car3       | high   | yes

Let $A'$ be the $\{Speed\}$-reduction of the decision system $A$ considered in Example 3.4. The decision table associated with $A'$ is provided in Table 3.5.

The above-defined method for computing decision-relative reducts can easily be extended to inconsistent decision systems. Observe that another kind of reduct can be obtained by using the discernibility requirement relative to the positive regions, i.e., $POS_A(d) = POS_B(d)$, instead of $\partial_B = \partial_A$. Certainly, for inconsistent decision systems the former requirement is less restrictive than the latter.

The last type of reduct considered in this section is used in applications where approximations of reducts are preferred to standard reducts. For example, approximate versions of decision-relative reducts make it possible to generate approximate decision rules. In the case of approximate reducts, we relax the requirement of discernibility preservation. Instead of preserving discernibility for all entries of the discernibility matrix where it is necessary, we preserve it to a degree, i.e., in a number of entries characterized by a coefficient $\alpha$. Such reducts are called $\alpha$-reducts, where $\alpha$ is a real number from the interval $[0,1]$. A more formal definition of approximate reducts is the following. Let $A = (U, A, d)$ be a decision system and let $M(A)$ be the discernibility matrix of $A$. Assume further that $n$ is the number of non-empty sets in $M(A)$. A set of attributes $B \subseteq A$ is called an $\alpha$-reduct if and only if $m/n \geq \alpha$, where $m$ is the number of sets in $M(A)$ having a non-empty intersection with $B$. The reader is referred to [169, 175, 215, 315, 317] for information on various types of approximate reducts. Additionally, [18, 172, 257, 318] provide approximation criteria based on discernibility and, therefore, related to Boolean reasoning principles.
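The $\alpha$-reduct condition is a one-line test once the matrix is available; the sketch below again builds on the helpers introduced above and is only illustrative.

```python
# A sketch of the alpha-reduct test: B must hit at least a fraction alpha
# of the non-empty discernibility-matrix entries.

def is_alpha_reduct(matrix, B, alpha):
    nonempty = [c for c in matrix.values() if c]
    hits = sum(1 for c in nonempty if set(B) & c)
    return hits / len(nonempty) >= alpha

print(is_alpha_reduct(m, {"h"}, 0.5))  # True: {h} hits 1 of the 2 entries
```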

Attribute Selection

In the supervised machine learning approach, a learning algorithm is provided with training data. In the context of rough set machine learning techniques, training data sets are provided in the form of training decision systems, or their equivalent representations as decision tables. Since the condition attributes of a specific decision table are typically extracted from large sets of unstructured data, it is often the case that some of the attributes are irrelevant for the purpose of classification. Such attributes should be removed from the table, if possible. The attribute selection problem is the problem of choosing a relevant subset of attributes, while removing the irrelevant ones (see, e.g., [97]).

A natural solution of the attribute selection problem is to assume that the intersection of the decision-relative reducts of a training decision table is the source of the relevant attributes. Unfortunately, there are two problems with this solution. Firstly, the intersection can be empty. Secondly, the number of attributes contained in all decision-relative reducts is usually small. Consequently, although these attributes perfectly characterize the training decision table, they are, in general, inadequate for providing a satisfactory classification of new objects not occurring in the training data.

To deal with the attribute selection problem, it is often reasonable to use various approximations of decision-relative reducts. Let $A = (U, A, d)$ be a consistent decision system. One can treat any subset $B$ of $A$ as an approximate reduct of $A$. The number

(3.39) $\varepsilon_{A,\{d\}}(B) = \dfrac{\gamma(A, \{d\}) - \gamma(B, \{d\})}{\gamma(A, \{d\})} = 1 - \dfrac{\gamma(B, \{d\})}{\gamma(A, \{d\})}$

is called the error approximation of $A$ by $B$ (recall that the coefficient $\gamma(X, Y)$ expresses the degree of dependency between the sets of attributes $X$ and $Y$). The error approximation of $A$ by $B$ expresses exactly how well the set of attributes $B$ approximates the set of condition attributes $A$ with respect to the determination of $d$. Note that $\varepsilon_{A,\{d\}}(B) \in [0,1]$, where $0$ indicates no error, and the closer $\varepsilon_{A,\{d\}}(B)$ is to $1$, the greater the error is. The reader is referred, e.g., to [172, 318] for more information on approximate reducts.

There are two general approaches to attribute selection: the open-loop approach and the closed-loop approach. Methods based on the open-loop approach are characterized by the fact that they do not use any feedback information about classifier quality for attribute selection. In contrast, methods based on the closed-loop approach use such feedback information as criteria for attribute selection. A number of attribute selection algorithms have been proposed in the machine learning literature, but they will not be considered here, since our focus is on rough set based techniques.

Rough set techniques which attempt to solve the attribute selection problem are typically based on the closed-loop approach and consist of the following basic steps:

1. Decision-relative reducts are extracted from a training decision table. The attributes contained in these reducts (in their intersection, or in most of them) are viewed as potentially relevant.
2. Using the specific machine learning algorithm, a classifier based on the chosen attributes is constructed.
3. The classifier is then tested on a new set of training data; if its performance is unsatisfactory (wrt some measure), a new set of attributes is constructed by extracting additional (approximate) reducts for the initial training table, and the process is repeated.

Reducts need not be the only source of information used in the selection of attributes. The rough set approach offers another interesting possibility. The main idea is to generalize the notion of attribute reduction by introducing the concept of the significance of attributes. This measure enables attributes to be evaluated on a multi-valued scale which assigns a real number from the interval $[0,1]$ to an attribute. This number, expressing the importance of an attribute in a decision system, is evaluated by measuring the effect of removing the attribute from the table. The significance of an attribute $a$ in a decision system $A = (U, A, d)$ is defined by

(3.40) $\sigma_{A,\{d\}}(a) = \dfrac{\gamma(A, \{d\}) - \gamma(A - \{a\}, \{d\})}{\gamma(A, \{d\})} = 1 - \dfrac{\gamma(A - \{a\}, \{d\})}{\gamma(A, \{d\})}$.

Assume that $B \subseteq A$. The significance coefficient can be extended to sets of attributes as follows:

(3.41) $\sigma_{A,\{d\}}(B) = \dfrac{\gamma(A, \{d\}) - \gamma(A - B, \{d\})}{\gamma(A, \{d\})} = 1 - \dfrac{\gamma(A - B, \{d\})}{\gamma(A, \{d\})}$.

The coefficient $\sigma_{A,\{d\}}(B)$ can be understood as the classification error which occurs when the attributes from $B$ are removed from the decision system. Note that $\sigma_{A,\{d\}}(B) \in [0,1]$, where $0$ indicates that the removal of the attributes in $B$ causes no error, and the closer $\sigma_{A,\{d\}}(B)$ is to $1$, the greater the error is.

Remark 3.1. In this section, we have mainly concentrated on the case where the attributes are selected from the set of attributes of the input decision system. In some cases it might be useful to replace some attributes by a new one. For example, if one considers a concept of a safe distance between vehicles, then attributes, say $VS$ standing for vehicle speed and $SL$ standing for speed limit, can be replaced by an attribute $DIF$ representing the difference $SL - VS$. In fact, the new attribute corresponds to the concept of safe distance better than the pair $(VS, SL)$. (There are public domain software packages, for instance the RSES system (for references see, e.g., [27]), which offer software that may be used to solve the attribute selection problem.)
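Both (3.39) and (3.40) reduce to computing the dependency degree $\gamma$. The sketch below assumes the standard definition $\gamma(B, \{d\}) = card(POS_B(d))/card(U)$, where the positive region consists of the $B$-indiscernibility classes with a unique decision; the data layout is again ad hoc.

```python
# A sketch of the dependency degree gamma and the measures (3.39)-(3.40).

def gamma(objects, info, B, d):
    classes = {}
    for x in objects:
        classes.setdefault(tuple(info[x][a] for a in B), []).append(x)
    # POS_B(d): blocks whose objects all share one decision value
    pos = sum(len(blk) for blk in classes.values()
              if len({d[x] for x in blk}) == 1)
    return pos / len(objects)

def error_approximation(objects, info, A, B, d):   # epsilon_{A,{d}}(B)
    return 1 - gamma(objects, info, B, d) / gamma(objects, info, A, d)

def significance(objects, info, A, a, d):          # sigma_{A,{d}}(a)
    rest = [b for b in A if b != a]
    return 1 - gamma(objects, info, rest, d) / gamma(objects, info, A, d)
```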

Value Set Reduction

Consider a decision system with a large number of attribute values. There is a very low probability that a new object will be properly recognized by matching its attribute value vector with any of the rows in the decision table associated with the decision system. So, in order to construct a high quality classifier, it is often necessary to reduce the cardinality of the value sets of specific attributes in a training decision table. The task of reducing the cardinality of value sets is referred to as the value set reduction problem (see, e.g., [169]).

In this section, two methods of value set reduction are considered [169]:

1. discretization, used for real-valued attributes, and
2. symbolic attribute value grouping, used for symbolic attributes.

Discretization

A discretization replaces the value sets of the real-valued condition attributes with intervals. The replacement ensures that a consistent decision system is obtained (assuming a given consistent decision system) by substituting the original values of the objects in the decision table by the unique names of the intervals comprising these values. This substantially reduces the size of the value sets of real-valued attributes. The use of discretization is not specific to the rough set approach; in fact, a majority of rule or tree induction algorithms require it for good performance.

Let $A = (U, A, d)$ be a consistent decision system. Assume $V_a = [l_a, r_a) \subset \mathbb{R}$ for any $a \in A$, where $l_a < r_a$ and $\mathbb{R}$ denotes the set of real numbers. A pair $(a, c)$, where $a \in A$ and $c \in V_a$, is called a cut on $V_a$.

Any attribute $a \in A$ defines a sequence of real numbers $v^a_1 < v^a_2 < \cdots < v^a_{k_a}$, where $\{v^a_1, v^a_2, \ldots, v^a_{k_a}\} = \{a(x) : x \in U\}$. The set of basic cuts on $a$, written $B_a$, is specified by

$B_a = \{(a, (v^a_1 + v^a_2)/2), (a, (v^a_2 + v^a_3)/2), \ldots, (a, (v^a_{k_a - 1} + v^a_{k_a})/2)\}$.

The set $\bigcup_{a \in A} B_a$ is called the set of basic cuts on $A$.

Example 3.5. Consider a consistent decision system $A$ and the associated decision table presented in Table 3.6(a). We assume that the initial value domains for the attributes $a$ and $b$ are $V_a = [0, 2)$ and $V_b = [0, 4)$.

Table 3.6: The discretization process: (a) the original decision system $A$ considered in Example 3.5; (b) the $C$-discretization of $A$ considered in Example 3.6

(a)
A  | a   | b   | d
u1 | 0.8 | 2.0 | 1
u2 | 1.0 | 0.5 | 0
u3 | 1.3 | 3.0 | 0
u4 | 1.4 | 1.0 | 1
u5 | 1.4 | 2.0 | 0
u6 | 1.6 | 3.0 | 1
u7 | 1.3 | 1.0 | 1

(b)
A^C | a^C | b^C | d
u1  | 0   | 2   | 1
u2  | 1   | 0   | 0
u3  | 2   | 3   | 0
u4  | 3   | 1   | 1
u5  | 3   | 2   | 0
u6  | 4   | 3   | 1
u7  | 2   | 1   | 1

The sets of values of $a$ and $b$ for the objects from $U$ are

$a(U) = \{0.8, 1.0, 1.3, 1.4, 1.6\}$; $b(U) = \{0.5, 1.0, 2.0, 3.0\}$.

By definition, the sets of basic cuts for $a$ and $b$ are

$B_a = \{(a, 0.9), (a, 1.15), (a, 1.35), (a, 1.5)\}$;
$B_b = \{(b, 0.75), (b, 1.5), (b, 2.5)\}$.

Using the idea of cuts, decision systems with real-valued attributes can be discretized. For a decision system $A = (U, A, d)$ and $a \in A$, let

$C_a = \{(a, c^a_1), (a, c^a_2), \ldots, (a, c^a_k)\}$

be any set of cuts on $a$, and assume that $c^a_1 < c^a_2 < \cdots < c^a_k$. The set of cuts $C = \bigcup_{a \in A} C_a$ defines a new decision system $A^C = (U, A^C, d)$, called the $C$-discretization of $A$, where $A^C = \{a^C : a \in A\}$ and

$a^C(x) = \begin{cases} 0 & \text{if } a(x) < c^a_1, \\ i & \text{if } a(x) \in [c^a_i, c^a_{i+1}), \text{ for } i \in \{1, \ldots, k-1\}, \\ k & \text{if } a(x) \geq c^a_k. \end{cases}$

Example 3.6 (Example 3.5 continued). Let $C = B_a \cup B_b$. It is easy to check that the $C$-discretization of $A$ is the decision system whose decision table is provided in Table 3.6(b).

Since a decision system can be discretized in many ways, a natural question arises: how should the various possible discretizations be evaluated? A set of cuts $C$ is called $A$-consistent if $\partial_A = \partial_{A^C}$, where $\partial_A$ and $\partial_{A^C}$ are the generalized decision functions for $A$ and $A^C$, respectively. An $A$-consistent set of cuts $C$ is $A$-irreducible if $C'$ is not $A$-consistent for any $C' \subsetneq C$. An $A$-consistent set of cuts $C$ is $A$-optimal if $card(C) \leq card(C')$ for any $A$-consistent set of cuts $C'$.
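Basic cuts and the $C$-discretization are easy to express in code. The sketch below reproduces the attribute $a$ of Example 3.5; the interval index of a value is simply the number of cuts below it, matching the case distinction above.

```python
from bisect import bisect_right

# A sketch of basic cuts and C-discretization for one attribute.

def basic_cuts(values):
    vs = sorted(set(values))
    return [(vs[i] + vs[i + 1]) / 2 for i in range(len(vs) - 1)]

def discretize(value, cuts):
    # a^C(x): index of the interval into which the value falls
    return bisect_right(sorted(cuts), value)

a_values = [0.8, 1.0, 1.3, 1.4, 1.4, 1.6, 1.3]     # a(u1), ..., a(u7)
cuts_a = basic_cuts(a_values)
print(cuts_a)                                      # [0.9, 1.15, 1.35, 1.5]
print([discretize(v, cuts_a) for v in a_values])   # [0, 1, 2, 3, 3, 4, 2]
```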

As is easily observed, the set of cuts considered in Example 3.6 is $A$-consistent. However, as we shall see in Example 3.7, it is neither optimal nor irreducible.

Since the purpose of the discretization process is to reduce the size of the individual value sets of attributes, we are primarily interested in optimal sets of cuts. These are extracted from the basic sets of cuts for a given decision system.

Let $A = (U, A, d)$ be a consistent decision system, where $U = \{u_1, \ldots, u_n\}$. Recall that any attribute $a \in A$ defines a sequence $v^a_1 < v^a_2 < \cdots < v^a_{k_a}$, where $\{v^a_1, v^a_2, \ldots, v^a_{k_a}\} = \{a(x) : x \in U\}$. Let $ID(A)$ be the set of pairs $(i, j)$ such that $i < j$ and $d(u_i) \neq d(u_j)$. We now construct a propositional formula, called the discernibility formula of $A$, as follows:

1. To each interval of the form $[v^a_k, v^a_{k+1})$, $a \in A$ and $k \in \{1, \ldots, k_a - 1\}$, we assign a Boolean variable denoted by $p^a_k$. The set of all these variables is denoted by $V(A)$.
2. We first construct a family of formulas $\{B(a, i, j) : a \in A \text{ and } (i, j) \in ID(A)\}$, where $B(a, i, j)$ is the disjunction of all elements from the set

$\{p^a_k : [v^a_k, v^a_{k+1}) \subseteq [\min\{a(u_i), a(u_j)\}, \max\{a(u_i), a(u_j)\})\}$.

3. Next, we construct a family of formulas $\{C(i, j) : i, j \in \{1, \ldots, n\}, \; i < j \text{ and } (i, j) \in ID(A)\}$, where $C(i, j) = \bigvee_{a \in A} B(a, i, j)$.
4. Finally, the discernibility formula of $A$, $D(A)$, is defined as $D(A) = \bigwedge C(i, j)$, where $i < j$, $(i, j) \in ID(A)$ and $C(i, j) \not\equiv \text{FALSE}$.

Any non-empty set $S = \{p^{a_1}_{k_1}, \ldots, p^{a_r}_{k_r}\}$ of Boolean variables from $V(A)$ uniquely defines a set of cuts, $C(S)$, given by

$C(S) = \{(a_1, (v^{a_1}_{k_1} + v^{a_1}_{k_1+1})/2), \ldots, (a_r, (v^{a_r}_{k_r} + v^{a_r}_{k_r+1})/2)\}$.

Then we have the following properties.

Let $A = (U, A, d)$ be a consistent decision system. For any non-empty set $S \subseteq V(A)$ of Boolean variables, the following two conditions are equivalent:

1. The conjunction of the variables from $S$ is a prime implicant of the discernibility formula of $A$.
2. $C(S)$ is an $A$-irreducible set of cuts on $A$.

Let $A = (U, A, d)$ be a consistent decision system. For any non-empty set $S \subseteq V(A)$ of Boolean variables, the following two conditions are equivalent:

1. The conjunction of the variables from $S$ is a minimal (wrt length) prime implicant of the discernibility formula of $A$.
2. $C(S)$ is an $A$-optimal set of cuts on $A$.

Example 3.7 (Example 3.6 continued).

$ID(A) = \{(1,2), (1,3), (1,5), (2,4), (2,6), (2,7), (3,4), (3,6), (3,7), (4,5), (5,6), (5,7)\}$.

1. We introduce four Boolean variables, $p^a_1, p^a_2, p^a_3, p^a_4$, corresponding respectively to the intervals $[0.8, 1.0)$, $[1.0, 1.3)$, $[1.3, 1.4)$, $[1.4, 1.6)$ of the attribute $a$, and three Boolean variables, $p^b_1, p^b_2, p^b_3$, corresponding respectively to the intervals $[0.5, 1.0)$, $[1.0, 2.0)$, $[2.0, 3.0)$ of the attribute $b$.

2. The following are the formulas $B(a, i, j)$ and $B(b, i, j)$, where $i < j$ and $(i, j) \in ID(A)$:

$B(a,1,2) \equiv p^a_1$;  $B(b,1,2) \equiv p^b_1 \vee p^b_2$
$B(a,1,3) \equiv p^a_1 \vee p^a_2$;  $B(b,1,3) \equiv p^b_3$
$B(a,1,5) \equiv p^a_1 \vee p^a_2 \vee p^a_3$;  $B(b,1,5) \equiv \text{FALSE}$
$B(a,2,4) \equiv p^a_2 \vee p^a_3$;  $B(b,2,4) \equiv p^b_1$
$B(a,2,6) \equiv p^a_2 \vee p^a_3 \vee p^a_4$;  $B(b,2,6) \equiv p^b_1 \vee p^b_2 \vee p^b_3$
$B(a,2,7) \equiv p^a_2$;  $B(b,2,7) \equiv p^b_1$
$B(a,3,4) \equiv p^a_3$;  $B(b,3,4) \equiv p^b_2 \vee p^b_3$
$B(a,3,6) \equiv p^a_3 \vee p^a_4$;  $B(b,3,6) \equiv \text{FALSE}$
$B(a,3,7) \equiv \text{FALSE}$;  $B(b,3,7) \equiv p^b_2 \vee p^b_3$
$B(a,4,5) \equiv \text{FALSE}$;  $B(b,4,5) \equiv p^b_2$
$B(a,5,6) \equiv p^a_4$;  $B(b,5,6) \equiv p^b_3$
$B(a,5,7) \equiv p^a_3$;  $B(b,5,7) \equiv p^b_2$

3. The following are the formulas $C(i, j)$, where $i < j$ and $(i, j) \in ID(A)$:

$C(1,2) \equiv p^a_1 \vee p^b_1 \vee p^b_2$
$C(1,3) \equiv p^a_1 \vee p^a_2 \vee p^b_3$
$C(1,5) \equiv p^a_1 \vee p^a_2 \vee p^a_3$
$C(2,4) \equiv p^a_2 \vee p^a_3 \vee p^b_1$
$C(2,6) \equiv p^a_2 \vee p^a_3 \vee p^a_4 \vee p^b_1 \vee p^b_2 \vee p^b_3$
$C(2,7) \equiv p^a_2 \vee p^b_1$
$C(3,4) \equiv p^a_3 \vee p^b_2 \vee p^b_3$
$C(3,6) \equiv p^a_3 \vee p^a_4$
$C(3,7) \equiv p^b_2 \vee p^b_3$
$C(4,5) \equiv p^b_2$
$C(5,6) \equiv p^a_4 \vee p^b_3$
$C(5,7) \equiv p^a_3 \vee p^b_2$

4. The discernibility formula of $A$ is then given by

$D(A) \equiv (p^a_1 \vee p^b_1 \vee p^b_2) \wedge (p^a_1 \vee p^a_2 \vee p^b_3) \wedge (p^a_1 \vee p^a_2 \vee p^a_3) \wedge (p^a_2 \vee p^a_3 \vee p^b_1) \wedge (p^a_2 \vee p^a_3 \vee p^a_4 \vee p^b_1 \vee p^b_2 \vee p^b_3) \wedge (p^a_2 \vee p^b_1) \wedge (p^a_3 \vee p^b_2 \vee p^b_3) \wedge (p^a_3 \vee p^a_4) \wedge (p^b_2 \vee p^b_3) \wedge p^b_2 \wedge (p^a_4 \vee p^b_3) \wedge (p^a_3 \vee p^b_2)$.

The prime implicants of the formula $D(A)$ are

$p^a_2 \wedge p^a_4 \wedge p^b_2$
$p^a_2 \wedge p^a_3 \wedge p^b_2 \wedge p^b_3$
$p^a_3 \wedge p^b_1 \wedge p^b_2 \wedge p^b_3$
$p^a_1 \wedge p^a_4 \wedge p^b_1 \wedge p^b_2$.

Suppose we take the prime implicant $p^a_1 \wedge p^a_4 \wedge p^b_1 \wedge p^b_2$. Its corresponding set of cuts is $C = \{(a, 0.9), (a, 1.5), (b, 0.75), (b, 1.5)\}$. The decision table for the $C$-discretization of $A$ is provided in Table 3.7.

Table 3.7: The $C$-discretization considered in Example 3.7

A^C | a^C | b^C | d
u1  | 0   | 2   | 1
u2  | 1   | 0   | 0
u3  | 1   | 2   | 0
u4  | 1   | 1   | 1
u5  | 1   | 2   | 0
u6  | 2   | 2   | 1
u7  | 1   | 1   | 1

Observe that the set of cuts corresponding to the prime implicant $p^a_2 \wedge p^a_4 \wedge p^b_2$ is $\{(a, 1.15), (a, 1.5), (b, 1.5)\}$. Thus $C$ is not an optimal set of cuts.

The problem of searching for an optimal set of cuts in a given decision system $A$ is NP-hard. However, it is possible to devise efficient heuristics which, in general, return reasonable sets of cuts. One of them, called the MD-heuristic, is presented below. We say that a cut $(a, c)$ discerns objects $x$ and $y$ if and only if $a(x) < c \leq a(y)$ or $a(y) < c \leq a(x)$. Let $n$ be the number of objects and let $k$ be the number of attributes of a decision system $A$. It can be shown that the best cut can be found in $O(kn)$ steps, using only $O(kn)$ space.

Example 3.8. Consider the decision system $A$ with the associated decision table provided in Table 3.6 of Example 3.5. The associated information table for the information system $A^*$ constructed by the MD-heuristic is presented in Table 3.8.

The MD-heuristic:

INPUT: a decision system $A = (U, A, d)$
OUTPUT: a set of cuts $C$

1. Set $C \leftarrow \emptyset$.
2. Let $\bigcup_{a \in A} B_a$ be the set of basic cuts on $A$.
3. Construct an information table $A^* = (U^*, A^*)$ such that $U^*$ is the set of pairs $(u_i, u_j)$ of objects discerned by $d$ (in $A$), with $i < j$, and $A^*$ contains one binary attribute $c$ for each basic cut, where

$c((x, y)) = \begin{cases} 1 & \text{if } c \text{ discerns } x \text{ and } y \text{ (in } A), \\ 0 & \text{otherwise.} \end{cases}$

4. Choose a column from $A^*$ with the maximal number of occurrences of 1's; add the cut corresponding to this column to $C$; delete the column from $A^*$, together with all rows marked with 1 in it.
5. If $A^*$ is non-empty, then go to step 4; otherwise stop.

Table 3.8: The information table for the information system $A^*$

A^*     | (a,0.9) | (a,1.15) | (a,1.35) | (a,1.5) | (b,0.75) | (b,1.5) | (b,2.5)
(u1,u2) | 1 | 0 | 0 | 0 | 1 | 1 | 0
(u1,u3) | 1 | 1 | 0 | 0 | 0 | 0 | 1
(u1,u5) | 1 | 1 | 1 | 0 | 0 | 0 | 0
(u2,u4) | 0 | 1 | 1 | 0 | 1 | 0 | 0
(u2,u6) | 0 | 1 | 1 | 1 | 1 | 1 | 1
(u2,u7) | 0 | 1 | 0 | 0 | 1 | 0 | 0
(u3,u4) | 0 | 0 | 1 | 0 | 0 | 1 | 1
(u3,u6) | 0 | 0 | 1 | 1 | 0 | 0 | 0
(u3,u7) | 0 | 0 | 0 | 0 | 0 | 1 | 1
(u4,u5) | 0 | 0 | 0 | 0 | 0 | 1 | 0
(u5,u6) | 0 | 0 | 0 | 1 | 0 | 0 | 1
(u5,u7) | 0 | 0 | 1 | 0 | 0 | 1 | 0

Under the assumption that columns with the maximal number of 1's are chosen from left to right (if several such columns exist in a given step), the set of cuts returned by the algorithm is $\{(a, 1.35), (b, 1.5), (a, 1.15), (a, 1.5)\}$. However, as shown in Example 3.7, this is not an optimal set of cuts.
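The greedy loop of the MD-heuristic is a set-cover-style selection and can be sketched in a few lines; the column data below are the columns of Table 3.8 written as sets of pairs, and ties are broken by insertion order (left to right), reproducing the result above.

```python
# A sketch of the MD-heuristic: repeatedly pick the cut discerning the
# most not-yet-discerned pairs, then drop the pairs it covers.

def md_heuristic(columns):
    columns = {cut: set(pairs) for cut, pairs in columns.items()}
    chosen = []
    while any(columns.values()):
        best = max(columns, key=lambda cut: len(columns[cut]))
        covered = columns.pop(best)
        chosen.append(best)
        for cut in columns:
            columns[cut] -= covered
    return chosen

# Columns of Table 3.8 as the sets of pairs (i, j) they discern
columns = {
    ("a", 0.9):  {(1, 2), (1, 3), (1, 5)},
    ("a", 1.15): {(1, 3), (1, 5), (2, 4), (2, 6), (2, 7)},
    ("a", 1.35): {(1, 5), (2, 4), (2, 6), (3, 4), (3, 6), (5, 7)},
    ("a", 1.5):  {(2, 6), (3, 6), (5, 6)},
    ("b", 0.75): {(1, 2), (2, 4), (2, 6), (2, 7)},
    ("b", 1.5):  {(1, 2), (2, 6), (3, 4), (3, 7), (4, 5), (5, 7)},
    ("b", 2.5):  {(1, 3), (2, 6), (3, 4), (3, 7), (5, 6)},
}
print(md_heuristic(columns))
# [('a', 1.35), ('b', 1.5), ('a', 1.15), ('a', 1.5)]
```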

Symbolic Attribute Value Grouping

Symbolic attribute value grouping is a technique for reducing the cardinality of value sets of symbolic attributes.

Let $A = (U, A, d)$ be a decision system. Any function $c_a : V_a \to \{1, \ldots, m\}$, where $m \leq card(V_a)$, is called a clustering function for $V_a$. The rank of $c_a$, denoted by $rank(c_a)$, is the value $card(\{c_a(x) : x \in V_a\})$. For $B \subseteq A$, a family of clustering functions $\{c_a\}_{a \in B}$ is $B$-consistent if and only if

$\bigwedge_{a \in B} [c_a(a(u)) = c_a(a(u'))] \text{ implies } (u, u') \in IND_B \cup IND_{\{d\}}$,

for any pair $(u, u') \in U \times U$. The notion of $B$-consistency has the following intuitive interpretation: if two objects are indiscernible wrt the clustering functions for the value sets of the attributes from $B$, then they are indiscernible either by the attributes from $B$ or by the decision attribute.

We consider the following problem, called the symbolic value partition grouping problem: given a decision system $A = (U, A, d)$, where $U = \{u_1, \ldots, u_k\}$, and a set of attributes $B \subseteq A$, search for a $B$-consistent family $\{c_a\}_{a \in B}$ of clustering functions such that $\sum_{a \in B} rank(c_a)$ is minimal.

In order to solve this problem, we apply the following steps:

1. Introduce a set of new Boolean variables $\{a_{v,v'} : a \in B, \; v, v' \in V_a, \; v \neq v'\}$; these variables serve to discern between pairs of objects wrt an attribute $a$. We extract a subset $S$ of this set such that $a_{v,v'} \in S$ implies $v < v'$ wrt some arbitrary linear order $<$ on the considered domain.
2. Construct the matrix $M = [c_{ij}]_{i,j = 1, \ldots, k}$ as follows:

$c_{ij} = \{a_{v,v'} \in S : v = a(u_i), \; v' = a(u_j) \text{ and } d(u_i) \neq d(u_j)\}$.

It is easily seen that, in the case of a binary decision, the matrix can be reduced by placing the objects corresponding to the first decision in rows and those corresponding to the second decision in columns. We call such a matrix a reduced discernibility matrix.
3. Using the reduced matrix $M$ obtained in the previous step, construct the function

$\bigwedge_{c_{ij} \in M, \; c_{ij} \neq \emptyset} \; \bigvee c_{ij}$.

4. Compute the shortest (or a sufficiently short) prime implicant $I$ of the constructed function.
5. Using $I$, construct, for each attribute $a \in B$, an undirected graph $\Gamma_a = \langle V_{\Gamma_a}, E_{\Gamma_a} \rangle$, where $V_{\Gamma_a} = \{a_v : v \in V_a\}$ and the set of edges is read off directly from $I$:

$E_{\Gamma_a} = \{(a_v, a_{v'}) : a_{v,v'} \text{ occurs in } I\}$.

6. Find a minimal coloring of the vertices of $\Gamma_a$. (The colorability problem is solvable in polynomial time for $k = 2$ colors, but remains NP-complete for all $k \geq 3$; however, similarly to discretization, one can apply efficient search heuristics for generating (sub-)optimal partitions.) The coloring defines a partition of $V_{\Gamma_a}$, obtained by assuming that all vertices of the same color belong to the same partition set and no partition set contains vertices with different colors. Partition sets are named using successive natural numbers, and the clustering function for $V_a$ is defined by $c_a(a_v) = i$, provided that $a_v$ is a member of the $i$-th partition set.

Remark 3.2. In practical implementations, one does not usually construct the matrix $M$ explicitly, as required in steps (2)-(3) above. Instead, prime implicants are extracted directly from the original decision system. It should be emphasized that in step (4) above there can be many different shortest prime implicants, and in step (6) there can be many different colorings of the obtained graphs. Accordingly, one can obtain many substantially different families of clustering functions, resulting in different classifiers. In practice, one often generates a number of families of clustering functions, tests them against the data, and chooses the best one.

Using the construction above to generate a family of partitions, it is usually possible to obtain a substantially smaller decision table, according to the following definition. Let $A = (U, A, d)$ be a decision system and $B \subseteq A$. Any family of clustering functions $c = \{c_a\}_{a \in B}$ specifies a new decision system $A^c = (U, A^c, d)$, called the $c$-reduction of $A$ wrt $B$, where $A^c = \{a^c : a \in B\}$ and $a^c(x) = c_a(a(x))$.

Table 3.9: The decision table considered in Example 3.9

A   | a  | b  | d
u1  | a1 | b1 | 0
u2  | a1 | b2 | 0
u3  | a2 | b3 | 0
u4  | a3 | b1 | 0
u5  | a1 | b4 | 1
u6  | a2 | b2 | 1
u7  | a2 | b1 | 1
u8  | a4 | b2 | 1
u9  | a3 | b4 | 1
u10 | a2 | b5 | 1

Example 3.9. Consider the decision table provided in Table 3.9. The goal is to solve the symbolic value partition problem for $B = A$.

One then has to perform the following steps:

1. Introduce new Boolean variables $a_{u,v}$, $b_{w,x}$, for all $u, v \in V_a$, $u < v$, and all $w, x \in V_b$, $w < x$.

2. The reduced matrix $M$ is presented in Table 3.10.

Table 3.10: The reduced matrix corresponding to the decision table provided in Table 3.9

M  | u5                    | u6                    | u7                    | u8                    | u9                    | u10
u1 | b_{b1,b4}             | a_{a1,a2}, b_{b1,b2}  | a_{a1,a2}             | a_{a1,a4}, b_{b1,b2}  | a_{a1,a3}, b_{b1,b4}  | a_{a1,a2}, b_{b1,b5}
u2 | b_{b2,b4}             | a_{a1,a2}             | a_{a1,a2}, b_{b1,b2}  | a_{a1,a4}             | a_{a1,a3}, b_{b2,b4}  | a_{a1,a2}, b_{b2,b5}
u3 | a_{a1,a2}, b_{b3,b4}  | b_{b2,b3}             | b_{b1,b3}             | a_{a2,a4}, b_{b2,b3}  | a_{a2,a3}, b_{b3,b4}  | b_{b3,b5}
u4 | a_{a1,a3}, b_{b1,b4}  | a_{a2,a3}, b_{b1,b2}  | a_{a2,a3}             | a_{a3,a4}, b_{b1,b2}  | b_{b1,b4}             | a_{a2,a3}, b_{b1,b5}

3. The required Boolean function is given by

$b_{b_1,b_4} \wedge b_{b_2,b_4} \wedge (a_{a_1,a_2} \vee b_{b_3,b_4}) \wedge (a_{a_1,a_3} \vee b_{b_1,b_4}) \wedge (a_{a_1,a_2} \vee b_{b_1,b_2}) \wedge a_{a_1,a_2} \wedge b_{b_2,b_3} \wedge (a_{a_2,a_3} \vee b_{b_1,b_2}) \wedge a_{a_1,a_2} \wedge (a_{a_1,a_2} \vee b_{b_1,b_2}) \wedge b_{b_1,b_3} \wedge a_{a_2,a_3} \wedge (a_{a_1,a_4} \vee b_{b_1,b_2}) \wedge a_{a_1,a_4} \wedge (a_{a_2,a_4} \vee b_{b_2,b_3}) \wedge (a_{a_3,a_4} \vee b_{b_1,b_2}) \wedge (a_{a_1,a_3} \vee b_{b_1,b_4}) \wedge (a_{a_1,a_3} \vee b_{b_2,b_4}) \wedge (a_{a_2,a_3} \vee b_{b_3,b_4}) \wedge b_{b_1,b_4} \wedge (a_{a_1,a_2} \vee b_{b_1,b_5}) \wedge (a_{a_1,a_2} \vee b_{b_2,b_5}) \wedge b_{b_3,b_5} \wedge (a_{a_2,a_3} \vee b_{b_1,b_5})$.

4. The shortest prime implicant of this function is

$I \equiv a_{a_1,a_2} \wedge a_{a_2,a_3} \wedge a_{a_1,a_4} \wedge a_{a_3,a_4} \wedge b_{b_1,b_4} \wedge b_{b_2,b_4} \wedge b_{b_2,b_3} \wedge b_{b_1,b_3} \wedge b_{b_3,b_5}$.

5. The graphs Γ_a and Γ_b corresponding to a and b are 2-colored, as shown in Figure 3.3, where the marked nodes are colored black and the other nodes are colored white. These colorings generate the following clustering functions:

   c_a(a1) = c_a(a3) = 1,   c_a(a2) = c_a(a4) = 2,
   c_b(b1) = c_b(b2) = c_b(b5) = 1,   c_b(b3) = c_b(b4) = 2.

Fig. 3.3: Coloring of attribute value graphs constructed in Example 3.9

6. Given these clustering functions, one can construct a new decision system (see Table 3.11).

Table 3.11: The reduced table corresponding to the graphs shown in Figure 3.3

  Objects           a_c   b_c   d
  u1, u2, u4        1     1     0
  u3                2     2     0
  u5, u9            1     2     1
  u6, u7, u8, u10   2     1     1

Observe that discretization and symbolic attribute value grouping can be used simultaneously in decision systems including both real-valued and symbolic attributes.

Minimal Decision Rules

In this section, techniques for constructing minimal rules for decision systems will be considered. Given a decision table A, a minimal decision rule (wrt A) is a rule which is TRUE in A and which becomes FALSE in A if any elementary descriptor is removed from the left-hand side of the rule.^24 Keeping the number of elementary descriptors in the left-hand side of a decision rule minimal makes the rule cover as large a subset of a decision class as possible. Accordingly, the information included in the conditional part of any minimal decision rule is sufficient for predicting the decision value of all objects satisfying this part of the rule. The conditional parts of minimal decision rules define the largest object sets relevant for approximating decision classes.

The conditional parts of minimal decision rules can be computed using prime implicants. To compute the set of all minimal rules wrt a decision system A = (U, A, d), we proceed as follows, for any object x ∈ U:

1. Construct a decision-relative discernibility function f_x^r by considering the row corresponding to object x in the decision-relative discernibility matrix for A.
2. Compute all prime implicants of f_x^r.
3. On the basis of the prime implicants, create minimal rules corresponding to x. To do this, consider the set A(I) of attributes corresponding to the propositional variables in I, for each prime implicant I, and construct the rule:

   ∧_{a ∈ A(I)} (a = a(x)) ⇒ (d = d(x)).

^24 A decision rule ϕ ⇒ ψ is TRUE in A if and only if ϕ_A ⊆ ψ_A.
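The three steps above are easy to prototype. The following is a minimal Python sketch, assuming a small in-memory decision table (dictionaries keyed by object identifiers) and finding prime implicants by brute force, which is feasible only for toy examples; all names are illustrative:

    from itertools import combinations

    def discernibility_clauses(U, A, d, x):
        """Row of object x in the decision-relative discernibility matrix:
        one clause (attribute set) per object y with a different decision.
        Pairs indiscernible despite different decisions (an inconsistent
        table) are skipped."""
        return [frozenset(a for a in A if U[y][a] != U[x][a])
                for y in U
                if d[y] != d[x] and any(U[y][a] != U[x][a] for a in A)]

    def prime_implicants(clauses, attrs):
        """Prime implicants of a monotone CNF over 'attrs': the minimal
        attribute sets intersecting every clause (brute force)."""
        hitting = [set(c) for r in range(1, len(attrs) + 1)
                   for c in combinations(attrs, r)
                   if all(set(c) & clause for clause in clauses)]
        return [h for h in hitting if not any(g < h for g in hitting)]

    def minimal_rules(U, A, d):
        """Steps 1-3: per object, prime implicants of f_x^r become rule bodies."""
        rules = set()
        for x in U:
            for imp in prime_implicants(discernibility_clauses(U, A, d, x),
                                        sorted(A)):
                body = tuple(sorted((a, U[x][a]) for a in imp))
                rules.add((body, d[x]))
        return rules

For a monotone Boolean function given in CNF, the prime implicants are exactly the minimal "hitting sets" of its clauses, which is what the brute-force search above enumerates.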

Table 3.12: Decision table considered in Example 3.10

  Object   L     W        C       S
  1        7.0   large    green   no
  2        7.0   large    blue    no
  3        4.0   medium   green   yes
  4        4.0   medium   red     yes
  5        5.0   medium   blue    no
  6        4.5   medium   green   no
  7        4.0   large    red     no

Table 3.13: {L,W}-reduction considered in Example 3.10

  Objects   L     W        S
  1, 2      7.0   large    no
  3, 4      4.0   medium   yes
  5         5.0   medium   no
  6         4.5   medium   no
  7         4.0   large    no

The following example illustrates the idea.

Example 3.10. Consider the decision system A whose decision table is provided in Table 3.12. Table 3.12 contains the values of the condition attributes of vehicles (L, W, C, standing for Length, Width, and Color, respectively) and a decision attribute S standing for Small, which allows one to decide whether a given vehicle is small. This system has exactly one decision-relative reduct, consisting of the attributes L and W. The {L,W}-reduction of A, shown in Table 3.13, results in the following set of non-minimal decision rules:

   (L = 7.0) ∧ (W = large) ⇒ (S = no)
   (L = 4.0) ∧ (W = medium) ⇒ (S = yes)
   (L = 5.0) ∧ (W = medium) ⇒ (S = no)
   (L = 4.5) ∧ (W = medium) ⇒ (S = no)
   (L = 4.0) ∧ (W = large) ⇒ (S = no).

To obtain the minimal decision rules, we apply the construction provided above, for x ∈ {1,...,7}.

1. The decision-relative discernibility functions f_1^r,..., f_7^r are constructed on the basis of the reduced discernibility matrix shown in Table 3.14:

Table 3.14: Reduced decision-relative discernibility matrix from Example 3.10

  M   3       4
  1   L,W     L,W,C
  2   L,W,C   L,W,C
  5   L,C     L,C
  6   L       L,C
  7   W,C     W

   f_1^r = (L ∨ W) ∧ (L ∨ W ∨ C) ≡ L ∨ W
   f_2^r = (L ∨ W ∨ C) ∧ (L ∨ W ∨ C) ≡ L ∨ W ∨ C
   f_3^r = (L ∨ W) ∧ (L ∨ W ∨ C) ∧ (L ∨ C) ∧ L ∧ (W ∨ C) ≡ (L ∧ W) ∨ (L ∧ C)
   f_4^r = (L ∨ W ∨ C) ∧ (L ∨ W ∨ C) ∧ (L ∨ C) ∧ (L ∨ C) ∧ W ≡ (L ∧ W) ∨ (C ∧ W)
   f_5^r = (L ∨ C) ∧ (L ∨ C) ≡ L ∨ C
   f_6^r = L ∧ (L ∨ C) ≡ L
   f_7^r = (W ∨ C) ∧ W ≡ W.

2. The following prime implicants are obtained from the formulas f_1^r,..., f_7^r:

   f_1^r: L, W
   f_2^r: L, W, C
   f_3^r: L ∧ W, L ∧ C
   f_4^r: L ∧ W, C ∧ W
   f_5^r: L, C
   f_6^r: L
   f_7^r: W.

3. Based on the prime implicants, minimal decision rules are created for objects 1,...,7. For instance, from the prime implicants L and W corresponding to f_1^r, the following minimal decision rules are generated based on object 1:

   (L = 7.0) ⇒ (S = no)
   (W = large) ⇒ (S = no).

On the basis of object 3 and the prime implicants L ∧ W and L ∧ C for f_3^r, we obtain the following rules:

   (L = 4.0) ∧ (W = medium) ⇒ (S = yes)
   (L = 4.0) ∧ (C = green) ⇒ (S = yes).

Similarly, minimal decision rules can easily be obtained for all other formulas.
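As a sanity check, the sketch from the previous subsection reproduces the prime implicants and rules derived above when run on the data of Table 3.12; the hypothetical driver below simply re-enters the table (conditions are printed in alphabetical attribute order):

    U = {1: {'L': 7.0, 'W': 'large',  'C': 'green'},
         2: {'L': 7.0, 'W': 'large',  'C': 'blue'},
         3: {'L': 4.0, 'W': 'medium', 'C': 'green'},
         4: {'L': 4.0, 'W': 'medium', 'C': 'red'},
         5: {'L': 5.0, 'W': 'medium', 'C': 'blue'},
         6: {'L': 4.5, 'W': 'medium', 'C': 'green'},
         7: {'L': 4.0, 'W': 'large',  'C': 'red'}}
    d = {1: 'no', 2: 'no', 3: 'yes', 4: 'yes', 5: 'no', 6: 'no', 7: 'no'}

    for body, dec in sorted(minimal_rules(U, {'L', 'W', 'C'}, d), key=str):
        print(' AND '.join(f'({a} = {v})' for a, v in body), f'=> (S = {dec})')
    # among the output: (L = 7.0) => (S = no)
    #                   (C = green) AND (L = 4.0) => (S = yes)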

In practice, the number of minimal decision rules can be large. One then tries to consider only subsets of these rules or to drop some conditions from minimal rules.

Remark 3.3. The main challenge in inducing rules from decision systems lies in determining which attributes should be included in the conditional parts of the rules. Using the strategy outlined above, the minimal rules are computed first. Their conditional parts describe the largest object sets with the same generalized decision value in a given decision system. Although such minimal decision rules can be computed, this approach can result in a set of rules of unsatisfactory classification quality. Such rules might appear too general or too specific for classifying new objects. This depends on the data analyzed. Techniques have been developed for the further tuning of minimal rules.

Example: Learning of Concepts

Given that one has all the techniques described in the previous sections at one's disposal, an important task is to induce definitions of concepts from training data, where the representation of the definition is as efficient and of as high quality as possible. These definitions may then be used as classifiers for the induced concepts.

Let us concentrate on the concept of distance between cars on the road. The rough relation Distance(x, y, z) denotes the approximate distance between vehicles x and y, where z ∈ {small, medium, large, unknown}. Below we simplify the definition somewhat and consider Distance(x, z), which denotes that the distance between x and the vehicle directly preceding x is z.^25

Assume that sample training data has been gathered in the decision table provided in Table 3.15, where:^26

- SL stands for the speed limit on the considered road segment;
- VS stands for the vehicle speed;
- W stands for the weather conditions;
- AD stands for the actual distance between a given vehicle and its predecessor on the road.

For the sake of simplicity, we concentrate on generating rules to determine whether the distance between two objects is small. On the basis of the training data, one can compute a discernibility matrix. Since we are interested in rules for the decision small only, it suffices to consider a simplified discernibility matrix with columns labelled by objects 1 and 3, as these are the only two objects where the corresponding decision is small. The resulting discernibility matrix is shown in Table 3.16. The discernibility matrix gives rise to the following discernibility functions:

^25 In fact, here we consider a distance to be small if it causes a dangerous situation, and to be large if the situation is safe.
^26 Of course, real-life sample data would consist of hundreds or thousands of examples.

Table 3.15: Training data considered in this section

  Object   SL   VS   W      AD     Distance
  1        –    60   rain   3.0    small
  2        –    –    sun    5.0    medium
  3        50   60   rain   5.0    small
  4        50   60   sun    9.0    medium
  5        –    –    rain   9.0    large
  6        –    –    sun    5.0    large
  7        –    60   rain   15.0   large
  8        50   –    rain   15.0   large

Table 3.16: Discernibility matrix of Table 3.15 for the decision small

  Object   1             3
  2        VS,W,AD       SL,VS,W
  4        SL,W,AD       W,AD
  5        SL,VS,AD      SL,VS,AD
  6        SL,VS,W,AD    SL,VS,W
  7        AD            SL,AD
  8        SL,VS,AD      VS,AD

   f_1 = (VS ∨ W ∨ AD) ∧ (SL ∨ W ∨ AD) ∧ (SL ∨ VS ∨ AD) ∧ (SL ∨ VS ∨ W ∨ AD) ∧ AD ∧ (SL ∨ VS ∨ AD) ≡ AD
   f_3 = (SL ∨ VS ∨ W) ∧ (W ∨ AD) ∧ (SL ∨ VS ∨ AD) ∧ (SL ∨ VS ∨ W) ∧ (SL ∨ AD) ∧ (VS ∨ AD)
       ≡ (W ∧ AD) ∨ (SL ∧ AD) ∨ (VS ∧ AD) ∨ (SL ∧ VS ∧ W).

Based on the discernibility functions, one can easily find prime implicants and obtain the following rules for the decision small:^27

(3.42)
   (AD = 3.0) ⇒ (Distance = small)
   (W = rain) ∧ (AD = 5.0) ⇒ (Distance = small)
   (SL = 50) ∧ (AD = 5.0) ⇒ (Distance = small)
   (VS = 60) ∧ (AD = 5.0) ⇒ (Distance = small)
   (SL = 50) ∧ (VS = 60) ∧ (W = rain) ⇒ (Distance = small).

Methods have also been developed for the approximation of compound concepts based on rough sets, hierarchical learning, and ontology approximation (see, e.g., [13, 17, 21, 24, 25, 177, 180, 179, 285, 294, 299, 300]).

^27 In practical applications one would have to discretize AD before extracting rules.

Table 3.17: Information table considered in Example 3.11

  Customer   Bread   Milk   Jam   Beer
  1          yes     yes    no    no
  2          yes     yes    yes   yes
  3          yes     yes    yes   no
  4          no      yes    yes   no

Association Rules

In this section (see [172, 175]), we show how rough set techniques can be used to extract association rules from information systems. Association rules, which play an important role in the field of data mining, provide associations among attributes.^28 A real number from the interval [0,1] is assigned to each rule and provides a measure of the confidence of the rule. The following example will help to illustrate this.

Example 3.11. Consider the information table provided in Table 3.17. Each row in the table represents the items bought by a customer. For instance, customer 1 bought bread and milk, whereas customer 4 bought milk and jam. An association rule that can be extracted from the above table is: a customer who bought bread also bought milk. This is represented by

   (Bread = yes) ⇒ (Milk = yes).

Since all customers who bought bread actually bought milk too, the confidence of this rule is 1. Now consider the rule

   (Bread = yes) ∧ (Milk = yes) ⇒ (Jam = yes),

stating that a customer who bought bread and milk bought jam as well. Since three customers bought both bread and milk and two of them bought jam, the confidence of this rule is 2/3.

We now formalize this approach to confidence measures for association rules. Recall that by a template we mean a conjunction of elementary descriptors, i.e., expressions of the form a = v, where a is an attribute and v ∈ V_a. For an information system A and a template T, we denote by support_A(T) the number of objects satisfying T. Let A be an information system and T = D_1 ∧ ... ∧ D_m be a template. By an association rule generated from T, we mean any expression of the form

   ∧_{D_i ∈ P} D_i ⇒ ∧_{D_j ∈ Q} D_j,

where {P, Q} is a partition of {D_1,...,D_m}. By the confidence of an association rule φ = ∧_{D_i ∈ P} D_i ⇒ ∧_{D_j ∈ Q} D_j we mean the coefficient

^28 Associations between attributes are also studied using association reducts [315].

   confidence_A(φ) = support_A(D_1 ∧ ... ∧ D_m) / support_A(∧_{D_i ∈ P} D_i).

There are two basic steps used in methods aimed at generating association rules. (Below, s and c stand for support and confidence thresholds wrt a given information system A, respectively.)

1. Generate as many templates T = D_1 ∧ ... ∧ D_k as possible such that support_A(T) ≥ s and support_A(T ∧ D_i) < s for any descriptor D_i different from all the descriptors D_1,...,D_k.
2. For each template T generated in the previous step, search for a partition {P, Q} of T satisfying:
   a. support_A(P) < support_A(T) / c;
   b. P has the shortest length among templates satisfying (a).

Every such partition leads to an association rule of the form P ⇒ Q whose confidence is greater than c.

The second step, crucial to the process of extracting association rules, can be solved using rough set methods. Let T = D_1 ∧ D_2 ∧ ... ∧ D_m be a template such that support_A(T) ≥ s. For a given confidence threshold c ∈ [0,1], the association rule φ = P ⇒ Q is called c-irreducible if confidence_A(P ⇒ Q) ≥ c and, for any association rule φ' = P' ⇒ Q' such that P' is a sub-formula of P, we have confidence_A(P' ⇒ Q') < c.

The problem of searching for c-irreducible association rules from a given template is equivalent to the problem of searching for α-reducts in a decision table, for some α ∈ [0,1] (see Section 3.8.1). Let A be an information system and T = D_1 ∧ D_2 ∧ ... ∧ D_m a template. By a characteristic table for T wrt A, we understand a decision system A_T = (U, A_T, d), where:

1. A_T = {a_{D_1}, a_{D_2},..., a_{D_m}} is a set of attributes corresponding to the descriptors of T such that a_{D_i}(u) = 1 if the object u satisfies D_i, and a_{D_i}(u) = 0 otherwise;
2. the decision attribute d determines whether the object satisfies the template T, i.e., d(u) = 1 if the object u satisfies T, and d(u) = 0 otherwise.
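These definitions are straightforward to prototype. The following minimal Python sketch, with illustrative names and an in-memory table, computes supports and confidences and builds the characteristic table A_T:

    def satisfies(obj, descriptor):
        """An elementary descriptor is a pair (attribute, value)."""
        a, v = descriptor
        return obj[a] == v

    def support(U, descriptors):
        """Number of objects satisfying the conjunction of the descriptors."""
        return sum(all(satisfies(obj, D) for D in descriptors)
                   for obj in U.values())

    def confidence(U, P, Q):
        """Confidence of the association rule AND(P) => AND(Q)."""
        return support(U, P + Q) / support(U, P)

    def characteristic_table(U, T):
        """A_T: per object, the binary attributes a_Di(u) and the decision
        d(u) = 1 iff u satisfies the whole template T."""
        return {u: ([int(satisfies(obj, D)) for D in T],
                    int(all(satisfies(obj, D) for D in T)))
                for u, obj in U.items()}

    market = {1: {'Bread': 'yes', 'Milk': 'yes', 'Jam': 'no',  'Beer': 'no'},
              2: {'Bread': 'yes', 'Milk': 'yes', 'Jam': 'yes', 'Beer': 'yes'},
              3: {'Bread': 'yes', 'Milk': 'yes', 'Jam': 'yes', 'Beer': 'no'},
              4: {'Bread': 'no',  'Milk': 'yes', 'Jam': 'yes', 'Beer': 'no'}}
    print(confidence(market, [('Bread', 'yes'), ('Milk', 'yes')],
                     [('Jam', 'yes')]))
    # 0.666..., the 2/3 obtained in Example 3.11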

The following property provides the relationship between association rules and approximations of reducts. For a given information system A = (U, A), a template T = D_1 ∧ D_2 ∧ ... ∧ D_m, and a set of descriptors P ⊆ {D_1,...,D_m}, the association rule

   ∧_{D_i ∈ P} D_i ⇒ ∧_{D_j ∈ {D_1,...,D_m} ∖ P} D_j

is

1. a 1-irreducible association rule from T if and only if {a_{D_i} : D_i ∈ P} is a decision-relative reduct of A_T;
2. a c-irreducible association rule from T if and only if {a_{D_i} : D_i ∈ P} is an α-reduct of A_T, where

   α = 1 − (1/c − 1) / (|U| / support_A(T) − 1).

The problem of searching for the shortest association rules is NP-hard.

The following example illustrates the main ideas used in the searching method for association rules.

Example 3.12. Consider the information table A with 18 objects u_1,...,u_18 and 9 attributes a_1,...,a_9 presented in Table 3.18.

Table 3.18: Information table A considered in Example 3.12

Consider the template

(3.43)   T = (a_1 = 0) ∧ (a_3 = 2) ∧ (a_4 = 1) ∧ (a_6 = 0) ∧ (a_8 = 1).

It is easily seen that support_A(T) = 10. The newly constructed decision table A_T is presented in Table 3.19.

Table 3.19: Decision table A_T considered in Example 3.12

  A_T    a_{D_1}    a_{D_2}    a_{D_3}    a_{D_4}    a_{D_5}    d
         (a_1 = 0)  (a_3 = 2)  (a_4 = 1)  (a_6 = 0)  (a_8 = 1)
  u_1    1          0          1          0          0          0
  u_2    1          1          1          1          1          1
  u_3    1          1          1          1          1          1
  u_4    1          1          1          1          1          1
  u_5    0          1          0          0          1          0
  u_6    1          0          0          0          1          0
  u_7    0          0          0          0          1          0
  u_8    1          1          1          1          1          1
  u_9    1          1          1          1          1          1
  u_10   1          1          1          1          1          1
  u_11   0          1          0          1          0          0
  u_12   1          0          0          1          0          0
  u_13   1          1          1          1          1          1
  u_14   1          1          0          0          0          0
  u_15   1          1          1          1          1          1
  u_16   1          1          1          1          1          1
  u_17   1          1          1          1          1          1
  u_18   0          1          1          1          0          0

The reduced discernibility matrix for A_T is provided in Table 3.20, where, for simplicity, the second column represents in fact ten columns with identical contents, labelled by u_2, u_3, u_4, u_8, u_9, u_10, u_13, u_15, u_16, u_17, respectively.

Table 3.20: Reduced discernibility matrix for A_T from Example 3.12

  M(A_T)   u_2, u_3, u_4, u_8, u_9, u_10, u_13, u_15, u_16, u_17
  u_1      a_{D_2}, a_{D_4}, a_{D_5}
  u_5      a_{D_1}, a_{D_3}, a_{D_4}
  u_6      a_{D_2}, a_{D_3}, a_{D_4}
  u_7      a_{D_1}, a_{D_2}, a_{D_3}, a_{D_4}
  u_11     a_{D_1}, a_{D_3}, a_{D_5}
  u_12     a_{D_2}, a_{D_3}, a_{D_5}
  u_14     a_{D_3}, a_{D_4}, a_{D_5}
  u_18     a_{D_1}, a_{D_5}

Given the discernibility matrix, one can easily compute the discernibility function f_{A_T} for A_T:

   f_{A_T}(a_{D_1}, a_{D_2}, a_{D_3}, a_{D_4}, a_{D_5}) =
      (a_{D_2} ∨ a_{D_4} ∨ a_{D_5}) ∧ (a_{D_1} ∨ a_{D_3} ∨ a_{D_4}) ∧ (a_{D_2} ∨ a_{D_3} ∨ a_{D_4}) ∧
      (a_{D_1} ∨ a_{D_2} ∨ a_{D_3} ∨ a_{D_4}) ∧ (a_{D_1} ∨ a_{D_3} ∨ a_{D_5}) ∧ (a_{D_2} ∨ a_{D_3} ∨ a_{D_5}) ∧
      (a_{D_3} ∨ a_{D_4} ∨ a_{D_5}) ∧ (a_{D_1} ∨ a_{D_5}),

where D_i denotes the i-th conjunct of (3.43). The discernibility function has the following prime implicants:

   a_{D_3} ∧ a_{D_5},  a_{D_4} ∧ a_{D_5},  a_{D_1} ∧ a_{D_2} ∧ a_{D_3},
   a_{D_1} ∧ a_{D_2} ∧ a_{D_4},  a_{D_1} ∧ a_{D_2} ∧ a_{D_5},  a_{D_1} ∧ a_{D_3} ∧ a_{D_4}.

This gives rise to the reducts:

   {a_{D_3}, a_{D_5}}, {a_{D_4}, a_{D_5}}, {a_{D_1}, a_{D_2}, a_{D_3}},
   {a_{D_1}, a_{D_2}, a_{D_4}}, {a_{D_1}, a_{D_2}, a_{D_5}}, {a_{D_1}, a_{D_3}, a_{D_4}}.

Thus, there are 6 association rules with confidence 1, i.e., 1-irreducible rules:

   D_3 ∧ D_5 ⇒ D_1 ∧ D_2 ∧ D_4
   D_4 ∧ D_5 ⇒ D_1 ∧ D_2 ∧ D_3
   D_1 ∧ D_2 ∧ D_3 ⇒ D_4 ∧ D_5
   D_1 ∧ D_2 ∧ D_4 ⇒ D_3 ∧ D_5
   D_1 ∧ D_2 ∧ D_5 ⇒ D_3 ∧ D_4
   D_1 ∧ D_3 ∧ D_4 ⇒ D_2 ∧ D_5.

For confidence 0.9, we look for α-reducts of the decision table A_T, where

   α = 1 − (1/0.9 − 1) / (18/10 − 1) ≈ 0.86.

Hence, we look for a set of descriptors that covers at least ⌈(18 − 10) · α⌉ = ⌈8 · 0.86⌉ = 7 elements of the discernibility matrix M(A_T). One can see that the following sets of descriptors:

   {D_1, D_2}, {D_1, D_3}, {D_1, D_4}, {D_1, D_5}, {D_2, D_3}, {D_2, D_5}, {D_3, D_4}

have nonempty intersections with exactly 7 members of the discernibility matrix M(A_T). Consequently, the 0.9-irreducible association rules obtained from those sets are the following:

   D_1 ∧ D_2 ⇒ D_3 ∧ D_4 ∧ D_5
   D_1 ∧ D_3 ⇒ D_2 ∧ D_4 ∧ D_5
   D_1 ∧ D_4 ⇒ D_2 ∧ D_3 ∧ D_5
   D_1 ∧ D_5 ⇒ D_2 ∧ D_3 ∧ D_4
   D_2 ∧ D_3 ⇒ D_1 ∧ D_4 ∧ D_5
   D_2 ∧ D_5 ⇒ D_1 ∧ D_3 ∧ D_4
   D_3 ∧ D_4 ⇒ D_1 ∧ D_2 ∧ D_5.

The technique illustrated by this example can be applied to find useful dependencies between attributes in complex application domains. In particular, one could use such dependencies in constructing robust classifiers conforming to the laws of the underlying reality.

3.9 Rough Sets, Approximate Boolean Reasoning and Scalability

Mining large data sets is one of the biggest challenges in KDD. In many practical applications, there is a need for data mining algorithms running on terminals of a client-server database system, where the only access to the database (located on the server) is via SQL queries. Unfortunately, the data mining methods based on rough sets and the Boolean reasoning approach proposed so far are characterized by high computational complexity, and their straightforward implementations are not applicable to large data sets. The critical factor for the time complexity of algorithms solving the discussed problem is the number of simple SQL queries like

   SELECT COUNT
   FROM aTable
   WHERE aCondition

In this section, we present some efficient modifications of these methods to overcome this problem. We consider the following issues:

- searching for short reducts from large data sets;
- searching for best partitions defined by cuts on continuous attributes.

Reduct Calculation

Let us again illustrate the idea of reduct calculation using a discernibility matrix (Table 3.21).

Example 3.13. Let us consider the weather problem specified by the decision system represented in Table 3.21. Objects are described by four condition attributes and are divided into 2 classes. Let us consider the first 12 observations.

In this example, U = {1, 2,..., 12}, A = {a_1, a_2, a_3, a_4}, CLASS_no = {1, 2, 6, 8}, and CLASS_yes = {3, 4, 5, 7, 9, 10, 11, 12}.

Table 3.21: The exemplary weather decision table and the compact form of its discernibility matrix

  ID   a_1 (outlook)   a_2 (temperature)   a_3 (humidity)   a_4 (windy)   dec (play)
  1    sunny           hot                 high             FALSE         no
  2    sunny           hot                 high             TRUE          no
  3    overcast        hot                 high             FALSE         yes
  4    rainy           mild                high             FALSE         yes
  5    rainy           cool                normal           FALSE         yes
  6    rainy           cool                normal           TRUE          no
  7    overcast        cool                normal           TRUE          yes
  8    sunny           mild                high             FALSE         no
  9    sunny           cool                normal           FALSE         yes
  10   rainy           mild                normal           FALSE         yes
  11   sunny           mild                normal           TRUE          yes
  12   overcast        mild                high             TRUE          yes

  M    1                  2                  6                8
  3    a_1                a_1,a_4            a_1,a_2,a_3,a_4  a_1,a_2
  4    a_1,a_2            a_1,a_2,a_4        a_2,a_3,a_4      a_1
  5    a_1,a_2,a_3        a_1,a_2,a_3,a_4    a_4              a_1,a_2,a_3
  7    a_1,a_2,a_3,a_4    a_1,a_2,a_3        a_1              a_1,a_2,a_3,a_4
  9    a_2,a_3            a_2,a_3,a_4        a_1,a_4          a_2,a_3
  10   a_1,a_2,a_3        a_1,a_2,a_3,a_4    a_2,a_4          a_1,a_3
  11   a_2,a_3,a_4        a_2,a_3            a_1,a_2          a_3,a_4
  12   a_1,a_2,a_4        a_1,a_2            a_1,a_2,a_3      a_1,a_4

The discernibility matrix can be treated as a board containing n × n boxes. Noteworthy is the fact that the discernibility matrix is symmetric with respect to the main diagonal, because M_{i,j} = M_{j,i}, and that sorting all objects according to their decision classes moves all the empty boxes closer to the main diagonal. In the case of a decision table with two decision classes, the discernibility matrix can be rewritten in the more compact form shown in Table 3.21.

The discernibility function is constructed from the discernibility matrix by taking a conjunction of all discernibility clauses, in which any attribute a_i is substituted by the corresponding Boolean variable x_i. After reducing all repeated clauses we have:^29

   f(x_1, x_2, x_3, x_4) = (x_1)(x_1 + x_4)(x_1 + x_2)(x_1 + x_2 + x_3 + x_4)(x_1 + x_2 + x_4)
                           (x_2 + x_3 + x_4)(x_1 + x_2 + x_3)(x_4)(x_2 + x_3)(x_2 + x_4)
                           (x_1 + x_3)(x_3 + x_4).

One can find the relative reducts of the decision table by searching for the prime implicants of this discernibility function. The straightforward method allows us to calculate all prime implicants by transforming the formula to DNF (using the absorption rule p(p + q) ≡ p and other rules of Boolean algebra). One can do it as follows:

   f = (x_1)(x_4)(x_2 + x_3) = x_1 x_4 x_2 + x_1 x_4 x_3.

Thus we have 2 reducts: R_1 = {a_1, a_2, a_4} and R_2 = {a_1, a_3, a_4}.

^29 In the formulas, + denotes logical disjunction, and we omit the conjunction sign if this does not lead to misunderstanding.
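Since relative reducts are exactly the prime implicants of this function, the same minimal-hitting-set search sketched earlier for minimal rules applies directly. A hypothetical driver, with the clauses above written as attribute sets:

    weather_clauses = [{'a1'}, {'a1', 'a4'}, {'a1', 'a2'},
                       {'a1', 'a2', 'a3', 'a4'}, {'a1', 'a2', 'a4'},
                       {'a2', 'a3', 'a4'}, {'a1', 'a2', 'a3'}, {'a4'},
                       {'a2', 'a3'}, {'a2', 'a4'}, {'a1', 'a3'}, {'a3', 'a4'}]
    print(prime_implicants(weather_clauses, ['a1', 'a2', 'a3', 'a4']))
    # [{'a1', 'a2', 'a4'}, {'a1', 'a3', 'a4'}]: the reducts R_1 and R_2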

Every heuristic algorithm for the prime implicant problem can be applied to the discernibility function to solve the minimal reduct problem. One such heuristic was proposed in [295] and was based on the idea of a greedy algorithm, where each attribute is evaluated by its discernibility measure, i.e., the number of pairs of objects which are discerned by the attribute or, equivalently, the number of its occurrences in the discernibility matrix.

First we have to calculate the number of occurrences of each attribute in the discernibility matrix:

   eval(a_1) = disc_dec(a_1) = 23,
   eval(a_2) = disc_dec(a_2) = 23,
   eval(a_3) = disc_dec(a_3) = 18,
   eval(a_4) = disc_dec(a_4) = 16.

Thus a_1 and a_2 are the two most preferred attributes. Assume that we select a_1. Now we take into consideration only those cells of the discernibility matrix which do not contain a_1. There are only 9 such cells, and the numbers of occurrences are as follows:

   eval(a_2) = disc_dec(a_1, a_2) − disc_dec(a_1) = 7,
   eval(a_3) = disc_dec(a_1, a_3) − disc_dec(a_1) = 7,
   eval(a_4) = disc_dec(a_1, a_4) − disc_dec(a_1) = 6.

If this time we select a_2, then there are only 2 remaining cells, and both of them contain a_4. Therefore, the greedy algorithm returns the set {a_1, a_2, a_4} as a reduct of sufficiently small size.

There is another reason for choosing a_1 and a_4: they are core attributes.^30 One can check that an attribute is a core attribute if and only if it occurs in the discernibility matrix as a singleton [295]. Therefore, core attributes can be recognized by searching for all singleton cells of the discernibility matrix. The pseudo-code of this algorithm is presented in Algorithm 3.1.

The reader may have the impression that the greedy algorithm for the reduct problem has quite a high complexity, because the two main operations:

- disc(B): the number of pairs of objects discerned by the attributes from B;
- isCore(a): checking whether a is a core attribute;

are defined by the discernibility matrix, which is a complex data structure containing O(n²) cells, where each cell can contain up to O(m) attributes, n being the number of objects and m the number of attributes of the given decision table. This suggests that the two main operations need at least O(mn²) computational time. Fortunately, both operations can be performed more efficiently. It has been shown [178] that both operations can be computed in time O(mn log n) without the necessity of storing the discernibility matrix. Below we present an effective implementation of this heuristic that can be applied to large data sets.

^30 An attribute is called a core attribute if and only if it occurs in every reduct [215, 222].

Algorithm 3.1: Searching for a short reduct

   begin
       B := ∅
       // Step 1. Initialize B with the core attributes
       for a ∈ A do
           if isCore(a) then B := B ∪ {a}
       end
       // Step 2. Include attributes in B
       repeat
           a_max := argmax_{a ∈ A∖B} eval(a), where eval(a) := disc_dec(B ∪ {a}) − disc_dec(B)
           if eval(a_max) > 0 then B := B ∪ {a_max}
       until (eval(a_max) = 0) or (B = A)
       // Step 3. Elimination
       for a ∈ B do
           if disc_dec(B) = disc_dec(B ∖ {a}) then B := B ∖ {a}
       end
   end

Let A = (U, A, dec) be a decision system. By a counting table of a set of objects X ⊆ U we denote the vector CountTable(X) = (n_1,..., n_d), where n_k = card(X ∩ CLASS_k) is the number of objects from X belonging to the k-th decision class. We define the conflict measure of X by

   conflict(X) = Σ_{i<j} n_i n_j = (1/2) [ (Σ_{k=1}^{d} n_k)² − Σ_{k=1}^{d} n_k² ].

In other words, conflict(X) is the number of pairs (x, y) ∈ X × X of objects from different decision classes.

By a counting table of a set of attributes B we mean the two-dimensional array Count(B) = [n_{v,k}]_{v ∈ INF(B), k ∈ V_dec}, where

   n_{v,k} = card({x ∈ U : inf_B(x) = v and dec(x) = k}).

Thus Count(B) is a collection of counting tables of the equivalence classes of the indiscernibility relation IND(B). It is clear that the time complexity of constructing the counting table is O(nd log n), where n is the number of objects and d is the number of decision classes.
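Algorithm 3.1 can be rendered compactly in Python. The sketch below treats disc_dec as a black-box function on attribute sets (for example, the counting-table implementation shown after Example 3.14), and all names are illustrative:

    def greedy_reduct(A, disc_dec):
        """Algorithm 3.1: disc_dec maps an attribute set B to the
        discernibility measure disc_dec(B)."""
        A = set(A)
        full = disc_dec(A)
        # Step 1: initialize B with the core attributes
        B = {a for a in A if disc_dec(A - {a}) < full}
        # Step 2: repeatedly add the attribute with the largest gain
        while B != A:
            a_max = max(A - B, key=lambda a: disc_dec(B | {a}) - disc_dec(B))
            if disc_dec(B | {a_max}) - disc_dec(B) == 0:
                break
            B.add(a_max)
        # Step 3: drop attributes whose removal does not lower the measure
        for a in list(B):
            if disc_dec(B - {a}) == disc_dec(B):
                B.remove(a)
        return B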

One can also observe that counting tables can easily be constructed in database management systems using simple SQL queries. For a given counting table, one can calculate the discernibility measure relative to a set of attributes B by

   disc_dec(B) = (1/2) Σ_{v ≠ v'} Σ_{k ≠ k'} n_{v,k} · n_{v',k'}.

The disadvantage of this equation is that it requires O(S²) operations, where S is the size of the counting table Count(B). The discernibility measure can be understood as the number of conflicts unresolved by the set of attributes B. One can show that:

(3.44)   disc_dec(B) = conflict(U) − Σ_{[x] ∈ U/IND(B)} conflict([x]_{IND(B)}).

Thus, the discernibility measure can be determined in O(S) time:

(3.45)   disc_dec(B) = (1/2) (n² − Σ_{k=1}^{d} n_k²) − (1/2) Σ_{v ∈ INF(B)} [ (Σ_{k=1}^{d} n_{v,k})² − Σ_{k=1}^{d} n_{v,k}² ],

where n_k = |CLASS_k| = Σ_v n_{v,k} is the size of the k-th decision class. Moreover, one can show that an attribute a is a core attribute of the decision system A = (U, A, dec) if and only if disc_dec(A ∖ {a}) < disc_dec(A). Thus both operations disc_dec(B) and isCore(a) can be performed in linear time with respect to the counting table size.

Example 3.14. In the discussed example, the counting table for a_1 is as follows:

  Count(a_1)       dec = no   dec = yes
  a_1 = sunny      3          2
  a_1 = overcast   0          3
  a_1 = rainy      1          3

We illustrate Eqn. (3.45) by inserting an additional column into the counting table:

  Count(a_1)       dec = no   dec = yes   conflict(·)
  a_1 = sunny      3          2           (1/2)(5² − 3² − 2²) = 6
  a_1 = overcast   0          3           (1/2)(3² − 0² − 3²) = 0
  a_1 = rainy      1          3           (1/2)(4² − 1² − 3²) = 3
  U                4          8           (1/2)(12² − 4² − 8²) = 32

Thus disc_dec(a_1) = 32 − 6 − 0 − 3 = 23.
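The counting-table route of Eqn. (3.44) is a few lines of code for in-memory data; in a database setting, the grouping loop below would be replaced by GROUP BY/COUNT queries. Names are illustrative:

    from collections import Counter, defaultdict

    def conflict(counts):
        """conflict = (1/2)[(sum_k n_k)^2 - sum_k n_k^2] over class counts."""
        s = sum(counts)
        return (s * s - sum(n * n for n in counts)) // 2

    def disc_dec(U, dec, B):
        """Eqn (3.44): conflict(U) minus the conflicts remaining inside the
        B-indiscernibility classes, read off the counting table Count(B)."""
        count = defaultdict(Counter)   # inf_B(x) -> counting table of its class
        for x, row in U.items():
            count[tuple(row[a] for a in sorted(B))][dec[x]] += 1
        total = Counter()
        for c in count.values():
            total.update(c)
        return conflict(total.values()) - sum(conflict(c.values())
                                              for c in count.values())

    # With Count(a_1) of Example 3.14 (sunny: 3/2, overcast: 0/3, rainy: 1/3):
    # conflict(U) = 32 and the class conflicts are 6 + 0 + 3, so disc_dec = 23.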

Discretization of Large Data Sets Stored in Relational Databases

In this section (see [169, 166, 167]), we discuss an application of approximate Boolean reasoning to efficient searching for cuts in large data sets stored in relational databases. Searching for relevant cuts is based on simple statistics which can be efficiently extracted from relational databases. This additional statistical knowledge makes it possible to perform the search based on Boolean reasoning much more efficiently. It can be shown that the cuts extracted using such reasoning are quite close to optimal.

Searching algorithms for optimal partitions of real-valued attributes, defined by cuts, have been intensively studied. The main goal of such algorithms is to discover cuts which can be used to synthesize decision trees or decision rules of high quality wrt some quality measures (e.g., quality of classification of new unseen objects, quality defined by the decision tree height, support and confidence of decision rules). In general, all those problems are hard from a computational point of view (e.g., the search problem for a minimal and consistent set of cuts is NP-hard). In consequence, numerous heuristics have been developed for approximate solutions of these problems. These heuristics are based on approximate measures estimating the quality of extracted cuts. Among such measures, discernibility measures are relevant for the rough set approach.

We outline an approach to the problem of searching for an optimal partition of real-valued attributes by cuts, assuming that the large data table is represented in a relational database. In such a case, even linear time complexity with respect to the number of cuts is not acceptable because of the time needed for one step. The critical factor for the time complexity of algorithms solving that problem is the number of SQL queries of the form

   SELECT COUNT
   FROM aTable
   WHERE (anAttribute BETWEEN value1 AND value2) AND (additionalCondition)

necessary to construct partitions of real-valued attribute sets. We assume that the answer time for such queries does not depend on the interval length.^31 Using a straightforward approach to optimal partition selection (wrt a given measure), the number of necessary queries is of order O(N), where N is the number of preassumed cuts. By introducing some optimization measures, it is possible to reduce the size of the search space. Moreover, using only O(log N) simple queries suffices to construct a partition very close to optimal.

Let A = (U, A, d) be a decision system with real-valued condition attributes. Any cut (a, c), where a ∈ A and c is a real number, defines two disjoint sets given by

^31 This assumption is satisfied in some existing database management systems.

   U_L(a, c) = {x ∈ U : a(x) ≤ c},
   U_R(a, c) = {x ∈ U : a(x) > c}.

If both U_L(a, c) and U_R(a, c) are non-empty, then c is called a cut on attribute a. The cut (a, c) discerns a pair of objects x, y if either a(x) ≤ c < a(y) or a(y) ≤ c < a(x).

Let A = (U, A, d) be a decision system with real-valued condition attributes and decision classes X_i, for i = 1,..., r(d). The quality of a cut (a, c), denoted by W(a, c), is defined by

(3.46)   W(a, c) = Σ_{i ≠ j} L_i(a, c) R_j(a, c)
                 = (Σ_{i=1}^{r(d)} L_i(a, c)) · (Σ_{i=1}^{r(d)} R_i(a, c)) − Σ_{i=1}^{r(d)} L_i(a, c) R_i(a, c),

where L_i(a, c) = card(X_i ∩ U_L(a, c)) and R_i(a, c) = card(X_i ∩ U_R(a, c)), for i = 1,..., r(d). In the sequel, we will be interested in finding cuts maximizing the function W(a, c). The following definition will be useful. Let C_a = {(a, c_1),..., (a, c_N)} be a set of cuts on attribute a over a decision system A, and assume c_1 < c_2 < ... < c_N. By a median of the i-th decision class, denoted by Median(i), we mean the minimal index j for which the cut (a, c_j) ∈ C_a minimizes the value |L_i(a, c_j) − R_i(a, c_j)|,^32 where L_i and R_i are defined as before.

One can use only O(r(d) · log N) SQL queries to determine the medians of the decision classes by using the well-known binary search algorithm. Then one can show that the quality function W_a(i) := W(a, c_i), for i = 1,..., N, is increasing on {1,..., min} and decreasing on {max,..., N}, where

   min = min_{1 ≤ i ≤ r(d)} Median(i),
   max = max_{1 ≤ i ≤ r(d)} Median(i).

In consequence, the search space for the maximum of W(a, c_i) is reduced to i ∈ [min, max].

Now, one can apply the divide-and-conquer strategy to determine the best cut, given by c_Best ∈ [c_min, c_max], wrt the chosen quality function. First, we divide the interval containing all possible cuts into k intervals. Using some heuristics, one then predicts the interval which most probably contains the best cut. This process is applied recursively to that interval, until the considered interval consists of one cut. The problem which remains to be solved is how to define approximate measures which could help us predict the suitable interval.

^32 The minimization means that |L_i(a, c_j) − R_i(a, c_j)| = min_{1 ≤ k ≤ N} |L_i(a, c_k) − R_i(a, c_k)|.
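For intuition, the quality measure (3.46) for a single candidate cut can be computed naively as below; in the database setting, each class count would instead be obtained by a COUNT query, and the binary search for the medians would determine which cuts are evaluated at all. Names and data are illustrative:

    def cut_quality(values, classes, labels, c):
        """W(a, c) of Eqn (3.46): pairs of objects from different decision
        classes separated by the cut (a, c)."""
        L = {k: 0 for k in labels}
        R = {k: 0 for k in labels}
        for v, k in zip(values, classes):
            (L if v <= c else R)[k] += 1
        sum_L, sum_R = sum(L.values()), sum(R.values())
        return sum_L * sum_R - sum(L[k] * R[k] for k in labels)

    values  = [3.0, 5.0, 5.0, 9.0, 9.0, 15.0]
    classes = ['s', 'm', 's', 'm', 'l', 'l']
    print(cut_quality(values, classes, {'s', 'm', 'l'}, 5.0))   # 8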

Let us consider a simple probabilistic model. Let (a, c_L) and (a, c_R) be two cuts such that c_L < c_R, and let i = 1,..., r(d). For any cut (a, c) satisfying c_L < c < c_R, we assume that x_1,..., x_{r(d)}, where x_i = card(X_i ∩ U_R(a, c_L) ∩ U_L(a, c)), are independent random variables with uniform distribution over the sets {0,..., M_1},..., {0,..., M_{r(d)}}, respectively, where

   M_i = M_i(a, c_L, c_R) = card(X_i ∩ U_L(a, c_R) ∩ U_R(a, c_L)).

Under these assumptions the following fact holds. For any cut c ∈ [c_L, c_R], the mean E(W(a, c)) of the quality W(a, c) is given by

(3.47)   E(W(a, c)) = [W(a, c_L) + W(a, c_R) + conflict((a, c_L), (a, c_R))] / 2,

where conflict((a, c_L), (a, c_R)) = Σ_{i<j} M_i M_j. In addition, the variance of W(a, c) is given by

(3.48)   D²(W(a, c)) = Σ_{i=1}^{r(d)} [M_i (M_i + 2) / 12] · (Σ_{j ≠ i} (R_j(a, c_R) − L_j(a, c_L)))².

Formulas (3.47) and (3.48) can be used to construct a measure predicting the quality of the interval [c_L, c_R]:

(3.49)   Eval([c_L, c_R], α) = E(W(a, c)) + α · √(D²(W(a, c))),

where the real parameter α ∈ [0,1] can be tuned in a learning process. To determine the value Eval([c_L, c_R], α), we need to compute the numbers L_1(a, c_L),..., L_{r(d)}(a, c_L), M_1,..., M_{r(d)}, and R_1(a, c_R),..., R_{r(d)}(a, c_R). This requires O(r(d)) SQL queries of the form

   SELECT COUNT
   FROM DecTable
   WHERE (attribute_a BETWEEN value1 AND value2) AND (dec = i)

Hence, the number of queries required for running this algorithm is O(r(d) · k · log_k N). In practice, we set k = 3, since the function f(k) = r(d) · k · log_k N over positive integers attains its minimum at k = 3. Numerous experiments on different data sets have shown that the proposed solution allows one to find a cut which is very close to the optimal one. For more details, the reader is referred to the literature (see [166, 167]).
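The interval measure (3.49) is a direct transcription of the two formulas. The sketch below uses hypothetical argument names, with the class counts M_i and the boundary statistics R_i(a, c_R) − L_i(a, c_L) assumed to come from the O(r(d)) COUNT queries just described:

    from math import sqrt

    def eval_interval(W_L, W_R, M, R_minus_L, alpha):
        """Eqns (3.47)-(3.49). M[i]: class-i objects strictly between the
        cuts c_L and c_R; R_minus_L[i]: R_i(a, c_R) - L_i(a, c_L)."""
        d = len(M)
        conflict_LR = sum(M[i] * M[j]
                          for i in range(d) for j in range(i + 1, d))
        mean = (W_L + W_R + conflict_LR) / 2                        # (3.47)
        var = sum(M[i] * (M[i] + 2) / 12 *
                  sum(R_minus_L[j] for j in range(d) if j != i) ** 2
                  for i in range(d))                                # (3.48)
        return mean + alpha * sqrt(var)                             # (3.49)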

3.10 Rough Sets and Logic

The father of contemporary logic is the German mathematician Gottlob Frege (1848-1925). He thought that mathematics should not be based on the notion of set but on the notions of logic. He created the first axiomatized logical system, but it was not understood by the logicians of those days. During the first three decades of the 20th century, there was a rapid development in logic, bolstered to a great extent by Polish logicians, especially Alfred Tarski (1901-1983) (see, e.g., [351]). The development of computers and their applications stimulated logical research and widened its scope.

When we speak about logic, we generally mean deductive logic. It gives us tools designed for deriving true propositions from other true propositions. Deductive reasoning always leads to true conclusions. The theory of deduction has well-established, generally accepted theoretical foundations. Deductive reasoning is the main tool used in mathematical reasoning and has found virtually no application beyond mathematics.

Rough set theory has contributed to some extent to various kinds of deductive reasoning. In particular, various kinds of logics based on the rough set approach have been investigated; rough set methodology has contributed essentially to modal logics, many-valued logic, intuitionistic logic, and others (see, e.g., [6, 7, 53, 54, 57, 70, 69, 146, 147, 161, 160, 162, 185, 187, 188, 190, 213, 214, 246, 247, 259, 260, 261, 262, 263, 264, 265, 360, 359, 361, 362]). A summary of this research can be found in [245, 38], and the interested reader is advised to consult these volumes.

In the natural sciences (e.g., in physics), inductive reasoning is of primary importance. The characteristic feature of such reasoning is that it does not begin from axioms (expressing general knowledge about the reality), as in deductive logic; instead, some partial knowledge (examples) about the universe of interest is the starting point of this type of reasoning. This knowledge is next generalized, and it then covers a wider reality than the initial one. In contrast to deductive reasoning, inductive reasoning does not lead to true conclusions but only to probable (possible) ones. Also, in contrast to the logic of deduction, the logic of induction does not yet have uniform, generally accepted theoretical foundations, although many important and interesting results have been obtained, e.g., concerning statistical and computational learning. Verification of the validity of hypotheses in the logic of induction is based on experiment rather than on the formal reasoning of the logic of deduction. Physics is the best illustration of this fact.

Research on inductive logic has a history of a few centuries, and the outstanding English philosopher John Stuart Mill (1806-1873) is considered its father [150]. The creation of computers and their innovative applications essentially contributed to the rapid growth of interest in inductive reasoning. This domain develops very dynamically thanks to computer science. Machine learning, knowledge discovery, reasoning from data, expert systems, and others are examples of new directions in inductive reasoning. It seems that rough set theory is very well suited as a theoretical basis for inductive reasoning.

Basic concepts of this theory fit very well the task of representing and analyzing knowledge acquired from examples, which can next be used as a starting point for generalization. Besides, rough set theory has in fact been successfully applied in many domains to find patterns in data (data mining) and to acquire knowledge from examples (learning from examples). Thus, rough set theory seems to be another candidate for a mathematical foundation of inductive reasoning [24, 177, 308].

The kind of reasoning most interesting from the computer science point of view is common sense reasoning. We use this kind of reasoning in our everyday life, and we encounter examples of such reasoning in newspapers, radio, TV, etc., in political, economic, and other debates and discussions. The starting point of such reasoning is the knowledge possessed by a specific group of people (common knowledge) concerning some subject, together with intuitive methods of deriving conclusions from it. Here we do not have the possibility of resolving a dispute by means of the methods given by deductive logic (reasoning) or by inductive logic (experiment). So the best known methods of resolving the dilemma are voting, negotiations, or even war. See, e.g., Gulliver's Travels [341], where the hatred between Tramecksan (High-Heels) and Slamecksan (Low-Heels), or the disputes between Big-Endians and Small-Endians, could not be resolved without a war. These methods do not reveal the truth or falsity of the thesis under consideration at all. Of course, such methods are not acceptable in mathematics or physics. Nobody is going to establish the truth of Fermat's theorem or Newton's laws by voting, negotiations, or declaring a war.

Reasoning of this kind is the least studied from the theoretical point of view, and its structure is not sufficiently understood, in spite of much interesting theoretical research in this domain [62]. The importance of common sense reasoning, considering its scope and significance for some domains, is fundamental, and rough set theory can also play an important role in it, but more fundamental research must be done to this end [294]. In particular, the rough truth introduced in [213] and studied, e.g., in [7] seems to be important for investigating common sense reasoning in the rough set framework.

Let us consider a simple example. In the considered decision system, we assume that U = Birds is a set of birds described by some condition attributes from a set A. The decision attribute is a binary attribute Flies, with possible values yes if the given bird flies and no otherwise. Then, we define (relative to an information system A = (U, A)) the set of abnormal birds by

   Ab_A(Birds) = LOW_A({x ∈ Birds : Flies(x) = no}).

Hence, we have Ab_A(Birds) = Birds − UPP_A({x ∈ Birds : Flies(x) = yes}) and Birds − Ab_A(Birds) = UPP_A({x ∈ Birds : Flies(x) = yes}). This means that for normal birds it is consistent, with the knowledge represented by A, to assume that they can fly, i.e., it is possible that they can fly. One can optimize Ab_A(Birds) using A to obtain a minimal boundary region in the approximation of {x ∈ Birds : Flies(x) = no}.

It is worthwhile to mention that an approach combining rough sets with nonmonotonic reasoning has been presented in [48]. There, some basic concepts are distinguished that can be approximated on the basis of sensor measurements, as well as more complex concepts that are approximated using so-called transducers defined by first-order theories constructed over the approximated concepts.
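The abnormality set Ab_A(Birds) from the example above is just a lower approximation, so it is computable in a few lines. The sketch below identifies objects by their A-signatures through an info function; the toy data are purely illustrative:

    from collections import defaultdict

    def lower_upper(universe, info, concept):
        """Lower and upper approximations of a concept (a set of objects)
        wrt the indiscernibility classes induced by the signature info(x)."""
        blocks = defaultdict(set)
        for x in universe:
            blocks[info(x)].add(x)
        lower = {x for x in universe if blocks[info(x)] <= concept}
        upper = {x for x in universe if blocks[info(x)] & concept}
        return lower, upper

    birds = {'b1': 'penguin', 'b2': 'penguin', 'b3': 'sparrow', 'b4': 'sparrow'}
    flies = {'b3', 'b4'}
    ab, _ = lower_upper(set(birds), lambda x: birds[x], set(birds) - flies)
    print(ab)   # {'b1', 'b2'}: the abnormal (surely non-flying) birds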

Another approach to commonsense reasoning has been developed in a number of papers (see, e.g., [24, 177, 199, 255, 294]). This approach is based on an ontological framework for approximation: approximations are constructed for concepts and for dependencies between the concepts represented in a given ontology expressed, e.g., in natural language. Still another approach, combining rough sets with logic programming, is discussed in [365].

To recapitulate, the characteristics of the three kinds of reasoning mentioned above are given below:

1. deductive:
   - reasoning method: axioms and rules of inference;
   - applications: mathematics;
   - theoretical foundations: complete theory;
   - conclusions: true conclusions from true premisses;
   - hypotheses verification: formal proof.
2. inductive:
   - reasoning method: generalization from examples;
   - applications: natural sciences (physics);
   - theoretical foundations: lack of generally accepted theory;
   - conclusions: not true but probable (possible);
   - hypotheses verification: experiment.
3. common sense:
   - reasoning method: based on common sense knowledge with intuitive rules of inference expressed in natural language;
   - applications: everyday life, humanities;
   - theoretical foundations: lack of generally accepted theory;
   - conclusions: obtained by a mixture of deductive and inductive reasoning based on concepts expressed in natural language, e.g., with application of different inductive strategies for conflict resolution (such as voting, negotiations, cooperation, war) based on human behavioral patterns;
   - hypotheses verification: human behavior.

There are numerous issues related to approximate reasoning under uncertainty. These issues are discussed in books on granular computing, rough mereology, and the computational complexity of algorithmic problems related to these issues. For more details, the reader is referred to the books [223, 156, 45, 159, 248].

Finally, we would like to stress that still much more work should be done to develop approximate reasoning methods for making progress in the development of intelligent systems. This idea was very well expressed by Professor Leslie Valiant:

   A fundamental question for artificial intelligence is to characterize the computational building blocks that are necessary for cognition. A specific challenge is to build on the success of machine learning so as to cover broader issues in intelligence... This requires, in particular, a reconciliation between two contradictory characteristics: the apparent logical nature of reasoning and the statistical nature of learning.

3.11 Interactive Rough Granular Computing (IRGC)

There are many real-life problems that are still hard to solve using the existing methodologies and technologies. Among such problems are, e.g., classification and understanding of medical images, control of autonomous systems like unmanned aerial vehicles or robots, and problems related to monitoring or rescue tasks in multiagent systems. All of these problems are closely related to intelligent systems that are more and more widely applied in different real-life projects.

One of the main challenges in developing intelligent systems is discovering methods for approximate reasoning from measurements to perception, i.e., deriving, from concepts resulting from sensor measurements, concepts or expressions enunciated in natural language that express perception understanding. Nowadays, new emerging computing paradigms are investigated in an attempt to make progress in solving problems related to this challenge. Further progress depends on the successful cooperation of specialists from different scientific disciplines, such as mathematics, computer science, artificial intelligence, biology, physics, chemistry, bioinformatics, medicine, neuroscience, linguistics, psychology, and sociology. In particular, different aspects of reasoning from measurements to perception are investigated in psychology [11, 95], neuroscience [244], layered learning [332], mathematics of learning [244], machine learning, pattern recognition [97], and data mining [115], and also by researchers working on recently emerged computing paradigms such as computing with words and perception [386], granular computing [199], rough sets, rough mereology, and rough-neural computing [199].

One of the main problems investigated in machine learning, pattern recognition [97], and data mining [115] is concept approximation. It is necessary to induce approximations of concepts (models of concepts) from available experimental data. The data models developed so far in areas such as statistical learning, machine learning, and pattern recognition are not satisfactory for the approximation of the compound concepts arising in the perception process. Researchers from these different areas have recognized the necessity of working on new methods for concept approximation (see, e.g., [35, 364]). The main reason is that these compound concepts are, in a sense, too far from measurements, which makes searching for features relevant for their approximation infeasible in a huge space. There are several research directions aiming at overcoming this difficulty. One of them is based on interdisciplinary research, where the results concerning perception in psychology or neuroscience are used to help deal with compound concepts (see, e.g., [97]). There is a great effort in neuroscience towards understanding the hierarchical structures of neural networks in living organisms [56, 244]. Convolutional networks (Convnets), which are a biologically inspired trainable architecture that can learn invariant features, have been developed (see, e.g., [352]).

Also, mathematicians are recognizing problems of learning as the main problems of the current century [244].

The problems discussed so far are also closely related to complex system modeling. In such systems, again, the problem of concept approximation and of reasoning about perceptions using concept approximations is one of today's challenges. One should take into account that modeling complex phenomena entails the use of local models (captured by local agents, if one would like to use the multi-agent terminology [103, 387]) that should next be fused [354]. This process involves negotiations between agents [103] to resolve contradictions and conflicts in local modeling. This kind of modeling will become more and more important in solving complex real-life problems which we are unable to model using traditional analytical approaches. The latter approaches lead to exact models; however, the assumptions necessary to develop them cause the resulting solutions to be too far from reality to be accepted. New methods, or even a new science, should be developed for such modeling [66].

One of the possible solutions in the search for methods for compound concept approximation is the layered learning idea [332]. Inducing concept approximations should proceed hierarchically, starting from concepts close to sensor measurements up to compound target concepts related to perception. This general idea can be realized using additional domain knowledge represented in natural language. For example, one can use principles of behavior on the roads, expressed in natural language, when trying to estimate, from recordings of situations on the road (made, e.g., by camera and other sensors), whether the current situation on the road is safe or not. To solve such a problem, one should develop methods for concept approximation together with methods aiming at the approximation of reasoning schemes (over such concepts) expressed in natural language. The foundations of such an approach are based on rough set theory [215] and its extension, rough mereology [199, 249, 250, 252, 13, 248], both discovered in Poland. The objects we are dealing with are information granules. Such granules are obtained as the result of information granulation [386]:

   Information granulation can be viewed as a human way of achieving data compression and it plays a key role in implementation of the strategy of divide-and-conquer in human problem-solving.

Constructions of information granules should be robust with respect to deviations of their input information granules. In this way, a granulation of information granule constructions is also considered. As a result, we obtain the so-called AR schemes (AR networks) [199, 249, 250, 252]. AR schemes can be interpreted as complex patterns [115]. Searching methods for such patterns, relevant for a given target concept, have been developed [199, 13]. Methods for deriving relevant AR schemes are of high computational complexity. The complexity can be substantially reduced by using domain knowledge: in such a case, AR schemes are derived along reasoning schemes in natural language that are retrieved from domain knowledge. Developing methods for deriving such AR schemes is one of the main goals of our projects.

Granulation is a computing paradigm, among others such as self-reproduction, self-organization, functioning of the brain, Darwinian evolution, group behavior, cell membranes, and morphogenesis, that is abstracted from natural phenomena. Granulation is inherent in human thinking and reasoning processes. Granular computing (GrC) provides an information processing framework where computation and operations are performed on information granules; it is based on the realization that precision is sometimes expensive and not very meaningful in modeling and controlling complex systems. When a problem involves incomplete, uncertain, and vague information, it may be difficult to differentiate distinct elements, and one may find it convenient to consider granules for its handling. The structure of granulation can often be defined using methods based on rough sets, fuzzy sets, or their combination. In this consortium, rough sets and fuzzy sets work synergistically, often together with other soft computing approaches, and use the principle of granular computing. The developed systems exploit the tolerance for imprecision, uncertainty, approximate reasoning, and partial truth under the soft computing framework, and they are capable of achieving tractability, robustness, and a close resemblance to human-like (natural) decision making for pattern recognition in ambiguous situations [292].

Qualitative reasoning requires the development of methods supporting approximate reasoning under uncertainty about non-crisp, often vague, concepts. One very general scheme of tasks for such qualitative reasoning can be described as follows. From some basic objects (called, in different areas, patterns, granules, or molecules), it is required to construct (induce) complex objects satisfying a given specification (often expressed in natural language) to a satisfactory degree. For example, in learning concepts from examples, we deal with tasks where partial information about the specification is given by examples and counter-examples concerning the classified objects. As examples of such complex objects, one can consider classifiers considered in machine learning or data mining, new medicines against some viruses, or behavioral patterns of cell interaction induced from the interaction of biochemical processes realized in cells. Over the years we have learned how to solve some of these tasks, while many of them remain challenges. One of the reasons is that the discovery process of complex objects relevant for the given specification requires multilevel reasoning, with the necessity of discovering, on each level, the relevant structural objects and their properties. The search space for such structural objects and properties is very large, and this, in particular, means that fully automatic methods are not feasible using existing computing technologies. However, this process can be supported by domain knowledge, which can be used for generating hints in the search process (see, e.g., [13]). This view is consistent with [34] (see page 3 of the Foreword):

   [...] Tomorrow, I believe, every biologist will use computer to define their research strategy and specific aims, manage their experiments, collect their results, interpret their data, incorporate the findings of others, disseminate their observations, and extend their experimental observations - through exploratory discovery and modeling - in directions completely unanticipated.
Rough sets, discovered by Zdzisław Pawlak [212], and fuzzy sets, due to Lotfi Zadeh [385], separately and in combination, have shown quite strong potential for supporting

the searching process for the relevant complex objects (granules) discussed above (see, e.g., [200, 199, 220, 13, 192]). Fuzzy set theory addresses gradualness of knowledge, expressed by fuzzy membership, whereas rough set theory addresses granularity of knowledge, expressed by the indiscernibility relation.

Computations on granules should be interactive. This requirement is fundamental for the modeling of complex systems [67]. For example, in [183] this is expressed by the following sentence:

   [...] interaction is a critical issue in the understanding of complex systems of any sorts: as such, it has emerged in several well-established scientific areas other than computer science, like biology, physics, social and organizational sciences.

Interactive Rough Granular Computing (IRGC) is an approach to modeling interactive computations (see, e.g., [312]). IRGC computations progress through interactions between granules (structural objects of quite often high-order type) discovered from data and domain knowledge. In particular, interactive information systems (IIS) are dynamic granules used for representing the results of the agent's interaction with the environments. IIS can also be applied in modeling more advanced forms of interaction, such as hierarchical interactions in layered granular networks or, more generally, in hierarchical modeling. The proposed approach [312, 313] is based on rough sets, but it can be combined with other soft computing paradigms, such as fuzzy sets or evolutionary computing, as well as with machine learning and data mining techniques. The notion of a highly interactive granular system is clarified as a system in which intrastep interactions with the external as well as the internal environments take place. Two kinds of interactive attributes are distinguished: perception attributes, including sensory ones, and action attributes.

The research directions outlined in this section constitute a step toward understanding the nature of reasoning from measurements to perception. These foundations are crucial for constructing intelligent systems for many real-life projects. The recent progress in this direction based on rough sets and granular computing is reported in [312, 313]. In the following sections, we outline three important challenging topics.

3.11.1 Context Inducing and IRGC

Reasoning about context has belonged to the main problems investigated in AI for many years (see, e.g., [148, 273, 327]). One of the old and still challenging problems in machine learning, pattern recognition, and data mining is feature discovery (feature construction, feature extraction) [97]. This problem is related to the discovery of structures of objects or of contexts in which analyzed objects should be considered. In this section, we discuss an application of information systems to context modeling. The approach is based on the fusion of information systems (or decision systems) with constraints. The constraints can be defined by means of relations over sets of attribute values or over their Cartesian products. Objects on the next level of modeling are relational structures over signatures (or sets of signatures) of the arguments of the fusion operation.

In this way, one can obtain, as objects on a higher level of modeling, indiscernibility (similarity) classes of objects, time windows, their clusters, sequences of time windows, and sets of such sequences. Indiscernibility classes over objects representing sequences of time windows are sets of such sequences, and they may represent information about processes. Let us consider one simple example illustrating this approach, elaborated, e.g., in [311, 312, 313].

In the process of searching for (sub-)optimal approximation spaces, different strategies may be used. Let us consider an example of such a strategy presented in [309]. In this example, DT = (U, A, d) denotes a decision system (a given sample of data), where U is a set of objects, A is a set of attributes, and d is a decision. We assume that for any object x ∈ U, only partial information equal to the A-signature of x (object signature, for short) is accessible, i.e., Inf_A(x) = {(a, a(x)) : a ∈ A}. Analogously, for any concept only partial information about this concept is given, by means of a sample of objects, e.g., in the form of a decision table. One can use object signatures as new objects in a new relational structure R. In this relational structure R, some relations between object signatures are also modeled, e.g., defined by the similarities of these object signatures (see Figure 3.4).

Fig. 3.4: Granulation to tolerance classes. r is a similarity (tolerance) relation defined over signatures of objects.

Discovery of relevant relations between object signatures is an important step in the search for relevant approximation spaces. In this way, a class of relational structures representing perception of objects and their parts is constructed. In the next step, we select a language L consisting of formulas expressing properties over the defined relational structures, and we search for relevant formulas in L. The semantics of formulas (e.g., with one free variable) from L are subsets of object signatures. Note that each object signature defines a neighborhood of objects from a given sample (e.g., the decision system DT) and another set over the whole universe of objects, being an extension of U. Thus, each formula from L defines a family of sets of objects over the sample and also another family of sets over the universe of all objects.

In the process of searching for relevant neighborhoods, we use the information encoded in the available sample. More relevant neighborhoods make it possible to define more relevant approximation spaces (from the point of view of the optimization criterion). Following this scheme, the next level of granulation may be related to clusters of objects (relational structures) from the current level (see Figure 3.5).

Fig. 3.5: Granulation of tolerance relational structures to clusters of such structures. $r_{\varepsilon,\delta}$ is a relation with parameters $\varepsilon, \delta$ on similarity (tolerance) classes.

In Figure 3.5, $\tau$ denotes a similarity (tolerance) relation on vectors of attribute values, $\tau(v) = \{u : v \,\tau\, u\}$, $\tau(v)\, r_{\varepsilon,\delta}\, \tau(w)$ iff $dist(\tau(v), \tau(w)) \in [\varepsilon - \delta, \varepsilon + \delta]$, and $dist(\tau(v), \tau(w)) = \inf\{dist(v', w') : (v', w') \in \tau(v) \times \tau(w)\}$, where $dist$ is a distance function on vectors of attribute values.

One more example is illustrated in Figure 3.6, where the next level of hierarchical modeling is created by defining an information system in which objects are time windows and attributes are (time-related) properties of these windows.

Fig. 3.6: Granulation of time points into time windows. $T$ is the time window length, $v_j = (v_{1j}, \ldots, v_{Tj})$ for $j = 1, \ldots, T$, $rem(i, T)$ is the remainder from division of $i$ by $T$, and $\alpha$ is an attribute defined over time windows.
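The relation $r_{\varepsilon,\delta}$ from Figure 3.5 can be sketched directly from the definitions above; for finite tolerance classes the infimum is a minimum. The Hamming-style $dist$, the sample classes, and the parameter values below are illustrative assumptions.

```python
# The relation r_{eps,delta} between tolerance classes (a sketch).
def dist(v, w):
    """A distance on attribute-value vectors (here: Hamming distance)."""
    return sum(1 for vi, wi in zip(v, w) if vi != wi)

def class_dist(cv, cw):
    """dist(tau(v), tau(w)) = inf over pairs drawn from the two classes;
    for finite classes the infimum is just the minimum."""
    return min(dist(v, w) for v in cv for w in cw)

def r_eps_delta(cv, cw, eps, delta):
    """tau(v) r_{eps,delta} tau(w) iff the class distance lies in
    [eps - delta, eps + delta]."""
    return eps - delta <= class_dist(cv, cw) <= eps + delta

tau_v = {(1, 0, 1), (1, 0, 0)}
tau_w = {(0, 1, 1), (0, 1, 0)}
print(class_dist(tau_v, tau_w))                    # 2
print(r_eps_delta(tau_v, tau_w, eps=2, delta=1))   # True
```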

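Similarly, the granulation of time points into time windows from Figure 3.6 can be sketched as follows; the toy signal and the choice of the mean as the window attribute $\alpha$ are illustrative assumptions.

```python
# Granulation of time points into time windows (a sketch).
signal = [3, 4, 5, 7, 8, 2, 1, 0, 6]   # value v_i measured at time point i
T = 3                                   # time window length

# Window v_j collects T consecutive values; a time point i belongs to
# window i // T, and rem(i, T) is its position inside that window.
windows = [tuple(signal[j * T:(j + 1) * T]) for j in range(len(signal) // T)]

def alpha(window):
    """An attribute defined over time windows, e.g. the mean value."""
    return sum(window) / len(window)

# A new information system: objects are windows, attributes are
# window-level properties such as alpha.
for j, w in enumerate(windows):
    print(j, w, alpha(w))
```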
It is worth mentioning that quite often this searching process is even more sophisticated. For example, one can discover several relational structures (e.g., corresponding to different attributes) and formulas over such structures defining different families of neighborhoods from the original approximation space. As a next step, such families of neighborhoods can be merged into neighborhoods of a new approximation space of a higher degree.

The proposed approach makes it possible to construct information systems (or decision systems) on a given level of hierarchical modeling from information systems on lower level(s), by using some constraints in joining objects from the underlying information systems. In this way, structural objects can be modeled, and their properties can be expressed in the constructed information systems by selecting relevant attributes. These attributes are defined in a language that makes use of the attributes of systems from the lower hierarchical level, as well as of the relations used to define the constraints. In some sense, the objects on the next level of hierarchical modeling are defined using the syntax of the lower level of the hierarchy.

Domain knowledge is used to aid the discovery of relevant attributes (features) on each level of the hierarchy. This domain knowledge can be provided, e.g., by a concept ontology together with samples of objects illustrating concepts from this ontology. Such knowledge makes it feasible to search for relevant attributes (features) on different levels of hierarchical modeling. Figure 3.7 symbolically illustrates the transfer of knowledge in a particular application: it depicts how knowledge about outliers in handwritten digit recognition is transferred from an expert to a software system. We call this process knowledge elicitation [179, 180, 181]. Observe that the explanations given by expert(s) are expressed in a subset of natural language, limited to concepts from the provided ontology. Concepts from higher levels of the ontology are gradually approximated by the system using concepts from lower levels.

Fig. 3.7: Expert's knowledge elicitation.
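As a toy illustration of the constrained fusion of information systems described above, the following sketch joins the objects of two lower-level systems into pairs satisfying a constraint over their attribute values and defines new attributes on the resulting structural objects. The systems, the constraint, and the derived attribute are illustrative assumptions, not the authors' implementation.

```python
# Fusion of two information systems with a constraint (a sketch).
from itertools import product

S1 = {"u1": {"a": 1}, "u2": {"a": 2}}   # lower-level system 1
S2 = {"w1": {"b": 2}, "w2": {"b": 5}}   # lower-level system 2

def constraint(x, y):
    """Only pairs whose attribute values are close enough are joined."""
    return abs(S1[x]["a"] - S2[y]["b"]) <= 1

# Objects of the next-level system are the constrained pairs (structural
# objects); new attributes are defined over the joined structure.
S3 = {
    (x, y): {"a": S1[x]["a"], "b": S2[y]["b"],
             "sum": S1[x]["a"] + S2[y]["b"]}
    for x, y in product(S1, S2) if constraint(x, y)
}
print(S3)   # only ('u1', 'w1') and ('u2', 'w1') satisfy the constraint
```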
