Rough Set Analysis of Preference-Ordered Data


Roman Słowiński¹, Salvatore Greco², and Benedetto Matarazzo²

¹ Institute of Computing Science, Poznan University of Technology, Piotrowo 3a, Poznan, Poland, slowinsk@sol.put.poznan.pl
² Faculty of Economics, University of Catania, Corso Italia, 55, Catania, Italy, {salgreco,matarazz}@mbox.unict.it

Abstract. The paper is devoted to knowledge discovery from data, taking into account prior knowledge about preference semantics in patterns to be discovered. The data concern a set of situations (objects, states, examples) described by a set of attributes (properties, features, characteristics). The attributes are, in general, divided into condition and decision attributes, corresponding to input and output of a situation. The situations are partitioned by decision attributes into decision classes. A pattern discovered from the data has a symbolic form of decision rule or decision tree. In many practical problems, some condition attributes are defined on preference-ordered scales and the decision classes are also preference-ordered. The known methods of knowledge discovery unfortunately ignore this preference information, and thus risk drawing wrong patterns. To deal with preference-ordered data we propose to use a new approach called Dominance-based Rough Set Approach (DRSA). Given a set of situations described by at least one condition attribute with a preference-ordered scale and partitioned into preference-ordered classes, the new rough set approach is able to approximate this partition by means of dominance relations. The rough approximation of this partition is a starting point for induction of "if..., then..." decision rules. The syntax of these rules is adapted to represent preference orders. The DRSA analyses only facts present in data, and possible inconsistencies are identified.
It preserves the concept of granular computing; however, the granules are dominance cones in evaluation space, and not bounded sets. It is also concordant with the paradigm of computing with words, as it exploits the ordinal, and not necessarily cardinal, character of data.

J.J. Alpigini et al. (Eds.): RSCTC 2002, LNAI 2475, © Springer-Verlag Berlin Heidelberg 2002

1 How Prior Knowledge Influences Knowledge Discovery?

Discovering knowledge from data means being able to find concise classification patterns that agree with situations described by the data. They are useful for explanation of data and for prediction of future situations in such applications as technical diagnostics, performance evaluation or risk assessment. The situations are described by a set of attributes, also called properties, features, characteristics, etc. The attributes may be either on the condition or the decision side of the description, corresponding to input or output of a situation. The situations may be objects, states, examples, etc. It will be convenient to call them objects in this paper.

The data set in which classification patterns are searched for is called the learning sample. Learning of patterns from this sample assumes certain prior knowledge that may include the following items:
(i) domains of attributes, i.e. sets of values that an attribute may take while being meaningful for the user's perception,
(ii) division of attributes into condition and decision attributes, restricting the range of patterns to functional relations between condition and decision attributes,
(iii) preference order in domains of some attributes and semantic correlation between pairs of these attributes, requiring the patterns to observe the dominance principle.

In fact, item (i) is usually taken into account in knowledge discovery. With this prior knowledge only, one can discover patterns called association rules [1], showing strong relationships between values of some attributes, without fixing which attributes will be on the condition side and which ones on the decision side in all rules. If item (i) is combined with item (ii) in prior knowledge, then one can consider a partition of the learning sample into decision classes defined by decision attributes. The patterns to be discovered then have the form of decision trees or decision rules representing functional relations between condition and decision attributes. These patterns are typically discovered by machine learning and data mining methods [19]. As there is a direct correspondence between decision trees and rules, we will concentrate our attention on decision rules in what follows. As item (iii) is crucial for this paper, let us explain it in more detail.
Consider an example of a data set concerning pupils' achievements in a high school. Suppose that among the attributes describing the pupils there are results in mathematics (Math) and physics (Ph), and a general achievement (GA). The domains of these attributes are composed of three values: bad, medium and good. This information constitutes item (i) of prior knowledge. The preference order of the attribute values is obvious: good is better than medium and bad, and medium is better than bad. It is known, moreover, that Math is semantically correlated with GA, as well as Ph with GA. This is, precisely, item (iii) of prior knowledge. Attributes with preference-ordered domains are called criteria in decision theory. We will use the name regular attributes for those attributes whose domains are not preference-ordered. Semantic correlation between two criteria means that an improvement on one criterion should not worsen the evaluation on the second criterion. In our example, improvement of a pupil's score in Math or Ph, with other attribute values unchanged, should not worsen the pupil's general achievement GA, but rather improve it.

What classification patterns can be drawn from the pupils' data set? If prior knowledge includes items (i) and (iii) only, then association rules can be induced; if item (ii) is known in addition to (i) and (iii), then decision rules can be induced. The next question is: how does item (iii) influence association rules and decision rules? It has been specified above that item (iii) requires the patterns to observe the dominance principle.

The dominance principle (also called the Pareto principle) should be observed by (association and decision) rules having at least one pair of semantically correlated criteria spanned over the condition and decision parts. Each rule is characterized by a condition profile and a decision profile, corresponding to vectors of threshold values of attributes in the condition and decision part of the rule, respectively. We say that one profile dominates another if they both involve the same attributes and the criteria values of the first profile are not worse than the criteria values of the second profile, while the values of regular attributes in both profiles are indiscernible. The dominance principle requires the following: consider two rules, r and s, involving the same regular attributes and criteria, such that each criterion used in the condition part is semantically correlated with at least one criterion present in the decision part of these rules; if the condition profile of rule r dominates the condition profile of rule s, then the decision profile of rule r should also dominate the decision profile of rule s.
Suppose that two rules induced from the pupils' data set relate Math and Ph on the condition side with GA on the decision side:
rule #1: if Math=medium and Ph=medium, then GA=good,
rule #2: if Math=good and Ph=medium, then GA=medium.

The two rules do not observe the dominance principle because the condition profile of rule #2 dominates the condition profile of rule #1, while the decision profile of rule #2 is dominated by the decision profile of rule #1. Thus, in the sense of the dominance principle the two rules are inconsistent, that is, they are wrong. One could say that the above rules are true because they are supported by examples of pupils from the learning sample, but this would mean that the examples are also inconsistent. The inconsistency may come from many sources, e.g.:
- missing attributes (regular ones or criteria) in the description of objects; maybe the data set does not include such an attribute as the opinion of the pupil's tutor (OT), expressed only verbally during the assessment of the pupil's GA by the school teachers' council,
- unstable preferences of decision makers; maybe the members of the school teachers' council changed their view on the influence of Math on GA during the assessment.

Handling these inconsistencies is of crucial importance for knowledge discovery. They cannot be simply considered as noise or error to be eliminated from data, or amalgamated with consistent data by some averaging operators, but they should be identified and presented as uncertain patterns.
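The check applied to rules #1 and #2 can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the numeric encoding `RANK` of the ordinal scale and the function names are assumptions introduced here, and every criterion is assumed to be semantically correlated with GA, as in the pupils' example.

```python
# Assumed encoding of the ordinal scale bad < medium < good (ranks only, no cardinal meaning).
RANK = {"bad": 0, "medium": 1, "good": 2}


def dominates(profile_a, profile_b):
    """True if profile_a is not worse than profile_b on every criterion."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must involve the same criteria")
    return all(RANK[profile_a[q]] >= RANK[profile_b[q]] for q in profile_a)


def violates_dominance_principle(rule_r, rule_s):
    """Each rule is a (condition_profile, decision_profile) pair.

    The principle is violated when one rule's condition profile dominates
    the other's while its decision profile does not.
    """
    (cond_r, dec_r), (cond_s, dec_s) = rule_r, rule_s
    r_breaks = dominates(cond_r, cond_s) and not dominates(dec_r, dec_s)
    s_breaks = dominates(cond_s, cond_r) and not dominates(dec_s, dec_r)
    return r_breaks or s_breaks


rule_1 = ({"Math": "medium", "Ph": "medium"}, {"GA": "good"})
rule_2 = ({"Math": "good", "Ph": "medium"}, {"GA": "medium"})

print(violates_dominance_principle(rule_1, rule_2))  # True
```

The ranks are used only for comparison, so the sketch stays within the ordinal character of the data.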

If item (iii) were ignored in prior knowledge, then the handling of the above mentioned inconsistencies would be impossible. Indeed, there would be nothing wrong in rules #1 and #2: they are supported by different examples discerned by the considered attributes.

It has been acknowledged by many authors that rough set theory provides an excellent framework for dealing with inconsistency in knowledge discovery [18, 20, 21, 22, 24, 27, 29, 30]. The paradigm of rough set theory is that of granular computing, because the main concept of the theory, the rough approximation of a set, is built up of blocks of objects indiscernible by a given set of attributes, called granules of knowledge. In the space of regular attributes, the granules are bounded sets. Decision rules induced from the rough approximation of a classification are also built up of such granules. While taking into account prior knowledge of type (i) and (ii), the rough approximation and the inherent rule induction ignore, however, prior knowledge of type (iii). In consequence, the resulting decision rules may be inconsistent with the dominance principle.

The authors have proposed an extension of the granular computing paradigm that permits taking into account prior knowledge of type (iii), in addition to either (i) only [17], or (i) and (ii) together [5, 6, 10, 13, 15, 26]. Combination of the new granules with the idea of rough approximation makes the so-called Dominance-based Rough Set Approach (DRSA).

In the following sections we present the concept of granules permitting to handle prior knowledge of type (iii), then we briefly sketch DRSA and its main extensions; as sets of decision rules resulting from DRSA can be seen as preference models in multicriteria decision problems, we briefly comment on this issue; application of the new paradigm of granular computing to induction of association rules is also mentioned before conclusions.
2 How Prior Knowledge about Preference Order in Data Influences the Granular Computing?

In other words, how should the granule of knowledge be defined in the attribute space in order to take into account prior knowledge about preference order in data when searching for rules?

As is usual in knowledge discovery methods, information about objects is represented in a data table, in which rows are labelled by objects and contain the values of attributes for each corresponding object, whereas columns are labelled by attributes and contain the values of each corresponding attribute for the objects. Let $U$ denote a finite set of objects (universe) and $Q$ a finite set of attributes divided into set $C$ of condition attributes and set $D$ of decision attributes; $C \cap D = \emptyset$. Let also $X_C = \prod_{q=1}^{|C|} X_q$ and $X_D = \prod_{q=1}^{|D|} X_q$ be attribute spaces corresponding to sets of condition and decision attributes, respectively. Elements of $X_C$ and $X_D$ can be interpreted as possible evaluations of objects on attributes from set $C = \{1, \ldots, |C|\}$ and from set $D = \{1, \ldots, |D|\}$, respectively. Therefore, $X_q$ is the set of possible evaluations of considered objects with respect to attribute $q$. The value of object $x$ on attribute $q \in Q$ is denoted by $x_q$.

Objects $x$ and $y$ are indiscernible by $P \subseteq C$ if $x_q = y_q$ for all $q \in P$ and, analogously, objects $x$ and $y$ are indiscernible by $R \subseteq D$ if $x_q = y_q$ for all $q \in R$. Sets of indiscernible objects are equivalence classes of the corresponding indiscernibility relation $I_P$ or $I_R$. Moreover, $I_P(x)$ and $I_R(x)$ denote equivalence classes including object $x$. $I_D$ makes a partition of $U$ into a finite number of decision classes $Cl = \{Cl_t, t \in T\}$, $T = \{1, \ldots, n\}$. Each $x \in U$ belongs to one and only one class $Cl_t \in Cl$.

The above definitions take into account prior knowledge of type (i) and (ii) only. In this case, the granules of knowledge are bounded sets in $X_P$ and $X_R$ ($P \subseteq C$ and $R \subseteq D$), defined by partitions of $U$ induced by indiscernibility relations $I_P$ and $I_R$, respectively. Then, classification patterns to be discovered are functions representing granules $I_R(x)$ by granules $I_P(x)$ in condition attribute space $X_P$, for any $P \subseteq C$ and any $x \in U$.

If prior knowledge includes item (iii) in addition to (i) and (ii), then the indiscernibility relation is unable to produce granules in $X_C$ and $X_D$ taking into account the preference order. To do so, it has to be substituted by a dominance relation in $X_P$ and $X_R$ ($P \subseteq C$ and $R \subseteq D$). Suppose, for simplicity, that all condition attributes in $C$ and all decision attributes in $D$ are criteria, and that $C$ and $D$ are semantically correlated.

Let $\succeq_q$ be a weak preference relation on $U$ (often called outranking) representing a preference on the set of objects with respect to criterion $q \in C \cup D$; $x_q \succeq_q y_q$ means "$x_q$ is at least as good as $y_q$ with respect to criterion $q$". On the one hand, we say that $x$ dominates $y$ with respect to $P \subseteq C$ (shortly, $x$ $P$-dominates $y$) in condition attribute space $X_P$ (denotation $x D_P y$) if $x_q \succeq_q y_q$ for all $q \in P$.
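The difference between the two kinds of granules can be made concrete with a small Python sketch. The data table `TABLE`, the function names, and the numeric encoding in which a larger value is preferred are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical data table: object -> evaluations on condition attributes (larger = better).
TABLE = {
    "x1": {"Math": 2, "Ph": 1},
    "x2": {"Math": 2, "Ph": 1},
    "x3": {"Math": 3, "Ph": 1},
}


def indiscernibility_classes(table, P):
    """Partition the objects into equivalence classes of I_P (equal values on all q in P)."""
    classes = defaultdict(list)
    for obj, values in table.items():
        classes[tuple(values[q] for q in P)].append(obj)
    return [sorted(members) for members in classes.values()]


def dominates(table, x, y, P):
    """x D_P y: x is at least as good as y on every criterion q in P."""
    return all(table[x][q] >= table[y][q] for q in P)


P = ["Math", "Ph"]
print(indiscernibility_classes(TABLE, P))  # [['x1', 'x2'], ['x3']]
print(dominates(TABLE, "x3", "x1", P))     # True
print(dominates(TABLE, "x1", "x3", P))     # False
```

Indiscernibility is symmetric and yields a partition, whereas dominance is reflexive but directional: `x3` dominates `x1` and not the other way round.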
Assuming, without loss of generality, that the domains of criteria are numerical, i.e. $X_q \subseteq \mathbb{R}$ for any $q \in C$, and that they are ordered such that preference increases with the value, one can say that $x D_P y$ is equivalent to: $x_q \geq y_q$ for all $q \in P$, $P \subseteq C$. Observe that for each $x \in X_P$, $x D_P x$, i.e. $P$-dominance is reflexive. On the other hand, an analogous definition holds in decision attribute space $X_R$ (denotation $x D_R y$), $R \subseteq D$.

The dominance relations $x D_P y$ and $x D_R y$ ($P \subseteq C$ and $R \subseteq D$) are directional statements where $x$ is a subject and $y$ is a referent. If $x \in X_P$ is the referent, then one can define a set of objects $y \in X_P$ dominating $x$, called the $P$-dominating set, $D_P^+(x) = \{y \in U : y D_P x\}$. If $x \in X_P$ is the subject, then one can define a set of objects $y \in X_P$ dominated by $x$, called the $P$-dominated set, $D_P^-(x) = \{y \in U : x D_P y\}$. $P$-dominating sets $D_P^+(x)$ and $P$-dominated sets $D_P^-(x)$ correspond to positive and negative dominance cones in $X_P$, with the origin $x$.

As to decision attribute space $X_R$, $R \subseteq D$, the $R$-dominance relation permits to define the sets:

$$Cl_R^{\geq x} = \{y \in U : y D_R x\}, \qquad Cl_R^{\leq x} = \{y \in U : x D_R y\}.$$

$Cl_{t_q} = \{x \in X_D : x_q = t_q\}$ is a decision class with respect to $q \in D$. $Cl_R^{\geq x}$ is called the upward union of classes, and $Cl_R^{\leq x}$ the downward union of classes. If $y \in Cl_R^{\geq x}$, then $y$ belongs to class $Cl_{t_q}$, $x_q = t_q$, or better on each decision attribute $q \in R$; if $y \in Cl_R^{\leq x}$, then $y$ belongs to class $Cl_{t_q}$, $x_q = t_q$, or worse on each decision attribute $q \in R$. The upward and downward unions of classes correspond to positive and negative dominance cones in $X_R$, respectively.

In this case, the granules of knowledge are open sets in $X_P$ and $X_R$ defined by dominance cones $D_P^+(x)$, $D_P^-(x)$ ($P \subseteq C$) and $Cl_R^{\geq x}$, $Cl_R^{\leq x}$ ($R \subseteq D$), respectively. Then, classification patterns to be discovered are functions representing granules $Cl_R^{\geq x}$, $Cl_R^{\leq x}$ by granules $D_P^+(x)$, $D_P^-(x)$, respectively, in condition attribute space $X_P$, for any $P \subseteq C$ and $R \subseteq D$ and any $x \in X_P$. In both cases above, the functions are sets of decision rules.

3 Dominance-Based Rough Set Approach (DRSA)

3.1 Granular Computing with Dominance Cones

Suppose, for simplicity, that set $D$ of decision attributes is a singleton, $D = \{d\}$. Decision attribute $d$ makes a partition of $U$ into a finite number of classes $Cl = \{Cl_t, t \in T\}$, $T = \{1, \ldots, n\}$. Each $x \in U$ belongs to one and only one class $Cl_t \in Cl$. The upward and downward unions of classes boil down, respectively, to:

$$Cl_t^{\geq} = \bigcup_{s \geq t} Cl_s, \qquad Cl_t^{\leq} = \bigcup_{s \leq t} Cl_s, \qquad t = 1, \ldots, n.$$

Notice that for $t = 2, \ldots, n$ we have $Cl_t^{\geq} = U - Cl_{t-1}^{\leq}$, i.e. all the objects not belonging to class $Cl_t$ or better belong to class $Cl_{t-1}$ or worse.

Let us explain how the rough set concept has been generalized to DRSA in order to enable granular computing with dominance cones (for more details, see [5, 6, 10, 13, 26]). Given a set of criteria $P \subseteq C$, the inclusion of an object $x \in U$ in the upward union of classes $Cl_t^{\geq}$, $t = 2, \ldots, n$, creates an inconsistency in the sense of the dominance principle if one of the following conditions holds:
- $x$ belongs to class $Cl_t$ or better but it is $P$-dominated by an object $y$ belonging to a class worse than $Cl_t$, i.e. $x \in Cl_t^{\geq}$ but $D_P^+(x) \cap Cl_{t-1}^{\leq} \neq \emptyset$,
- $x$ belongs to a worse class than $Cl_t$ but it $P$-dominates an object $y$ belonging to class $Cl_t$ or better, i.e. $x \notin Cl_t^{\geq}$ but $D_P^-(x) \cap Cl_t^{\geq} \neq \emptyset$.

If, given a set of criteria $P \subseteq C$, the inclusion of $x \in U$ in $Cl_t^{\geq}$, $t = 2, \ldots, n$, creates an inconsistency in the sense of the dominance principle, we say that $x$ belongs to $Cl_t^{\geq}$ with some ambiguity. Thus, $x$ belongs to $Cl_t^{\geq}$ without any ambiguity with respect to $P \subseteq C$ if $x \in Cl_t^{\geq}$ and there is no inconsistency in the sense of the dominance principle. This means that all objects $P$-dominating $x$ belong to $Cl_t^{\geq}$, i.e. $D_P^+(x) \subseteq Cl_t^{\geq}$. Geometrically, this corresponds to inclusion of the complete set of objects contained in the positive dominance cone originating in $x$, in the positive dominance cone $Cl_t^{\geq}$ originating in $Cl_t$.

Furthermore, $x$ possibly belongs to $Cl_t^{\geq}$ with respect to $P \subseteq C$ if one of the following conditions holds:

1. according to decision attribute $d$, $x$ belongs to $Cl_t^{\geq}$,
2. according to decision attribute $d$, $x$ does not belong to $Cl_t^{\geq}$ but it is inconsistent in the sense of the dominance principle with an object $y$ belonging to $Cl_t^{\geq}$.

In terms of ambiguity, $x$ possibly belongs to $Cl_t^{\geq}$ with respect to $P \subseteq C$ if $x$ belongs to $Cl_t^{\geq}$ with or without any ambiguity. Due to reflexivity of the dominance relation $D_P$, conditions 1) and 2) can be summarized as follows: $x$ possibly belongs to class $Cl_t$ or better, with respect to $P \subseteq C$, if among the objects $P$-dominated by $x$ there is an object $y$ belonging to class $Cl_t$ or better, i.e. $D_P^-(x) \cap Cl_t^{\geq} \neq \emptyset$. Geometrically, this corresponds to a non-empty intersection of the set of objects contained in the negative dominance cone originating in $x$ with the positive dominance cone $Cl_t^{\geq}$ originating in $Cl_t$.

For $P \subseteq C$, the set of all objects belonging to $Cl_t^{\geq}$ without any ambiguity constitutes the $P$-lower approximation of $Cl_t^{\geq}$, denoted by $\underline{P}(Cl_t^{\geq})$, and the set of all objects that possibly belong to $Cl_t^{\geq}$ constitutes the $P$-upper approximation of $Cl_t^{\geq}$, denoted by $\overline{P}(Cl_t^{\geq})$:

$$\underline{P}(Cl_t^{\geq}) = \{x \in U : D_P^+(x) \subseteq Cl_t^{\geq}\}, \qquad \overline{P}(Cl_t^{\geq}) = \{x \in U : D_P^-(x) \cap Cl_t^{\geq} \neq \emptyset\}, \qquad t = 1, \ldots, n.$$

Analogously, one can define the $P$-lower approximation and the $P$-upper approximation of $Cl_t^{\leq}$ as follows:

$$\underline{P}(Cl_t^{\leq}) = \{x \in U : D_P^-(x) \subseteq Cl_t^{\leq}\}, \qquad \overline{P}(Cl_t^{\leq}) = \{x \in U : D_P^+(x) \cap Cl_t^{\leq} \neq \emptyset\}, \qquad t = 1, \ldots, n.$$

All the objects belonging to $Cl_t^{\geq}$ and $Cl_t^{\leq}$ with some ambiguity constitute the $P$-boundary of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, denoted by $Bn_P(Cl_t^{\geq})$ and $Bn_P(Cl_t^{\leq})$, respectively. They can be represented in terms of upper and lower approximations as follows:

$$Bn_P(Cl_t^{\geq}) = \overline{P}(Cl_t^{\geq}) - \underline{P}(Cl_t^{\geq}), \qquad Bn_P(Cl_t^{\leq}) = \overline{P}(Cl_t^{\leq}) - \underline{P}(Cl_t^{\leq}), \qquad t = 1, \ldots, n.$$

$P$-lower and $P$-upper approximations of unions of classes $Cl_t^{\geq}$ and $Cl_t^{\leq}$ have an important property of complementarity. It says that if object $x$ belongs without any ambiguity to class $Cl_t$ or better, it is impossible that it could belong to class $Cl_{t-1}$ or worse, i.e.

$$\underline{P}(Cl_t^{\geq}) = U - \overline{P}(Cl_{t-1}^{\leq}), \qquad t = 2, \ldots, n.$$

Due to the complementarity property, $Bn_P(Cl_t^{\geq}) = Bn_P(Cl_{t-1}^{\leq})$, for $t = 2, \ldots, n$, which means that if $x$ belongs with ambiguity to class $Cl_t$ or better, it also belongs with ambiguity to class $Cl_{t-1}$ or worse.

From the knowledge discovery point of view, $P$-lower approximations of unions of classes represent certain knowledge provided by criteria from $P \subseteq C$, while $P$-upper approximations represent possible knowledge and the $P$-boundaries contain doubtful knowledge.

The above definitions of rough approximations are based on a strict application of the dominance principle. However, when defining non-ambiguous objects, it is reasonable to accept a limited proportion of negative examples, particularly for large data tables. Such an extended version of DRSA is called the Variable-Consistency DRSA model (VC-DRSA) [16].

For every $P \subseteq C$, the objects being consistent in the sense of the dominance principle with all upward and downward unions of classes are $P$-correctly classified. For every $P \subseteq C$, the quality of approximation of classification $Cl$ by set of criteria $P$ is defined as the ratio between the number of $P$-correctly classified objects and the number of all the objects in the data sample set. Since the $P$-correctly classified objects are those that do not belong to any $P$-boundary of unions $Cl_t^{\geq}$ and $Cl_t^{\leq}$, $t = 1, \ldots, n$, the quality of approximation of classification $Cl$ by set of criteria $P$ can be written as

$$\gamma_P(Cl) = \frac{\left| U - \bigcup_{t \in T} Bn_P(Cl_t^{\geq}) \right|}{|U|} = \frac{\left| U - \bigcup_{t \in T} Bn_P(Cl_t^{\leq}) \right|}{|U|}.$$

$\gamma_P(Cl)$ can be seen as a measure of the quality of knowledge that can be extracted from the data table, where $P$ is the set of criteria and $Cl$ is the considered classification.

Each minimal subset $P \subseteq C$ such that $\gamma_P(Cl) = \gamma_C(Cl)$ is called a reduct of $Cl$ and is denoted by $RED_{Cl}$. Let us remark that a data sample set can have more than one reduct. The intersection of all reducts is called the core and is denoted by $CORE_{Cl}$. Criteria from $CORE_{Cl}$ cannot be removed from the data sample set without deteriorating the knowledge to be discovered. This means that in set $C$ there are three categories of criteria:
1) indispensable criteria included in the core,
2) exchangeable criteria included in some reducts but not in the core,
3) redundant criteria being neither indispensable nor exchangeable, thus not included in any reduct.

3.2 Induction of Decision Rules

The dominance-based rough approximations of upward and downward unions of classes can serve to induce a generalized description of objects contained in the data table in terms of "if..., then..." decision rules.
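Before turning to the rules themselves, the approximations and the quality measure of Section 3.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy table `OBJECTS`, the function names and the numeric encoding (larger value preferred, class index $t$ from 1 to $n$) are assumptions, and $P$ is taken to be the full set of two criteria.

```python
# Hypothetical toy data: object -> ((Math, Ph) evaluations, class index t), with n = 3 classes.
OBJECTS = {
    "a": ((1, 1), 1),
    "b": ((2, 2), 3),
    "c": ((3, 2), 2),  # dominates b but sits in a worse class: inconsistent with b
    "d": ((3, 3), 3),
}
N_CLASSES = 3


def dominates(x, y):
    """x D_P y on all criteria of the toy table."""
    return all(xq >= yq for xq, yq in zip(OBJECTS[x][0], OBJECTS[y][0]))


def d_plus(x):   # P-dominating set D_P^+(x)
    return {y for y in OBJECTS if dominates(y, x)}


def d_minus(x):  # P-dominated set D_P^-(x)
    return {y for y in OBJECTS if dominates(x, y)}


def upward(t):   # upward union: class t or better
    return {x for x, (_, cls) in OBJECTS.items() if cls >= t}


def downward(t):  # downward union: class t or worse
    return {x for x, (_, cls) in OBJECTS.items() if cls <= t}


def lower_up(t):
    return {x for x in OBJECTS if d_plus(x) <= upward(t)}


def upper_up(t):
    return {x for x in OBJECTS if d_minus(x) & upward(t)}


def upper_down(t):
    return {x for x in OBJECTS if d_plus(x) & downward(t)}


def boundary_up(t):
    return upper_up(t) - lower_up(t)


def quality():
    """Share of objects lying outside every boundary of the upward unions."""
    ambiguous = set().union(*(boundary_up(t) for t in range(1, N_CLASSES + 1)))
    return (len(OBJECTS) - len(ambiguous)) / len(OBJECTS)


print(sorted(lower_up(3)))     # ['d']
print(sorted(upper_up(3)))     # ['b', 'c', 'd']
print(sorted(boundary_up(3)))  # ['b', 'c']
print(quality())               # 0.5
```

On this toy table the complementarity property can be checked directly: `lower_up(3)` equals `set(OBJECTS) - upper_down(2)`.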
For a given upward or downward union of classes, $Cl_t^{\geq}$ or $Cl_s^{\leq}$, the decision rules induced under a hypothesis that objects belonging to $\underline{P}(Cl_t^{\geq})$ or $\underline{P}(Cl_s^{\leq})$ are positive and all the others negative suggest an assignment to "class $Cl_t$ or better", or to "class $Cl_s$ or worse", respectively. On the other hand, the decision rules induced under a hypothesis that objects belonging to the intersection $\overline{P}(Cl_s^{\leq}) \cap \overline{P}(Cl_t^{\geq})$ are positive and all the others negative suggest an assignment to some classes between $Cl_s$ and $Cl_t$ ($s < t$).

In the case of preference-ordered data it is meaningful to consider the following five types of decision rules:

1. certain $D_{\geq}$-decision rules, providing lower profile descriptions for objects belonging to $Cl_t^{\geq}$ without ambiguity: if $x_{q_1} \succeq_{q_1} r_{q_1}$ and $x_{q_2} \succeq_{q_2} r_{q_2}$ and ... $x_{q_p} \succeq_{q_p} r_{q_p}$, then $x \in Cl_t^{\geq}$, where for each $w_q, z_q \in X_q$, "$w_q \succeq_q z_q$" means "$w_q$ is at least as good as $z_q$",
2. possible $D_{\geq}$-decision rules, providing lower profile descriptions for objects belonging to $Cl_t^{\geq}$ with or without any ambiguity: if $x_{q_1} \succeq_{q_1} r_{q_1}$ and $x_{q_2} \succeq_{q_2} r_{q_2}$ and ... $x_{q_p} \succeq_{q_p} r_{q_p}$, then $x$ possibly belongs to $Cl_t^{\geq}$,
3. certain $D_{\leq}$-decision rules, providing upper profile descriptions for objects belonging to $Cl_t^{\leq}$ without ambiguity: if $x_{q_1} \preceq_{q_1} r_{q_1}$ and $x_{q_2} \preceq_{q_2} r_{q_2}$ and ... $x_{q_p} \preceq_{q_p} r_{q_p}$, then $x \in Cl_t^{\leq}$, where for each $w_q, z_q \in X_q$, "$w_q \preceq_q z_q$" means "$w_q$ is at most as good as $z_q$",
4. possible $D_{\leq}$-decision rules, providing upper profile descriptions for objects belonging to $Cl_t^{\leq}$ with or without any ambiguity: if $x_{q_1} \preceq_{q_1} r_{q_1}$ and $x_{q_2} \preceq_{q_2} r_{q_2}$ and ... $x_{q_p} \preceq_{q_p} r_{q_p}$, then $x$ possibly belongs to $Cl_t^{\leq}$,
5. approximate $D_{\geq\leq}$-decision rules, providing simultaneously lower and upper profile descriptions for objects belonging to $Cl_s \cup Cl_{s+1} \cup \ldots \cup Cl_t$ without possibility of discerning to which class: if $x_{q_1} \succeq_{q_1} r_{q_1}$ and ... $x_{q_k} \succeq_{q_k} r_{q_k}$ and $x_{q_{k+1}} \preceq_{q_{k+1}} r_{q_{k+1}}$ and ... $x_{q_p} \preceq_{q_p} r_{q_p}$, then $x \in Cl_s \cup Cl_{s+1} \cup \ldots \cup Cl_t$.

In the left hand side of a $D_{\geq\leq}$-decision rule we can have "$x_q \succeq_q r_q$" and "$x_q \preceq_q r'_q$", where $r_q \leq r'_q$, for the same $q \in C$. Moreover, if $r_q = r'_q$, the two conditions boil down to "$x_q \sim_q r_q$", where for each $w_q, z_q \in X_q$, "$w_q \sim_q z_q$" means "$w_q$ is indifferent to $z_q$".
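The rule syntax above can be mirrored by a small data structure. The sketch below is illustrative only: the `Rule` class, its field layout and the numeric encoding of evaluations (larger value preferred) are assumptions, and the decision part is kept as a plain label.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple

# ">=" stands for "at least as good as", "<=" for "at most as good as" (numeric encoding assumed).
OPS = {">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    # Each elementary condition is (criterion, ">=" or "<=", threshold r_q).
    conditions: Tuple[Tuple[str, str, int], ...]
    decision: str  # human-readable right hand side

    def matches(self, obj: Dict[str, int]) -> bool:
        """True if the object satisfies every elementary condition of the LHS."""
        return all(OPS[op](obj[q], r) for q, op, r in self.conditions)


# A certain upward rule: lower profile on Math and Ph.
certain_up = Rule((("Math", ">=", 2), ("Ph", ">=", 2)), "x in Cl_2 or better")

# An approximate rule: ">=" and "<=" on Math with equal thresholds act as indifference.
approximate = Rule(
    (("Math", ">=", 2), ("Math", "<=", 2), ("Ph", "<=", 1)),
    "x in Cl_1 or Cl_2",
)

pupil = {"Math": 2, "Ph": 1}
print(certain_up.matches(pupil))   # False
print(approximate.matches(pupil))  # True
```

Keeping the conditions as ordered comparisons against thresholds means that each left hand side describes a dominance cone rather than a bounded cell.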
Since a decision rule is an implication, by a minimal rule we understand such an implication that there is no other implication with the left hand side (LHS) of at least the same weakness (in other words, a rule using a subset of elementary conditions or/and weaker elementary conditions) and the right hand side (RHS) of at least the same strength (in other words, a $D_{\geq}$- or a $D_{\leq}$-decision rule assigning objects to the same union or sub-union of classes, or a $D_{\geq\leq}$-decision rule assigning objects to the same or larger set of classes).

The rules of type 1) and 3) represent certain knowledge extracted from the data table, while the rules of type 2) and 4) represent possible knowledge, and rules of type 5) represent doubtful knowledge.

The rules of type 1) and 3) are exact if they do not cover negative examples, and they are probabilistic otherwise. In the latter case, each rule is characterized by a confidence ratio, representing the probability that an object matching the LHS of the rule also matches its RHS. Probabilistic rules are concordant with the VC-DRSA model mentioned above.

Let us comment on the application of decision rules to the objects described by criteria from $C$. When applying $D_{\geq}$-decision rules to object $x$, it is possible that $x$ either matches the LHS of at least one decision rule or does not match the LHS of any decision rule. In the case of at least one matching, it is reasonable to conclude that $x$ belongs to class $Cl_t$, being the lowest class of the upward union $Cl_t^{\geq}$ resulting from the intersection of all RHS of rules covering $x$. Precisely, if $x$ matches the LHS of rules $\rho_1, \rho_2, \ldots, \rho_m$, having RHS $x \in Cl_{t_1}^{\geq}$, $x \in Cl_{t_2}^{\geq}$, ..., $x \in Cl_{t_m}^{\geq}$, then $x$ is assigned to class $Cl_t$, where $t = \max\{t_1, t_2, \ldots, t_m\}$. In the case of no matching, it is concluded that $x$ belongs to $Cl_1$, i.e. to the worst class, since no rule with RHS suggesting a better classification of $x$ is covering this object.

Analogously, when applying $D_{\leq}$-decision rules to object $x$, it is concluded that $x$ belongs either to class $Cl_t$, being the highest class of the downward union $Cl_t^{\leq}$ resulting from the intersection of all RHS of rules covering $x$, or to class $Cl_n$, i.e. to the best class, when $x$ is not covered by any rule. Precisely, if $x$ matches the LHS of rules $\rho_1, \rho_2, \ldots, \rho_m$, having RHS $x \in Cl_{t_1}^{\leq}$, $x \in Cl_{t_2}^{\leq}$, ..., $x \in Cl_{t_m}^{\leq}$, then $x$ is assigned to class $Cl_t$, where $t = \min\{t_1, t_2, \ldots, t_m\}$. In the case of no matching, it is concluded that $x$ belongs to the best class $Cl_n$ because no rule with RHS suggesting a worse classification of $x$ is covering this object.

Finally, when applying $D_{\geq\leq}$-decision rules to object $x$, it is concluded that $x$ belongs to the union of all classes suggested in the RHS of rules covering $x$.

A set of decision rules is complete if it is able to cover all objects from the data table in such a way that consistent objects are re-classified to their original classes and inconsistent objects are classified to clusters of classes referring to this inconsistency. We call minimal each set of decision rules that is complete and non-redundant, i.e. exclusion of any rule from this set makes it non-complete.

One of three induction strategies can be adopted to obtain a set of decision rules [28]:
- generation of a minimal description, i.e. a minimal set of rules,
- generation of an exhaustive description, i.e. all rules for a given data table,
- generation of a characteristic description, i.e. a set of rules covering relatively many objects each, however, all together not necessarily all objects from $U$.

Let us observe that the syntax of decision rules induced from rough approximations defined using dominance cones consistently uses this type of granules. Each condition profile defines a dominance cone in $X_C$, and each decision profile defines a dominance cone in $X_D$; in both cases, the cone is positive for $D_{\geq}$-rules and negative for $D_{\leq}$-rules. Let us also remark that dominance cones corresponding to condition profiles can originate in any point of $X_C$, without risk of being too specific. Thus, contrary to traditional granular computing, the condition attribute space $X_C$ need not be discretized.

To conclude the description of DRSA, let us mention that the dominance-based rough approximations can also serve to induce decision trees representing knowledge discovered from preference-ordered data [3].
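The assignment scheme described above for upward and downward rules (maximum of the suggested classes with default $Cl_1$, minimum with default $Cl_n$) can be sketched as follows. The rule encoding as `(thresholds, t)` pairs, the example rules and the numeric evaluations are hypothetical and serve only as an illustration.

```python
def classify_upward(obj, rules, worst=1):
    """Apply upward rules given as (thresholds, t): 'if obj[q] >= r for all q, then class t or better'.

    Returns the largest t among matching rules, or the worst class if none matches.
    """
    matched = [t for thresholds, t in rules
               if all(obj[q] >= r for q, r in thresholds.items())]
    return max(matched, default=worst)


def classify_downward(obj, rules, best):
    """Apply downward rules given as (thresholds, t): 'if obj[q] <= r for all q, then class t or worse'.

    Returns the smallest t among matching rules, or the best class if none matches.
    """
    matched = [t for thresholds, t in rules
               if all(obj[q] <= r for q, r in thresholds.items())]
    return min(matched, default=best)


# Hypothetical rule sets for n = 3 classes.
up_rules = [({"Math": 2}, 2), ({"Math": 3, "Ph": 2}, 3)]
down_rules = [({"Ph": 1}, 1), ({"Math": 2}, 2)]

print(classify_upward({"Math": 3, "Ph": 2}, up_rules))            # 3
print(classify_upward({"Math": 1, "Ph": 3}, up_rules))            # 1
print(classify_downward({"Math": 2, "Ph": 1}, down_rules, best=3))  # 1
print(classify_downward({"Math": 3, "Ph": 3}, down_rules, best=3))  # 3
```

The `default` argument of `max` and `min` covers the no-matching case, so an uncovered object falls into the worst class for upward rules and into the best class for downward rules.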

4 Extensions of DRSA Dealing with Preference-Ordered Data

4.1 Rough Approximation of Preference Relations and Decision Rule Preference Model

It is natural that people make decisions and then search for rules that justify their choices. The rules give evidence of a decision policy and can be used for both explanation of past decisions and recommendation of future decisions. The set of rules representing the decision policy of a decision maker (DM) is called a preference model. It is a necessary component of decision support systems for multicriteria choice and ranking problems. Classically, it has been a utility function or a binary relation; its construction requires some preference information from the DM, like substitution ratios among criteria, importance weights, or indifference, preference and veto thresholds [23]. Acquisition of this preference information from the DM is not easy and, moreover, the resulting preference model is not intelligible for the DM. In this situation, the preference model in terms of decision rules induced from decision examples provided by the DM has two advantages over the classical models: (i) it is intelligible and speaks the language of the DM, (ii) the preference information comes from observation of the DM's decisions.

There is, however, a problem with inconsistency often present in the set of decision examples. Rather than correct or ignore these inconsistencies, we propose to take them into account in the preference model construction using the rough set concept. We have extended DRSA in order to approximate comprehensive preference relations in multicriteria choice and ranking problems [5, 6, 10]. In particular, DRSA has been adapted to the analysis of so-called pairwise comparison tables (PCT), where each row corresponds to a pair of objects described by binary relations on particular criteria and by a comprehensive preference relation, e.g. the outranking relation.
Applying DRSA to the analysis of the PCT, we obtain a rough approximation of the outranking relation by a dominance relation. Decision rules derived from rough approximations may then be applied to a new set of objects concerned by the choice or ranking problem. As a result, one obtains a four-valued outranking relation on this set. In order to obtain a recommendation, it is advisable to use an exploitation procedure based on the net flow score of the objects.

The preference model in terms of decision rules has several advantages over the classical models:
- the decision rules do not convert ordinal information into numeric information but keep the ordinal character of input data due to the syntax proposed; in this sense, DRSA is concordant with the paradigm of computing with words, which are hardly convertible to numerical scales,
- heterogeneous information (qualitative and quantitative, ordered and non-ordered) and scales of preference (ordinal, cardinal) can be processed within the DRSA, while classical methods consider only quantitative ordered evaluations with rare exceptions,
- the decision rule preference model resulting from the DRSA can represent even inconsistent preferences.

We proved the equivalence of preference representation by a general non-additive and non-transitive utility function and by "if..., then..." decision rules [12, 14]. Moreover, some well known multicriteria aggregation procedures (lexicographic aggregation, majority aggregation, ELECTRE I and TACTIC) were represented in terms of the decision rule model; in these cases the decision rules decompose the synthetic aggregation formula used by these procedures; the rules involve partial profiles defined for subsets of criteria plus a dominance relation on these profiles and pairs of actions. Such decomposition makes the preference model more understandable for the decision maker.

If the comprehensive outranking relation is a complete preorder, i.e. it is strongly complete (for all objects $x$, $y$: $x$ outranks $y$ or $y$ outranks $x$) and transitive, it can be represented by a utility function. We proved that a utility function is equivalent to a set of certain decision rules, either $D_{\geq}$-decision rules or $D_{\leq}$-decision rules. We proved, moreover, that the Sugeno integral, considered to be the most general form of the max-min ordinal utility function, can be represented by a set of certain decision rules having a very specific syntax (single graded decision rules). The capacity of representation of preferences by a set of decision rules is thus far more general than the Sugeno integral [25].

4.2 Missing Values of Attributes and Criteria

In practical applications, the data table is often not complete because some data are missing. To deal with this case, we proposed in [8] an extension of the rough set methodology to the analysis of incomplete data tables.
The extension concerns both the classical rough set approach (CRSA), based on indiscernibility relations, and the DRSA. The relations of indiscernibility or dominance between two objects are considered as directional statements in which a subject is compared to a referent object. We require that the referent object has no missing data. The two extended rough set approaches boil down to the original approaches when there are no missing data. The rules induced from the newly defined rough approximations are either exact or probabilistic, depending on whether they are supported by consistent objects or not. The way of handling the missing values in the proposed approach seems faithful to the available data, because the decision rules are robust in the sense of being grounded on objects existing in the data set and not on hypothetical objects created by substituting possible values for the missing ones.

4.3 Fuzzy Set Extension of DRSA

In [9] and in [4], we extended and characterized DRSA by using fuzzy dominance and similarity relations considered jointly. We proved that our extension of the rough approximation into the fuzzy context maintains the same desirable properties as the crisp rough approximation of decision classes. In this generalization, we distinguished all possible cases where either the approximating granules in $X_C$ (dominance cones in $X_C$) are fuzzy, or the approximated granules in $X_D$ (preference-ordered decision classes) are fuzzy, or both kinds of granules are fuzzy.

4.4 DRSA for Decisions under Risk

In [11] we opened a new avenue for applications of the rough set concept to the analysis of preference-ordered data. We considered the classical problem of decision under risk, extending DRSA by using stochastic dominance. We considered the case of a traditional additive probability distribution over the set of states of the world; however, the model is rich enough to handle non-additive probability distributions and even qualitative ordinal distributions. The rough set approach gives a representation of the DM's preferences under risk in terms of if..., then... decision rules induced from rough approximations of sets of exemplary decisions (preference-ordered classification of acts described in terms of outcomes in uncertain states of the world).

4.5 Hierarchical Structure of Attributes and Criteria

In many real-life situations, the process of decision making is decomposable into sub-problems; this decomposition may follow either from a naturally hierarchical structure of the evaluation or from the need to simplify a complex decision problem. These situations are referred to as hierarchical decision problems. The hierarchical structure of a problem has the form of a tree whose nodes are attributes and criteria describing objects. In [2], we consider hierarchical decision problems where the decision is made in a finite number of steps due to the hierarchical structure of regular attributes and criteria.
We propose a methodology based on the decision rule preference model induced from examples of hierarchical decisions made by the decision maker on a reference set of objects. To deal with inconsistencies appearing in decision examples, we adapt the rough set approach to hierarchical classification problems (HCP). In HCP, the main difficulty consists in the propagation of inconsistencies along the tree, i.e. taking into account at each node of the tree the inconsistent information coming from lower-level nodes. In the proposed methodology, the inconsistencies are propagated from the bottom to the top of the tree in the form of subsets of possible attribute values. In the case of hierarchical criteria, these subsets are intervals of possible criterion values. Subsets of possible values may also appear in the leaves of the tree, i.e. in evaluations of objects by the lowest-level attributes and criteria. To deal with multiple values of attributes in object descriptions, the classical rough set approach and DRSA have been adapted accordingly.
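The bottom-up propagation of subsets of possible values can be pictured with a small Python sketch. It is only an illustration under strong assumptions: the tree, the ordinal scale and the sample data are hypothetical, and the `combine` functions are simple stand-ins for the rule-based classification performed at each node in the methodology of [2], which is not reproduced here. Taking the envelope `[min, max]` of the possible values as the interval for a criterion is likewise an assumption of the sketch.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple


def possible_values(
    node: str,
    children: Dict[str, List[str]],
    leaf_sets: Dict[str, Set[int]],
    combine: Dict[str, Callable[..., int]],
) -> Set[int]:
    """Collect every value `node` can take given its descendants' possible values."""
    kids = children.get(node, [])
    if not kids:
        return set(leaf_sets[node])  # leaves may already carry several values
    kid_sets = [possible_values(k, children, leaf_sets, combine) for k in kids]
    # Try every combination of child values and keep all resulting outputs.
    return {combine[node](*values) for values in product(*kid_sets)}


def as_interval(values: Set[int]) -> Tuple[int, int]:
    """For a criterion, summarise the possible values by their extremes."""
    return min(values), max(values)


# Hypothetical two-level hierarchy on an ordinal scale 1 (worst) .. 3 (best).
children = {"overall": ["finance", "management"], "finance": ["liquidity", "debt"]}
leaf_sets = {"liquidity": {2}, "debt": {1, 2}, "management": {3}}
combine = {
    "finance": lambda a, b: min(a, b),  # stand-in for node-level rules
    "overall": lambda f, m: min(f, m),  # stand-in for node-level rules
}

root = possible_values("overall", children, leaf_sets, combine)
print(root, as_interval(root))
```

In this toy case the ambiguity of the `debt` leaf survives up to the root, so the top-level evaluation is an interval rather than a single value. Enumerating all combinations is exponential in the number of ambiguous children, which is acceptable for a sketch but not for large trees.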

5 Induction of Association Rules with Prior Knowledge of Type (i) and (iii)

Problems of discovering association rules in preference-ordered data sets have been considered in [17]. Such data are typically related to economic issues, like finance or marketing. We introduced a specific form of association rules involving criteria. Discovering such rules requires new concepts: semantic correlation of criteria, inconsistency of objects with respect to the dominance robust items, credibility index. Properties of these rules concerning their generality and interdependencies were studied in view of constructing an algorithm for mining such rules. The algorithm is an extension of the algorithm proposed in [1]. The approach can be combined with methods handling missing or imprecise values of attributes and criteria. In the case of association rules induced with prior knowledge of type (i) and (iii), the rules have the same syntax as in the case of prior knowledge of type (i), (ii) and (iii); however, there is a joint attribute space $X_Q$, and the granules corresponding to condition and decision profiles are defined in disjoint sub-spaces of $X_Q$.

6 Conclusions

Knowledge discovery from preference-ordered data differs from usual knowledge discovery, since the former involves preference orders in the domains of attributes and in the set of decision classes. This requires that a knowledge discovery method applied to preference-ordered data respects the dominance principle. As this is not the case for the well-known methods of data mining and knowledge discovery, they are not able to discover all relevant knowledge contained in the analysed data sample and, even worse, they may yield unreasonable discoveries, because they are inconsistent with the dominance principle. These deficiencies are repaired in DRSA, which is based on the concept of rough approximations consistent with the dominance principle.
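To make the dominance principle concrete, the following Python sketch lists the pairs of examples that violate it, i.e. pairs where one object is at least as good as another on every criterion and yet is assigned to a worse class. It assumes gain-type criteria and decision classes coded so that a larger number means a better class; the function names and the data are hypothetical and serve only as an illustration, not as part of DRSA itself.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (criteria values, decision class)


def weakly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every criterion."""
    return all(x >= y for x, y in zip(a, b))


def dominance_violations(examples: List[Example]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j): i dominates j but sits in a worse class."""
    return [
        (i, j)
        for i, j in permutations(range(len(examples)), 2)
        if weakly_dominates(examples[i][0], examples[j][0])
        and examples[i][1] < examples[j][1]
    ]


data = [
    ((3, 3), 2),
    ((2, 2), 2),
    ((3, 2), 1),  # at least as good as object 1, yet in a worse class
]
print(dominance_violations(data))
```

A method that ignores the preference order would treat objects 1 and 2 as merely different, whereas a dominance-aware method flags the pair as inconsistent.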
DRSA moreover makes it possible to apply the rough set approach to new fields, such as multicriteria decision making and decision under uncertainty. The multiple extensions proposed for DRSA make this approach a useful tool for practical applications.

Acknowledgement. The first author wishes to acknowledge financial support from the State Committee for Scientific Research, KBN research grant no. 8T11F , and from the Foundation for Polish Science, subsidy no. 11/2001. The research of the two other authors has been supported by the Italian Ministry of Education, University and Scientific Research (MIUR).

References

1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, I.: Fast discovery of association rules. [In]: U.M. Fayyad et al. (eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996, pp.

2. Dembczynski, K., Greco, S., Slowinski, R.: Methodology of rough-set-based classification and sorting with hierarchical structure of attributes and criteria. Control & Cybernetics 31 (2002) (to appear)
3. Giove, S., Greco, S., Matarazzo, B., Slowinski, R.: Variable consistency monotonic decision trees. [In]: Proceedings RSCTC 2002 (in this volume)
4. Greco, S., Inuiguchi, M., Slowinski, R.: Dominance-based rough set approach using possibility and necessity measures. [In]: Proceedings RSCTC 2002 (in this volume)
5. Greco, S., Matarazzo, B., Slowinski, R.: A new rough set approach to evaluation of bankruptcy risk. [In]: C. Zopounidis (ed.), Operational Tools in the Management of Financial Risk. Kluwer Academic Publishers, Boston, 1998, pp.
6. Greco, S., Matarazzo, B., Slowinski, R.: The use of rough sets and fuzzy sets in MCDM. Chapter 14 [in]: T. Gal, T. Stewart, T. Hanne (eds.), Advances in Multiple Criteria Decision Making. Kluwer Academic Publishers, Boston, 1999, pp.
7. Greco, S., Matarazzo, B., Slowinski, R.: Rough approximation of preference relation by dominance relations. European Journal of Operational Research 117 (1999)
8. Greco, S., Matarazzo, B., Slowinski, R.: Dealing with missing data in rough set analysis of multi-attribute and multi-criteria decision problems. [In]: S.H. Zanakis, G. Doukidis, C. Zopounidis (eds.), Decision Making: Recent Developments and Worldwide Applications. Kluwer Academic Publishers, Boston, 2000, pp.
9. Greco, S., Matarazzo, B., Slowinski, R.: Fuzzy extension of the rough set approach to multicriteria and multiattribute sorting. [In]: J. Fodor, B. De Baets, P. Perny (eds.), Preferences and Decisions under Incomplete Knowledge. Physica-Verlag, Heidelberg, 2000, pp.
10. Greco, S., Matarazzo, B., Slowinski, R.: Rough sets theory for multicriteria decision analysis. European Journal of Operational Research 129, 1 (2001)
11. Greco, S., Matarazzo, B., Slowinski, R.: Rough set approach to decisions under risk. [In]: W. Ziarko, Y. Yao (eds.), Rough Sets and Current Trends in Computing. LNAI 2005, Springer-Verlag, Berlin, 2001, pp.
12. Greco, S., Matarazzo, B., Slowinski, R.: Conjoint measurement and rough set approach for multicriteria sorting problems in presence of ordinal criteria. [In]: A. Colorni, M. Paruccini, B. Roy (eds.), A-MCD-A: Aide Multi Critère à la Décision - Multiple Criteria Decision Aiding. European Commission Report EUR EN, Joint Research Centre, Ispra, 2001, pp.
13. Greco, S., Matarazzo, B., Slowinski, R.: Rough sets methodology for sorting problems in presence of multiple attributes and criteria. European Journal of Operational Research 138, 2 (2002)
14. Greco, S., Matarazzo, B., Slowinski, R.: Preference representation by means of conjoint measurement and decision rule model. [In]: D. Bouyssou, E. Jacquet-Lagrèze, P. Perny, R. Slowinski, D. Vanderpooten, Ph. Vincke (eds.), Aiding Decisions with Multiple Criteria: Essays in Honor of Bernard Roy. Kluwer Academic Publishers, Boston, 2002, pp.
15. Greco, S., Matarazzo, B., Slowinski, R.: Multicriteria classification by dominance-based rough set approach. Chapter [in]: W. Kloesgen, J. Zytkow (eds.), Handbook of Data Mining and Knowledge Discovery. Oxford University Press, New York, 2002 (to appear)
16. Greco, S., Matarazzo, B., Slowinski, R., Stefanowski, J.: Variable consistency model of dominance-based rough set approach. [In]: W. Ziarko, Y. Yao (eds.), Rough Sets and Current Trends in Computing. LNAI 2005, Springer-Verlag, Berlin, 2001, pp.
17. Greco, S., Matarazzo, B., Slowinski, R., Stefanowski, J.: Mining association rules in preference-ordered data. Proc. 13th Int. Symposium on Methodologies for Intelligent Systems (ISMIS), Lyon, June 26-29, 2002. Springer-Verlag, Berlin, 2002 (to appear)
18. Grzymala-Busse, J.W., Zou, X.: Classification strategies using certain and possible rules. [In]: L. Polkowski, A. Skowron (eds.), Rough Sets and Current Trends in Computing. LNAI 1424, Springer-Verlag, Berlin, 1998, pp.
19. Michalski, R., Bratko, I., Kubat, M. (eds.): Machine Learning and Data Mining. John Wiley & Sons, Chichester
20. Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht
21. Pawlak, Z., Grzymala-Busse, J.W., Slowinski, R., Ziarko, W.: Rough sets. Communications of the ACM 38, 11 (1995)
22. Polkowski, L., Skowron, A.: Calculi of granules based on rough set theory: approximate distributed synthesis and granular semantics for computing with words. [In]: N. Zhong, A. Skowron, S. Ohsuga (eds.), New Directions in Rough Sets, Data Mining, and Soft-Granular Computing. LNAI 1711, Springer-Verlag, Berlin, 1999, pp.
23. Roy, B.: Méthodologie Multicritère d'Aide à la Décision. Economica, Paris
24. Slowinski, R. (ed.): Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, Boston
25. Slowinski, R., Greco, S., Matarazzo, B.: Axiomatization of utility, outranking and decision-rule preference models for multiple-criteria classification problems under partial inconsistency with the dominance principle. Control & Cybernetics 31 (2002) (to appear)
26. Slowinski, R., Stefanowski, J., Greco, S., Matarazzo, B.: Rough sets based processing of inconsistent information in decision analysis. Control & Cybernetics 29, 1 (2000)
27. Slowinski, R., Zopounidis, C.: Application of the rough set approach to evaluation of bankruptcy risk. Intelligent Systems in Accounting, Finance and Management 4 (1995)
28. Stefanowski, J.: On rough set based approaches to induction of decision rules. [In]: L. Polkowski, A. Skowron (eds.), Rough Sets in Data Mining and Knowledge Discovery. Vol. 1, Physica-Verlag, Heidelberg, 1998, pp.
29. Ziarko, W.: Rough sets as a methodology for data mining. [In]: L. Polkowski, A. Skowron (eds.), Rough Sets in Knowledge Discovery. Vol. 1, Physica-Verlag, Heidelberg, 1998, pp.
30. Ziarko, W., Shan, N.: KDD-R, a comprehensive system for knowledge discovery in databases using rough sets. [In]: T.Y. Lin, A.M. Wildberg (eds.), Soft Computing: Rough Sets, Fuzzy Logic, Neural Networks, Uncertainty Management, Knowledge Discovery. Simulation Council Inc., San Diego, 1995, pp.
