Rough Set Theory. Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, Iowa

Andrew Kusiak
2139 Seamans Center
Iowa City, Iowa 52242-1527
Tel: 319-335 5934
Fax: 319-335 5669
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak

Benefits
Evaluation of the importance of features
Reduction of redundant objects and features
Determination of minimal subsets of features ensuring satisfactory classification of objects
Creation of models for objects in various decision classes
Results in a form close to natural language (decision rules)

Definition
An information system is a 4-tuple $S = \langle U, Q, V, \rho \rangle$, where $U$ is a finite set of objects, $Q$ is a finite set of features, $V = \bigcup_{q \in Q} V_q$, where $V_q$ is the domain of feature $q$, and $\rho: U \times Q \to V$ is a function such that $\rho(x, q) \in V_q$ for every $q \in Q$, $x \in U$, called the information function.

Definition
The information system is a finite data table whose columns are features and whose rows are objects; the entry in column $q$ and row $x$ has the value $\rho(x, q)$. Each row in the table represents the information about an object in S.
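The data-table view of the definition can be sketched in a few lines of Python. This is only an illustration: the three objects, the two features and the helper names (`universe`, `features`, `domain`, `rho`) are hypothetical and are not taken from the lecture.

```python
# Hypothetical toy information system: rows are objects, columns are features.
table = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "large"},
    "x3": {"colour": "blue", "size": "small"},
}

def universe(t):
    """U: the finite set of objects."""
    return set(t)

def features(t):
    """Q: the finite set of features (column names)."""
    return set().union(*(row.keys() for row in t.values()))

def domain(t, q):
    """V_q: the values that feature q takes in the table."""
    return {row[q] for row in t.values()}

def rho(t, x, q):
    """Information function: the entry in row x and column q."""
    return t[x][q]

print(sorted(universe(table)))      # objects
print(sorted(features(table)))      # features
print(rho(table, "x2", "size"))     # one table entry
```

The union of `domain(table, q)` over all features plays the role of V in the 4-tuple.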

Indiscernibility Relation
Let $S = \langle U, Q, V, \rho \rangle$ be an information system and let $P \subseteq Q$ and $x, y \in U$. We say that $x$ and $y$ are indiscernible by the set of features $P$ in $S$ (notation $x \, \tilde{P} \, y$) if $\rho(x, q) = \rho(y, q)$ for every $q \in P$.
Equivalence classes of the relation $\tilde{P}$ are called P-elementary sets in S. Q-elementary sets are called atoms in S.

Example
[Table: information system with objects x1, ..., x10 as rows and features p, q, r, s as columns; cell values not recoverable]
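A minimal sketch of how P-elementary sets can be computed: objects are grouped by their tuple of values on the features in P. The four-object table and the function names are hypothetical, chosen only to illustrate the definition.

```python
from collections import defaultdict

# Hypothetical toy table (not the lecture's table).
table = {
    "x1": {"p": 1, "q": 0},
    "x2": {"p": 1, "q": 1},
    "x3": {"p": 0, "q": 0},
    "x4": {"p": 1, "q": 0},
}

def indiscernible(t, P, x, y):
    """True if x and y have equal values on every feature in P."""
    return all(t[x][q] == t[y][q] for q in P)

def elementary_sets(t, P):
    """Equivalence classes of the indiscernibility relation induced by P."""
    groups = defaultdict(set)
    for x, row in t.items():
        key = tuple(row[q] for q in sorted(P))  # same key <=> indiscernible by P
        groups[key].add(x)
    return list(groups.values())

print(elementary_sets(table, {"p"}))        # {p}-elementary sets
print(elementary_sets(table, {"p", "q"}))   # atoms of this toy table
```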

Example
Atoms and {p}-elementary sets in the information system from the table are as follows:
{p}-elementary sets: $X_1 = \{x_3, x_5, x_6, x_9, x_{10}\}$, $X_2 = \{x_1, x_8\}$, $X_3 = \{x_2, x_4, x_7\}$
Atoms (Q-elementary sets): $Z_1 = \{x_1, x_8\}$, $Z_2 = \{x_2, x_7\}$, $Z_3 = \{x_3, x_6\}$, $Z_4 = \{x_5, x_9\}$, $Z_5 = \{x_4\}$, $Z_6 = \{x_{10}\}$

Approximation of Sets
Let $P \subseteq Q$ and $Y \subseteq U$. The P-lower approximation of Y, denoted by $\underline{P}Y$, and the P-upper approximation of Y, denoted by $\overline{P}Y$, are defined as:
$\underline{P}Y = \bigcup \{X \in P^* : X \subseteq Y\}$
$\overline{P}Y = \bigcup \{X \in P^* : X \cap Y \neq \emptyset\}$
where $P^*$ is the family of P-elementary sets.
The P-boundary (P-doubtful region of classification) is defined as
$Bn_P(Y) = \overline{P}Y - \underline{P}Y$
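The two approximations can be sketched directly from a list of elementary sets. The atoms below are the six atoms Z1 to Z6 listed in the example, with the integer i standing for object xi; the set Y is the one used in the lecture's revisited example. The function names are illustrative only.

```python
def lower_approximation(classes, Y):
    """Union of the elementary sets fully contained in Y."""
    return set().union(*(X for X in classes if X <= Y))

def upper_approximation(classes, Y):
    """Union of the elementary sets that intersect Y."""
    return set().union(*(X for X in classes if X & Y))

def boundary(classes, Y):
    """Doubtful region: upper approximation minus lower approximation."""
    return upper_approximation(classes, Y) - lower_approximation(classes, Y)

# Atoms Z1..Z6 from the example (integer i stands for object x_i).
atoms = [{1, 8}, {2, 7}, {3, 6}, {5, 9}, {4}, {10}]
Y = {1, 4, 5, 7, 9}

print(sorted(lower_approximation(atoms, Y)))
print(sorted(upper_approximation(atoms, Y)))
print(sorted(boundary(atoms, Y)))
```

Working from the elementary sets rather than from the raw table keeps the sketch independent of how the partition was obtained.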

Approximation of Sets
Lower approximation: for a given concept, its lower approximation is the set of observations that can all be classified into this concept.
Upper approximation: for a given concept, its upper approximation is the set of observations that can possibly be classified into this concept.

Approximation Accuracy
With every subset $Y \subseteq U$, we can associate an accuracy of approximation of set Y by P in S, or in short, accuracy of Y, defined as
$\mu_P(Y) = \dfrac{\mathrm{card}(\underline{P}Y)}{\mathrm{card}(\overline{P}Y)}$

Example Revisited
[Table 1: information system with objects x1, ..., x10 as rows and features p, q, r, s as columns; cell values not recoverable]
Consider the information system of Table 1. Let $Y = \{x_1, x_4, x_5, x_7, x_9\}$ and $P = Q = \{p, q, r, s\}$. The approximations are:
$\underline{Q}Y = Z_4 + Z_5 = \{x_5, x_9\} + \{x_4\} = \{x_4, x_5, x_9\}$
$\overline{Q}Y = Z_1 + Z_2 + Z_4 + Z_5 = \{x_1, x_2, x_4, x_5, x_7, x_8, x_9\}$
$Bn_Q(Y) = Z_1 + Z_2 = \{x_1, x_8\} + \{x_2, x_7\}$
and the accuracy is $\mu_Q(Y) = 3/7 = 0.429$.

Definition
The basic construct in rough set theory is called a reduct. It is defined as a minimal sufficient subset of features $RED \subseteq A$ such that:
Relation $R(RED) = R(A)$, i.e., RED produces the same categorization of objects as the collection A of all features, and
For any $g \in RED$, $R(RED - \{g\}) \neq R(A)$, i.e., a reduct is a minimal subset of features with respect to the first property.
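The reduct definition can be checked by brute force on a small table: enumerate feature subsets by increasing size and keep those that induce the same partition as the full feature set and contain no smaller reduct. The sketch below assumes a hypothetical four-object table with features a, b, c; it is exponential in the number of features and is meant only to make the definition concrete.

```python
from itertools import combinations

# Hypothetical toy table (not from the lecture).
table = {
    1: {"a": 0, "b": 0, "c": 0},
    2: {"a": 0, "b": 1, "c": 1},
    3: {"a": 1, "b": 0, "c": 1},
    4: {"a": 1, "b": 1, "c": 0},
}

def partition(t, feats):
    """Categorization of objects induced by the given features."""
    groups = {}
    for x, row in t.items():
        groups.setdefault(tuple(row[f] for f in feats), set()).add(x)
    return {frozenset(g) for g in groups.values()}

def reducts(t, all_features):
    """All minimal feature subsets with the same partition as all_features."""
    target = partition(t, all_features)
    found = []
    for k in range(1, len(all_features) + 1):
        for subset in combinations(all_features, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if partition(t, subset) == target:
                found.append(subset)
    return found

print(reducts(table, ["a", "b", "c"]))
```

In this toy table no single feature is sufficient, while every pair of features is.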

Core
$\mathrm{Core}(A) = \bigcap \mathrm{Red}(A)$
The intersection of all reducts of A is the core.

Eight Objects Example
[Table: eight objects with features F1, F2, F3, F4 and decision D; cell values not recoverable]

Reducts and Core
Reducts: F1, F3, F4 (each feature alone is a reduct)
Core = empty set

Feature Classification Quality
CQ(F1) = 100%
CQ(F3) = 100%
CQ(F4) = 100%
CQ(F2) = 0%

Question
What is the classification quality of a reduct?
100%, by definition.

Example
[Table: data set of five objects with features F1, F2, F3, F4 and decision D; cell values not recoverable]
[Table: o-reducts of the data set, with columns Object No., o-reduct No., F1, F2, F3, F4, D; cell values not recoverable]
Minimum number of features representation (min length rules)

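Example
[Table: data set of five objects with features F1, F2, F3, F4 and decision D; cell values not recoverable]
[Table: o-reducts of the data set, with columns Object No., o-reduct No., F1, F2, F3, F4, D; cell values not recoverable]
Min number of rules representation

Sample Decision Rules
[Table: eight objects with features F1, F2, F3, F4 and decision D; cell values not recoverable]
Rule 1. IF (F1 = ) THEN (D = 1); [4, 100%] [2, 5, 6, 8]
Rule 2. IF (F1 = 1) THEN (D = ); [4, 100%] [1, 3, 4, 7]
Each rule is followed by the number of supporting objects, the share of the decision class they cover, and the list of supporting objects.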

Example
Continuous feature value data set
[Table: eight objects with continuous features F1, F2, F3, F4 and outcome O; cell values not recoverable]

Reducts and Core
Continuous feature value data set, after discretization
[Table: eight objects with features F1, F2, F3, F4 and decision D; cell values not recoverable]
Reducts: {F1, F2, F3} and {F2, F3, F4}
Core: {F2, F3}

Decision Rules 1
[Table: eight objects with continuous features F1, F2, F3, F4 and decision D; cell values not recoverable]
Rule 1. IF (F1 in [.35, .515]) THEN (D = 1); [1, 25%] [6]
Rule 2. IF (F1 in [.3, .35]) THEN (D = 1); [1, 25%] [5]
Rule 3. IF (F2 in [.96, 1.47]) THEN (D = 1); [1, 25%] [8]
Rule 4. IF (F4 in [.5, .5]) AND (F3 in [1, .1]) THEN (D = 1); [1, 25%] [2]
Rule 5. IF (F1 in [.515, 1.5]) THEN (D = ); [2, 50%] [3, 7]
Rule 6. IF (F2 in [.1, .545]) THEN (D = ); [1, 25%] [4]
Rule 7. IF (F3 in [.1, 3.1]) THEN (D = ); [1, 25%] [1]

Decision Rules 2
[Table: eight objects with continuous features F1, F2, F3, F4 and decision D; cell values not recoverable]
Rule 1. IF (F2 <= .55) AND (F3 <= .1) THEN (D = 1); [3, 75%] [5, 6, 8]
Rule 2. IF (F1 >= 1.55) AND (F3 <= .75) THEN (D = 1); [1, 25%] [2]
Rule 3. IF (F1 in [.515, 1.5]) THEN (D = ); [2, 50%] [3, 7]
Rule 4. IF (F3 >= .1) THEN (D = ); [3, 75%] [1, 3, 4]

Types of Rules
Exact: one set of conditions implies one outcome.
Approximate: one set of conditions implies more than one outcome.

Question
Are the two rules exact or approximate?
Rule 1. IF (F2 <= .55) AND (F3 <= .1) THEN (D = 1)
Rule 2. IF (F1 >= 1.55) AND (F3 <= .75) THEN (D = 1)
Exact!

Rule Accuracy - Coverage Relationship
[Figure: coverage and accuracy versus rule antecedent length]

Example
Eight objects, each with four features and outcome D
[Table: eight objects with features F1, F2, F3, F4 and outcome D; cell values not recoverable]

More about Reducts
Single feature reducts for objects 1 and 2
[Table: eight objects with features F1, F2, F3, F4 and outcome D; cell values not recoverable]
[Tables: single-feature reducts of Object 1 and of Object 2; cell values not recoverable]

Reduct Generation Algorithm (based on Pawlak 1991)
Step 0. Initialize object number i = 1.
Step 1. Select object i and find a set of reducts with one feature only. If found, go to Step 3; otherwise go to Step 2.
Step 2. For object i, find a reduct with m - 1 features, where m is the number of input features. This step is accomplished by deleting one feature at a time.
Step 3. Set i = i + 1. If all objects have been considered, stop; otherwise go to Step 1.
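The steps above can be sketched as follows. The slides do not spell out the test for an object reduct, so this sketch assumes one: a set of features is accepted for object i if no object with a different decision has the same values on those features. The four-object data set and all function names are hypothetical.

```python
# Hypothetical data set: three input features and a decision per object.
rows = {
    1: {"F1": 0, "F2": 0, "F3": 0},
    2: {"F1": 0, "F2": 1, "F3": 1},
    3: {"F1": 1, "F2": 0, "F3": 1},
    4: {"F1": 1, "F2": 1, "F3": 0},
}
decisions = {1: 0, 2: 1, 3: 1, 4: 1}
features = ["F1", "F2", "F3"]

def is_o_reduct(rows, decisions, i, feats):
    """Assumed test: no object with a different decision matches object i on feats."""
    return not any(
        decisions[j] != decisions[i]
        and all(rows[j][f] == rows[i][f] for f in feats)
        for j in rows
    )

def generate_reducts(rows, decisions, features):
    result = {}
    for i in rows:  # Steps 0 and 3: visit every object in turn
        # Step 1: reducts with one feature only
        single = [(f,) for f in features if is_o_reduct(rows, decisions, i, (f,))]
        if single:
            result[i] = single
            continue
        # Step 2: reducts with m - 1 features, deleting one feature at a time
        candidates = [tuple(g for g in features if g != f) for f in features]
        result[i] = [c for c in candidates if is_o_reduct(rows, decisions, i, c)]
    return result

for obj, reds in generate_reducts(rows, decisions, features).items():
    print(obj, reds)
```

In this toy data set object 1 has no single-feature reduct, so Step 2 is exercised for it; the other objects are handled by Step 1.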

RG Algorithm
The reduct generation algorithm enumerates reducts with one or m - 1 features only.
Considering k out of m features would result in $\binom{m}{k} = \dfrac{m!}{k!\,(m - k)!}$ reducts.

Example
[Table: five objects with features F1, F2, F3, F4 and outcome O; cell values not recoverable]

All reducts for object 1
[Table: object 1 and its reducts, with columns Reduct No., F1, F2, F3, F4, D; cell values not recoverable]

All reducts for object 2
[Table: object 2 and its reducts, with columns Reduct No., F1, F2, F3, F4, O; cell values not recoverable]

Example: Decision Table
[Table: decision table of eight PC configurations x1, ..., x8 with condition features MONITOR, OS, CPU and decision feature d. Only two rows are recoverable: x2 = (Color, Windows, Pentium, Excellent) and x4 = (Color, Windows, 486, Excellent)]

Information system: Definition
$S = \langle U, Q, V, f \rangle$

Information System Components
Information system $S = \langle U, Q, V, f \rangle$
Universe $U = \{x_1, x_2, \ldots, x_8\}$
Set of features $Q = \{q_1, q_2, q_3, q_4\}$ = {MONITOR, OS, CPU, d}
Domain of the features $V = \bigcup_{q \in Q} V_q$
Decision function $f: U \times Q \to V$

Attribute Domain
Domain of the features $V = \bigcup_{q \in Q} V_q$
$V_{q_1}$ = {Color, Monochrome};
$V_{q_2}$ = {DOS, Windows};
$V_{q_3}$ = {386, 486, Pentium};
$V_{q_4}$ = {Poor, Good, Excellent}.

Example Concepts
$X_1$ = {Inefficient PC configurations} = $\{x_3, x_5\}$
$X_2$ = {PCs with a good price/performance relationship} = $\{x_2, x_4\}$
$X_3$ = {Obsolete PCs} = $\{x_5, x_8\}$

Example Classification
For A = {CPU, d}:
$A^* = \{X_1 = \{x_1, x_6\}, X_2 = \{x_2\}, X_3 = \{x_3\}, X_4 = \{x_4\}, X_5 = \{x_5\}, X_6 = \{x_7\}, X_7 = \{x_8\}\}$
The universe U has thus been partitioned into the equivalence classes $\{X_1, X_2, \ldots, X_7\}$.

Example: Equivalence Class Description
$Des_A(X_1) = \{(q_1, \text{Color}), (q_2, \text{DOS}), (q_3, 486), (q_4, \text{Good})\} = \{(q_1 = \text{Color}), (q_2 = \text{DOS}), (q_3 = 486), (q_4 = \text{Good})\}$

Indiscernibility Relation 1
Given A = {MONITOR}, the indiscernibility relation IND(A) is determined as follows:
$f(x_1, q_1) = f(x_2, q_1) = f(x_4, q_1) = f(x_5, q_1) = f(x_8, q_1)$ = {Color};
$f(x_3, q_1) = f(x_6, q_1) = f(x_7, q_1)$ = {Monochrome}.
The above relation generates the following equivalence classes:
$X_1 = \{x_1, x_2, x_4, x_5, x_8\}$, $X_2 = \{x_3, x_6, x_7\}$

Equivalence Class Description 1
The description of each equivalence class is as follows:
$Des_A(X_1) = \{(q_1, \text{Color})\} = \{(q_1 = \text{Color})\}$;
$Des_A(X_2) = \{(q_1, \text{Monochrome})\} = \{(q_1 = \text{Monochrome})\}$.

Classification 1
The family of equivalence classes $\{X_1, X_2\}$ partitions the universe of objects into two disjoint groups, PCs having Color monitors and PCs having Monochrome monitors, forming the following classification:
$U/IND(A) = A^* = \{X_1 = \{x_1, x_2, x_4, x_5, x_8\}, X_2 = \{x_3, x_6, x_7\}\}$

Indiscernibility Relation 2
Given the feature set A = {MONITOR, CPU}, the indiscernibility relation IND(A) is determined as follows:
$f\{(x_1, q_1), (x_1, q_3)\} = f\{(x_4, q_1), (x_4, q_3)\}$ = {Color, 486};
$f\{(x_2, q_1), (x_2, q_3)\}$ = {Color, Pentium};
$f\{(x_3, q_1), (x_3, q_3)\} = f\{(x_7, q_1), (x_7, q_3)\}$ = {Monochrome, Pentium};
$f\{(x_5, q_1), (x_5, q_3)\} = f\{(x_8, q_1), (x_8, q_3)\}$ = {Color, 386};
$f\{(x_6, q_1), (x_6, q_3)\}$ = {Monochrome, 486}.

Equivalence Class Description 2
$Des_A(X_1) = \{(q_1, \text{Color}), (q_3, 486)\} = \{(q_1 = \text{Color}), (q_3 = 486)\}$;
$Des_A(X_2) = \{(q_1, \text{Color}), (q_3, \text{Pentium})\} = \{(q_1 = \text{Color}), (q_3 = \text{Pentium})\}$;
$Des_A(X_3) = \{(q_1, \text{Monochrome}), (q_3, \text{Pentium})\} = \{(q_1 = \text{Monochrome}), (q_3 = \text{Pentium})\}$;
$Des_A(X_4) = \{(q_1, \text{Color}), (q_3, 386)\} = \{(q_1 = \text{Color}), (q_3 = 386)\}$;
$Des_A(X_5) = \{(q_1, \text{Monochrome}), (q_3, 486)\} = \{(q_1 = \text{Monochrome}), (q_3 = 486)\}$.

Classification 2
For the feature set A = {MONITOR, CPU}, the family of equivalence classes $\{X_1, X_2, X_3, X_4, X_5\}$ partitions the universe of objects into five disjoint groups: {Color, 486}, {Color, Pentium}, {Monochrome, Pentium}, {Color, 386} and {Monochrome, 486}, forming the following classification:
$U/IND(A) = A^* = \{X_1 = \{x_1, x_4\}, X_2 = \{x_2\}, X_3 = \{x_3, x_7\}, X_4 = \{x_5, x_8\}, X_5 = \{x_6\}\}$

Approximations: Example 1
For the set of features A = {MONITOR} and the objects $X = \{x_1, x_2, x_3, x_4\}$, determine the A-lower and A-upper approximations and the A-boundary region.

Example 1
A = {MONITOR}, $X = \{x_1, x_2, x_3, x_4\}$
The following equivalence classes are defined for the set of features A = {MONITOR}:
$[x_1]_A = \{x_1, x_2, x_4, x_5, x_8\}$, $[x_3]_A = \{x_3, x_6, x_7\}$.
The A-lower approximation is given by $\underline{A}X = \{x \in U : [x]_A \subseteq X\}$,
the A-upper approximation is given by $\overline{A}X = \{x \in U : [x]_A \cap X \neq \emptyset\}$,
and the A-boundary region is given by $BN_A(X) = \overline{A}X - \underline{A}X$. Then:
A-lower approximation = $\underline{A}X = \emptyset$ (empty set).
A-upper approximation = $\overline{A}X = [x_1]_A \cup [x_3]_A = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\} = U$.
A-boundary region = $BN_A(X) = U - \emptyset = U$.

Example 1
$X = \{x_1, x_2, x_3, x_4\}$
$\mathrm{Card}(\underline{A}X) = 0$
$\mathrm{Card}\, BN_A(X) = 8 - 0 = 8$
$\mathrm{Card}(\overline{A}X) = 8$
Accuracy $\mu_A(X) = 0/8 = 0$

Example 2
For the set of features A = {MONITOR, OS} and the objects $X = \{x_1, x_2, x_3, x_6\}$, determine the A-lower and A-upper approximations and the A-boundary region.

Example 2
$\underline{A}X = \{x \in U : [x]_A \subseteq X\}$, $\overline{A}X = \{x \in U : [x]_A \cap X \neq \emptyset\}$
A = {MONITOR, OS}, $X = \{x_1, x_2, x_3, x_6\}$
The following equivalence classes are determined for this particular set of features A:
$[x_1]_A = \{x_1, x_5\}$, $[x_2]_A = \{x_2, x_4, x_8\}$, $[x_3]_A = \{x_3\}$, $[x_6]_A = \{x_6, x_7\}$. Then:
A-lower approximation = $\underline{A}X = \{x_3\}$.
A-upper approximation = $\overline{A}X = [x_1]_A \cup [x_2]_A \cup [x_3]_A \cup [x_6]_A = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\} = U$.
A-boundary region = $BN_A(X) = U - \{x_3\} = \{x_1, x_2, x_4, x_5, x_6, x_7, x_8\}$.

Example 2
$\mathrm{Card}(\underline{A}X) = 1$
$\mathrm{Card}\, BN_A(X) = 8 - 1 = 7$
$\mathrm{Card}(\overline{A}X) = 8$
Accuracy $\mu_A(X) = 1/8 = 0.125$

Example 3
For the set of features A = {MONITOR, OS, CPU} and the objects $X = \{x_1, x_2, x_4, x_6\}$, determine the A-lower and A-upper approximations and the A-boundary region.

Example 3
$\underline{A}X = \{x \in U : [x]_A \subseteq X\}$, $\overline{A}X = \{x \in U : [x]_A \cap X \neq \emptyset\}$
A = {MONITOR, OS, CPU}, $X = \{x_1, x_2, x_4, x_6\}$
The following equivalence classes are determined for this particular set of features A:
$[x_1]_A = \{x_1\}$, $[x_2]_A = \{x_2\}$, $[x_3]_A = \{x_3\}$, $[x_4]_A = \{x_4\}$, $[x_5]_A = \{x_5\}$, $[x_6]_A = \{x_6\}$, $[x_7]_A = \{x_7\}$, $[x_8]_A = \{x_8\}$, so:
A-lower approximation = $\underline{A}X = \{x_1, x_2, x_4, x_6\}$
A-upper approximation = $\overline{A}X = [x_1]_A \cup [x_2]_A \cup [x_4]_A \cup [x_6]_A = \{x_1, x_2, x_4, x_6\}$
A-boundary region = $BN_A(X) = \{x_1, x_2, x_4, x_6\} - \{x_1, x_2, x_4, x_6\} = \emptyset$.

Example 3
$\mathrm{Card}(\underline{A}X) = 4$
$\mathrm{Card}\, BN_A(X) = 4 - 4 = 0$
$\mathrm{Card}(\overline{A}X) = 4$
Accuracy $\mu_A(X) = 4/4 = 1$

Classification Accuracy Revisited
Accuracy $\mu_A(X) = \dfrac{\mathrm{Card}(\underline{A}X)}{\mathrm{Card}(\overline{A}X)}$
Given the set of objects $X = \{x_1, x_2, x_4, x_6\}$ and the subset of features A = {MONITOR, OS, CPU}, the accuracy of approximation of set X by the set of features A (for short, the accuracy of X) is $\mu_A(X) = 4/4 = 1$.
$\rho_A(X) = 1 - \mu_A(X)$ is called the A-roughness of the set X (roughness for short).
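The accuracy and roughness figures of Examples 1 to 3 can be recomputed from the equivalence classes quoted in those examples. In this sketch the integer i stands for object xi, and the function names are illustrative only.

```python
def approximations(classes, X):
    """Return the (lower, upper) approximations of X for a given partition."""
    lower = set().union(*(c for c in classes if c <= X))
    upper = set().union(*(c for c in classes if c & X))
    return lower, upper

def accuracy(classes, X):
    lower, upper = approximations(classes, X)
    return len(lower) / len(upper)

def roughness(classes, X):
    return 1 - accuracy(classes, X)

# Equivalence classes and target sets as quoted in Examples 1 to 3.
examples = {
    "MONITOR": ([{1, 2, 4, 5, 8}, {3, 6, 7}], {1, 2, 3, 4}),
    "MONITOR, OS": ([{1, 5}, {2, 4, 8}, {3}, {6, 7}], {1, 2, 3, 6}),
    "MONITOR, OS, CPU": ([{i} for i in range(1, 9)], {1, 2, 4, 6}),
}

for name, (classes, X) in examples.items():
    print(name, accuracy(classes, X), roughness(classes, X))
```

Adding features refines the partition, which is why the accuracy rises from 0 through 0.125 to 1 across the three feature sets.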

Example
[Table: ten objects x1, ..., x10 with condition features c1, c2 and decision d; cell values not recoverable]

Classification Accuracy
Consider the equivalence classes created on the basis of (c1, c2).
The A = {c1, c2}-elementary sets are as follows:
$\{x_1, x_2\}$, $\{x_3, x_4\}$, $\{x_5\}$, $\{x_6\}$, $\{x_7, x_8\}$, $\{x_9, x_{10}\}$.
Set of objects with d = 1: $X = \{x_3, x_5, x_6, x_8\}$
$\underline{A}X = \{x_5\} \cup \{x_6\}$
$\overline{A}X = \{x_5\} \cup \{x_6\} \cup \{x_3, x_4\} \cup \{x_7, x_8\}$
$\mu_A(X) = 2/6 = 0.33$

Classification Quality
Classification quality = Number of objects in the lower approximation / Number of examples in a given decision class

Entire Data Set
Classification accuracy = Number of objects in all lower approximations / Number of objects in all upper approximations
Classification quality = Number of objects in all lower approximations / Number of all objects in the data set
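A short sketch of these ratios, using the {c1, c2}-elementary sets and the class X with d = 1 from the ten-object example (integer i stands for object xi). The second decision class is an assumption made for illustration: all remaining objects are treated as a single class, which the slides do not confirm.

```python
def lower(classes, X):
    return set().union(*(c for c in classes if c <= X))

def upper(classes, X):
    return set().union(*(c for c in classes if c & X))

def class_quality(classes, X):
    """Quality for one decision class: lower approximation size over class size."""
    return len(lower(classes, X)) / len(X)

def classification_accuracy(classes, decision_classes):
    low = sum(len(lower(classes, X)) for X in decision_classes)
    up = sum(len(upper(classes, X)) for X in decision_classes)
    return low / up

def classification_quality(classes, decision_classes):
    low = sum(len(lower(classes, X)) for X in decision_classes)
    n_objects = len(set().union(*classes))
    return low / n_objects

elementary = [{1, 2}, {3, 4}, {5}, {6}, {7, 8}, {9, 10}]
d1 = {3, 5, 6, 8}              # objects with d = 1, from the example
rest = {1, 2, 4, 7, 9, 10}     # assumption: all other objects form one class

print(class_quality(elementary, d1))
print(classification_accuracy(elementary, [d1, rest]))
print(classification_quality(elementary, [d1, rest]))
```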

Quality Loss
Assume the set of features P = {a, b, c}, and feature a is removed from P.
Quality loss = Classification quality (P) - Classification quality (P - {a})

Quality Gain
Assume the set of features P = {a, b, c}, and feature d is added to P.
Quality gain = Classification quality ($P \cup \{d\}$) - Classification quality (P)
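Quality loss and gain can be sketched on a small decision table. The table below is hypothetical, as are the function names; classification quality is computed as the share of objects whose elementary set falls entirely inside one decision class, which equals the number of objects in all lower approximations divided by the number of objects.

```python
# Hypothetical decision table: condition features a, b, c and a decision per object.
rows = {
    1: {"a": 0, "b": 0, "c": 0},
    2: {"a": 0, "b": 1, "c": 0},
    3: {"a": 1, "b": 0, "c": 0},
    4: {"a": 1, "b": 1, "c": 1},
}
decision = {1: 0, 2: 1, 3: 1, 4: 0}

def quality(rows, decision, P):
    """Share of objects lying in elementary sets with a single decision value."""
    groups = {}
    for x, row in rows.items():
        groups.setdefault(tuple(row[f] for f in sorted(P)), set()).add(x)
    consistent = sum(
        len(g) for g in groups.values() if len({decision[x] for x in g}) == 1
    )
    return consistent / len(rows)

def quality_loss(rows, decision, P, removed):
    return quality(rows, decision, P) - quality(rows, decision, P - {removed})

def quality_gain(rows, decision, P, added):
    return quality(rows, decision, P | {added}) - quality(rows, decision, P)

print(quality_loss(rows, decision, {"a", "b", "c"}, "a"))
print(quality_gain(rows, decision, {"b", "c"}, "a"))
```

In this toy table removing feature a merges objects 1 and 3, which carry different decisions, so half of the quality is lost; adding a back recovers it.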