Classification Based on Logical Concept Analysis


Yan Zhao and Yiyu Yao
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
E-mail: {yanzhao, yyao}@cs.uregina.ca

Abstract. This paper studies the problem of classification by using a concept lattice as the search space of classification rules. The left hand side of a classification rule is a concept, including its extension and its intension, and the right hand side is the class label that the concept implies. In particular, we show that logical concepts of the given universe are naturally associated with the consistent classification rules generated by any partition-based or covering-based algorithm, and can be characterized as a special set of consistent classification rules. An algorithm is proposed to find the set of the most general consistent concepts.

1 Introduction

The objectives of classification tasks can be divided into description and prediction. Description focuses on the discovery of rules that describe data; prediction involves the use of the discovered rules to make predictions. A classification rule is normally expressed in the form "if φ then ψ", or symbolically, φ ⇒ ψ. The left hand side is a formula that characterizes a subset of the objects, and the right hand side is a label that indicates the class of this set of objects.

Generally, a classification task can be understood as a search in a particular space of possible solutions. The features of a search space determine the properties of the rules to be constructed; the structure and the complexity of the search space are primary measures of the difficulty of a classification problem. Given a particular search space, different search strategies, such as depth-first, breadth-first and best-first methods, together with some heuristics, can be used to explore the normally very large space [6, 14].

Many search spaces for classification tasks have been intensively studied. For example, a version space [6] has a most specific bound and a most general bound: the most specific bound contains the set of maximally specific formulas consistent with the training data, and the most general bound contains the set of maximally general formulas consistent with the training data. It supports general-to-specific and specific-to-general breadth-first search at the same time. The left hand sides of classification rules are all the possible generalizations lying between these two bounding sets.

As another example, a granule network [14] systematically organizes all the granules and formulas with respect to the given universe. Each node consists of

a granule, which is a subset of objects in the universe, and each arc leading from a granule to its child is labelled by an atomic formula. A path from a coarse granule to a fine granule indicates a conjunctive relation. The left hand side of a classification rule is a disjunction of conjunctions of atomic formulas.

A clustering-based classifier presents yet another search space. For example, for a k-NN classifier [5], k clusters are constructed based on some pre-selected distance metric, and each is assigned a particular class. The left hand side of a classification rule is a disjunction of a set of clusters. The problem with this search space is that a relatively large k may include some not-so-similar pixels, while a very small k may exclude some potentially accurate rules. The optimal value of k depends on the size and the nature of the data.

This paper introduces another search space, a concept lattice, for classification tasks. As a result, the left hand side of a classification rule is a concept, including a set of objects (an extension) and a set of properties (an intension). There are several advantages of using concept analysis for classification. Concepts are extremely precise in the sense that an intension and an extension are two-way definable; this ensures that the constructed concept-based rules are most descriptive and accurate. All the concepts are naturally organized into a concept hierarchy. Once concepts are constructed and described, one can study relationships between concepts in terms of their intensions and extensions, such as sub-concepts and super-concepts, disjoint and overlapping concepts, and partial sub-concepts. These relationships can be conveniently expressed in the form of rules and associated with quantitative measures indicating the strength of the rules. Knowledge discovery and data mining, especially rule mining, can be viewed as a process of forming concepts and finding relationships between concepts in terms of intensions and extensions [12, 13].

The rest of the paper is organized as follows. Section 2 formalizes the basic settings of information tables and a decision logic language. After that, the notion of formal concepts and one of its logical transformations are discussed in Section 3. Section 4 studies the relationship between consistent classification rules and consistent concepts, and proposes a heuristic method to explore the most general consistent concepts. Conclusions are drawn in Section 5.

2 Information Tables and a Decision Logic Language

An information table provides a convenient way to describe a finite set of objects by a finite set of attributes.

Definition 1. An information table S is the tuple S = (U, At, {V_a | a ∈ At}, {I_a | a ∈ At}), where U is a finite nonempty set of objects, At is a finite nonempty set of attributes, V_a is a nonempty set of values for attribute a ∈ At, and I_a : U → V_a is an information function.
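Definition 1 maps directly onto a small data structure. The following Python sketch (our illustration; the container layout and variable names are assumptions, not from the paper) stores the information functions as one record per object, from which U, At and the value sets V_a can be recovered:

```python
# A minimal sketch of Definition 1: S = (U, At, {V_a}, {I_a}).
# The information functions I_a are stored as one record per object.
I = {
    "o1": {"A": "a1", "B": "b1"},
    "o2": {"A": "a1", "B": "b2"},
    "o3": {"A": "a2", "B": "b2"},
}
U = set(I)                                   # finite nonempty set of objects
At = set(next(iter(I.values())))             # finite nonempty set of attributes
V = {a: {I[x][a] for x in U} for a in At}    # V_a = value set of attribute a
assert V["A"] == {"a1", "a2"}
```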

To describe the information in an information table, we adopt the decision logic language L discussed in [7].

Definition 2. A decision logic language L consists of a set of formulas defined by the following two rules: (i) an atomic formula of L is a descriptor a = v, where a ∈ At and v ∈ V_a; (ii) the well-formed formulas (wffs) of L form the smallest set containing the atomic formulas and closed under ¬, ∧ and ∨.

In an information table S, the satisfiability of a formula φ ∈ L by an object x is written as x ⊨_S φ, or in short x ⊨ φ if S is understood. With the notion of satisfiability, one may obtain a set-theoretic interpretation of formulas of L. That is, if φ is a formula, the set m_S(φ), defined by m_S(φ) = {x ∈ U | x ⊨_S φ}, is called the meaning of the formula φ in S. If S is understood, we simply write m(φ). The meaning of a formula φ is the set of all objects having the properties expressed by φ. If m_S(φ) ≠ ∅, then φ is meaningful in S. With φ and m(φ), a connection between formulas of L and subsets of U is thus established.

A subset X ⊆ U is called a definable granule in an information table S if there exists at least one formula φ such that m_S(φ) = X. The notion of definability of subsets in an information table is essential to data analysis. In fact, definable subsets are the basic units that can be described and discussed, upon which other notions can be developed.

A formula φ_i is a refinement of another formula φ_j, or equivalently, φ_j is a coarsening of φ_i, if every object satisfying φ_i also satisfies φ_j. The refinement relation can be denoted by logical implication, written as φ_i ⇒ φ_j. In the context of an information table S, φ_i ⇒_S φ_j if and only if m(φ_i) ⊆ m(φ_j). Given two formulas φ_i and φ_j, the meet φ_i ∧ φ_j defines the largest intersection of the granules m(φ_i) and m(φ_j), and the join φ_i ∨ φ_j defines the smallest union of the granules m(φ_i) and m(φ_j).

3 Formal Concept Analysis and Logical Concept Analysis

Formal concept analysis (FCA) deals with the characterization of a concept consisting of its intension and extension [3, 11]. By considering the decision logic language, we can transform formal concepts to a logical setting and perform logical concept analysis (LCA) [4]. LCA extends an intension from a set of properties to a logical formula defined by these properties. By extending FCA to LCA, we enhance the flexibility for describing, managing, updating, querying and navigating the concepts.

3.1 Formal concept analysis

Denote by 𝔽 the set of all atomic formulas in the decision logic language L, i.e., 𝔽 = {a = v | a ∈ At, v ∈ V_a}. For O ⊆ U and F ⊆ 𝔽, define

O′ = {f ∈ 𝔽 | ∀x ∈ O : x ⊨_S f},  (1)
F′ = {x ∈ U | ∀f ∈ F : x ⊨_S f}.  (2)
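In the conjunctive fragment used throughout the paper, a formula can be represented as a set of descriptors (a, v), read as the conjunction of the atomic formulas a = v. A small sketch of satisfiability and the meaning function m_S(φ), under the table layout of the previous snippet (the representation is our assumption):

```python
# x |= phi iff I_a(x) = v for every descriptor (a = v) in phi;
# m_S(phi) = {x in U | x |= phi} is the meaning of phi in S.

def satisfies(row, phi):
    return all(row[a] == v for (a, v) in phi)

def meaning(I, phi):
    return {x for x, row in I.items() if satisfies(row, phi)}

I = {"o1": {"A": "a1", "B": "b1"}, "o2": {"A": "a1", "B": "b2"}}
assert meaning(I, {("A", "a1")}) == {"o1", "o2"}         # phi is meaningful
assert meaning(I, {("A", "a1"), ("B", "b2")}) == {"o2"}  # refinement shrinks m
```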

So O′ is the set of atomic formulas common to all the objects in O, and F′ is the set of objects possessing all the atomic formulas in F.

Lemma 1. [11] Let an information table S be a formal context, O_i, O_j ⊆ U and F_i, F_j ⊆ 𝔽. Then:
(1) O_i ⊆ O_j implies O_i′ ⊇ O_j′,  (1′) F_i ⊆ F_j implies F_i′ ⊇ F_j′;
(2) O ⊆ O′′,  (2′) F ⊆ F′′;
(3) O′ = O′′′,  (3′) F′ = F′′′;
(4) (O_i ∪ O_j)′ = O_i′ ∩ O_j′,  (4′) (F_i ∪ F_j)′ = F_i′ ∩ F_j′.

Definition 3. [11] A formal concept of an information table S is defined as a pair (O, F), where O ⊆ U, F ⊆ 𝔽, O′ = F and F′ = O. The extension of the concept (O, F) is O, and the intension is F.

3.2 Logical concept analysis limited to conjunction

FCA, as discussed above, deals with both intensions and extensions in the set-theoretic setting, and does not consider the relationships between the elements of intensions. By involving the decision logic language L, we move to a logical setting for LCA. Intuitively, a set-based intension implies a conjunctive relation on the included atomic formulas. In this paper, we restrict our attention to logical conjunction. Thus, we can define two logically conjunctive dual functions as follows:

O′ = ⋀{f ∈ 𝔽 | ∀x ∈ O : x ⊨_S f}  (3)
   = ⋁ O_t′, where ⋃ O_t = O;  (4)
φ′ = m_S(φ) = {x ∈ U | x ⊨_S φ}  (5)
   = ⋃ φ_t′, where ⋀ φ_t ≡ φ.  (6)

Here, we use two different formulations for O′. Equation 3 intersects the common properties of all the objects in O by using the logic-based conjunctor; Equation 4 computes the least upper bound of the conjunctively definable formulas of subsets of O by using the context-based disjunctor. Note that the context-based conjunctive and disjunctive operators are different from the logic-based conjunctor and disjunctor: for two formulas φ, ψ ∈ L in the context of an information table, φ ∧ ψ returns the greatest lower bound of φ and ψ (more specific), and φ ∨ ψ returns the least upper bound of φ and ψ (more general), with respect to the given universe.

Transposing a set F ⊆ 𝔽 into a conjunctive formula replaces ⊆, ∩ and ∪ by ⇒, ∧ and ∨, respectively. Thus, Lemma 1 can be transformed as:
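The two derivation operators, restricted to conjunctions, are easy to state in code. The sketch below (our function names) computes O′ as the set of descriptors shared by all objects of O and φ′ as the meaning set m_S(φ); a pair with O′ = φ and φ′ = O then passes the concept test of Definition 4 below:

```python
# O' (Eq. 3): descriptors common to all objects in O (O assumed nonempty);
# phi' (Eq. 5): the meaning set m_S(phi).

def extent_to_intent(I, O):
    rows = [I[x] for x in O]
    return frozenset(d for d in rows[0].items()
                     if all(r[d[0]] == d[1] for r in rows))

def intent_to_extent(I, phi):
    return frozenset(x for x, row in I.items()
                     if all(row[a] == v for (a, v) in phi))

def is_concept(I, O, phi):
    return (extent_to_intent(I, O) == frozenset(phi) and
            intent_to_extent(I, phi) == frozenset(O))
```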

Lemma 2. Let an information table S be a context, O_i, O_j ⊆ U and φ_i, φ_j ∈ L. Then:
(1) O_i ⊆ O_j implies O_i′ ⇒ O_j′,  (1′) φ_i ⇒ φ_j implies φ_i′ ⊆ φ_j′;
(2) O ⊆ O′′,  (2′) φ′′ ⇒ φ;
(3) O′ ≡ O′′′,  (3′) φ′ = φ′′′;
(4) (O_i ∪ O_j)′ ≡ O_i′ ∧ O_j′,  (4′) (φ_i ∧ φ_j)′ = φ_i′ ∩ φ_j′.

Definition 4. A conjunctive concept of an information table S is defined as a pair (O, φ), where O ⊆ U, φ is a conjunctive formula, O′ ≡ φ and φ′ = O. The extension of the conjunctive concept (O, φ) is O, and the intension is φ.

All the conjunctive concepts form a complete concept lattice, which possesses the following two properties:

⋀_t (O_t, φ_t) = (⋂_t O_t, (⋀_t φ_t)′′),
⋁_t (O_t, φ_t) = ((⋃_t O_t)′′, ⋁_t φ_t).

For concepts (O_i, φ_i) and (O_j, φ_j) in the concept lattice, we write (O_i, φ_i) ≼ (O_j, φ_j), and say (O_i, φ_i) is a sub-concept of (O_j, φ_j), or (O_j, φ_j) is a super-concept of (O_i, φ_i), if O_i ⊆ O_j, or equivalently, φ_i ⇒ φ_j.

4 Classification Based on Conjunctive Concept Analysis

Without loss of generality, we assume that there is a unique attribute class taking class labels as its values. The set of attributes in an information table is expressed as At = D ∪ {class}, where D is the set of attributes used to describe the objects, also called the set of descriptive attributes. An information table for classification is also called a decision table.

4.1 Classification rules

Each classification rule, in the form of φ ⇒ (class = c_i), or simply φ ⇒ c_i, is derived from, and associated with, a definable granule X, such that φ describes X and c_i labels X. Therefore, each classification rule φ ⇒ c_i can be expressed by a decision relation between a definable pair, consisting of a granule X and its formula φ, and a class label, i.e., (X, φ) ⇒ c_i. It is clear that all the objects that satisfy the formula φ are in the granule X. However, φ might not contain all the properties X possesses; it only defines X and distinguishes X from the other granules. In this case, a definable pair (X, φ) possesses only one-way definability, and is not a concept, which is two-way definable.

Two well-studied rule measures, confidence and generality, are defined as:

Confidence: conf(φ ⇒ c_i) = |m(φ ∧ c_i)| / |m(φ)|;  (7)
Generality: generality(φ ⇒ c_i) = |m(φ)| / |U|.  (8)
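Equations (7) and (8) translate directly into counting over meaning sets. In the sketch below (our layout), class labels are kept in a separate map so that formulas range over the descriptive attributes only:

```python
# conf(phi => c) = |m(phi and c)| / |m(phi)|; generality = |m(phi)| / |U|.

def confidence(I, labels, phi, c):
    covered = [x for x, row in I.items()
               if all(row[a] == v for (a, v) in phi)]       # m(phi)
    if not covered:
        return 0.0
    return sum(labels[x] == c for x in covered) / len(covered)

def generality(I, phi):
    covered = [x for x, row in I.items()
               if all(row[a] == v for (a, v) in phi)]
    return len(covered) / len(I)
```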

The higher the confidence value, the more accurate the rule. When the confidence of a rule is 100%, we say the rule is consistent, or certain; otherwise, it is approximate, or probabilistic. The higher the generality value, the more applicable the rule.

Suppose a set R of consistent classification rules is discovered from an information table. Partition the universe U into a training set U_training and a testing set U_testing; the description accuracy can then be defined as:

description_accu(U_training) = |⋃_{φ⇒c_i ∈ R} m(φ)| / |U_training|.  (9)

When the description accuracy reaches 1, we say that the rule set R covers the entire training set. We say an object x ∈ U_testing is accurately classified if there exists a learned rule φ ⇒ c_i in the set R such that x ⊨ φ and I_class(x) = c_i; we simply write x ⊨ R. The prediction accuracy is defined as:

prediction_accu(U_testing) = |{x ∈ U_testing | x ⊨ R}| / |U_testing|.  (10)

Classification rule mining does not find all possible rules that exist in the information table, but only a subset that forms an accurate classifier [2]. Different classification algorithms discover different subsets based on different heuristics.

4.2 Consistent classification rules and consistent concepts

Definition 5. Let an information table S be a context. A conjunctive concept (X, φ) is called a consistent concept of S if it implies a unique label c_i ∈ V_class, and conf(φ ⇒ c_i) = 100%.

Suppose (X, φ) is a conjunctively definable pair (CDP), i.e., φ is a conjunction of a set of atomic formulas. We can obtain the following inductions.

For a CDP (X, φ), if conf(φ ⇒ c_i) = 100%, then:
- the conjunctively consistent concept (X, X′) ⇒ c_i, and X′ ⇒ φ;
- the conjunctively consistent concept (φ′, φ′′) ⇒ c_i, and φ′′ ⇒ φ;
- there might exist a superset Y ⊇ X such that the conjunctively consistent concept (Y, Y′) ⇒ c_i.

Suppose (X, φ) is a conjunctive concept. If (X, φ) consistently implies class c_i, then for any ψ ⇒ φ:
- the CDP (ψ′, ψ) has conf(ψ ⇒ c_i) = 100%;
- the conjunctively consistent concept (ψ′, ψ′′) ⇒ c_i;
- there might exist a superset Y ⊇ X such that the conjunctively consistent concept (Y, Y′) ⇒ c_i.

Definition 6. A most general consistent concept is a consistent concept of the information table whose super-concepts are not consistent concepts.
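Definitions 5 and 6 can be checked mechanically once the consistent concepts are available. A sketch (our names; `concepts` is assumed to be any collection of concept extensions given as frozensets):

```python
# Def. 5: (O, phi) is consistent iff all objects in O share one class label.
# Def. 6: a consistent concept is most general iff no consistent concept
# strictly contains its extension.

def consistent_class(labels, O):
    cs = {labels[x] for x in O}
    return cs.pop() if len(cs) == 1 else None    # the unique c_i, or None

def is_most_general(labels, concepts, O):
    if consistent_class(labels, O) is None:
        return False
    return not any(O < O2 and consistent_class(labels, O2) is not None
                   for O2 in concepts)
```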

If a super-concept of the concept (X, φ) is a most general consistent concept, we denote it by (X, φ)↑.

Each consistent classification rule is associated with a granule, and is defined by some conditional attributes in the information table. If the conditions are conjoined, the formula and the granule form a CDP. Further, each CDP is associated with a conjunctively consistent concept. If that concept is not a most general consistent concept, then there must exist a most general consistent super-concept corresponding to it. We can use the following flow to illustrate the underlying logic:

1. Given conf(φ ⇒ c_i) = 100%, where c_i ∈ V_class;
2. if φ is a conjunction, then the CDP (φ′, φ) ⇒ c_i;
3. then the conjunctively consistent concept (φ′, φ′′) ⇒ c_i;
4. then the most general consistent concept (φ′, φ′′)↑ ⇒ c_i.

As a result, instead of finding all the consistent classification rules, we can find the set of all the most general consistent concepts, which characterizes the complete consistent rule set.

All the most general consistent concepts in an information table compose a covering of the universe: there may be overlap between two most general consistent concepts, and together they cover the entire universe. This can be easily proved by making the given decision table satisfy the first normal form, which requires all attribute values in the table to be atomic. In this case, for any object x ∈ U, the conjunctive formula ⋀_{a∈At} (a = I_a(x)) forms an intension φ, and the pair (φ′, φ) forms a concept. Clearly, the family of all such concepts (φ′, φ) covers the universe. Since a most general consistent concept covers one or more of these concepts, the set of all the most general consistent concepts is a covering of the universe.

A covering-based algorithm tends to generate a set of rules that cover the objects of the given information table. For some covering-based algorithms, the granule covered by a rule is biased to be as big as possible. Suppose R is a set of conjunctively consistent rules generated by a covering-based classification algorithm, and B is the set of rules defined by the most general consistent concepts. Then: (1) |R| ≥ |B|; (2) for each rule r ∈ R there is a rule b ∈ B with the same class label such that lhs(r) ⇒ lhs(b) and m(lhs(r)) ⊆ m(lhs(b)), where lhs stands for the left hand side of a rule.

A partition-based algorithm is normally biased to generate a shorter tree, and a CDP that maps to one of its rules is often not a most general consistent concept. Suppose R is the set of consistent rules generated by a partition-based algorithm; then the second property still holds, though |R| may be smaller than |B|. Limited by the bias of partition-based algorithms, finding the most general consistent concept from a corresponding CDP is not easy.
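The closure at the heart of the flow above (steps 2-3) looks as follows in code, under the layout used earlier; finding a most general super-concept itself is what the PRISM-concept algorithm of Section 4.4 does, so this sketch stops at the conjunctively consistent concept:

```python
# From a rule phi => c_i: the CDP is (phi', phi); closing the intension
# gives the conjunctively consistent concept (phi', phi'').
# Assumes the rule is meaningful, i.e. m(phi) is nonempty.

def close_rule(I, phi):
    O = frozenset(x for x, row in I.items()
                  if all(row[a] == v for (a, v) in phi))        # phi'
    rows = [I[x] for x in O]
    intent = frozenset(d for d in rows[0].items()
                       if all(r[d[0]] == d[1] for r in rows))   # phi''
    return O, intent
```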

4.3 A simple example

The decision table shown in Table 1 has four descriptive attributes A, B, C and D, and a decision attribute class. The entire search space, the concept lattice, includes 45 conjunctive concepts. For the purpose of classification, we are only interested in the consistent concepts defined by subsets of descriptive attributes, such that a concept (X, φ) ⇒ c_i. Based on these consistent concepts, we further need to find the most general consistent concepts, such that for each of them there does not exist a more general concept implying the same class. There are six most general consistent concepts. The conjunctive concepts, the consistent concepts and the most general consistent concepts are summarized in Table 2.

The ID3 algorithm [8] produces a set of six consistent rules that partition the universe. The PRISM algorithm [1] (shown in Figure 1) generates another set of consistent rules that cover the universe. Each rule maps to one of the most general consistent concepts. For example, the left hand side of an ID3 rule a3 ∧ b1 ∧ c2 can be described by a CDP ({10}, a3 ∧ b1 ∧ c2), which maps to a concept ({10}, a3 ∧ b1 ∧ c2 ∧ d1), which is more specific than the most general consistent concept ({2, 6, 10}, b1 ∧ c2). The left hand side of a PRISM rule a1 ∧ b2 ⇒ + can be described by a CDP ({3, 4}, a1 ∧ b2), which maps to a concept ({3, 4}, a1 ∧ b2 ∧ d1), which is more specific than the most general consistent concept ({3, 4, 8, 11, 12}, b2 ∧ d1). The comparisons among the ID3 CDPs, the PRISM CDPs and the most general consistent concepts are illustrated in Table 3.

4.4 An algorithm for finding the most general consistent concepts

There are two approaches for finding the most general consistent concepts. One approach is to find them in the concept lattice by brute force: first, construct the concept lattice of the given information table; then, find the consistent concepts; thirdly, eliminate all the non-most-general concepts. This approach encounters a complexity problem in the first step. If we search the universe U for definable granules, the search space is 2^|U|. If we search the set 𝔽 of all atomic formulas for subsets of atomic formulas, the search space is ∏_{a∈D}(|V_a| + 1). In most cases, ∏_{a∈D}(|V_a| + 1) ≪ 2^|U|. This still means we need to test up to ∏_{a∈D}(|V_a| + 1) conjunctively definable formulas, and their conjunctively definable granules, in order to verify whether each CDP is a conjunctive concept or not (a sketch of this enumeration is given below).

The other approach is to generate the most general consistent concepts heuristically: first, apply a heuristic covering algorithm to produce a set of consistent classification rules; then, find the concept corresponding to each rule; finally, eliminate the non-most-general concepts. If the set of classification rules covers the universe, the corresponding set of conjunctive concepts, and hence the set of most general consistent concepts, does too. Since the number of classification rules is limited, this approach is much more efficient than the first one. The PRISM algorithm [1], illustrated in Figure 1, is a good candidate for generating the most general consistent concepts.
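Here is the brute-force enumeration, sketched in Python (our implementation, exponential exactly as the analysis above warns): walk the ∏_a(|V_a| + 1) conjunctive formulas, keep those that close into consistent concepts, and drop every concept whose extension is strictly contained in that of another consistent one:

```python
from itertools import product

def brute_force_mgcc(I, labels):
    attrs = sorted(next(iter(I.values())))
    values = {a: sorted({row[a] for row in I.values()}) for a in attrs}
    concepts = {}
    # each attribute contributes one of its values, or None (absent)
    for choice in product(*[[None] + values[a] for a in attrs]):
        phi = [(a, v) for a, v in zip(attrs, choice) if v is not None]
        O = frozenset(x for x, r in I.items()
                      if all(r[a] == v for (a, v) in phi))
        if not phi or not O or len({labels[x] for x in O}) != 1:
            continue
        rows = [I[x] for x in O]                  # close phi into O' = phi''
        concepts[O] = frozenset(d for d in rows[0].items()
                                if all(r[d[0]] == d[1] for r in rows))
    # most general: extension not strictly contained in another consistent one
    return {O: phi for O, phi in concepts.items()
            if not any(O < O2 for O2 in concepts)}
```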

Table 1. A decision table

        A    B    C    D    class
 o1     a1   b1   c1   d2     -
 o2     a1   b1   c2   d2     -
 o3     a1   b2   c1   d1     +
 o4     a1   b2   c2   d1     +
 o5     a2   b1   c1   d2     -
 o6     a2   b1   c2   d1     -
 o7     a2   b2   c1   d2     -
 o8     a2   b2   c2   d1     +
 o9     a3   b1   c1   d2     +
 o10    a3   b1   c2   d1     -
 o11    a3   b2   c1   d1     +
 o12    a3   b2   c2   d1     +

Table 2. The conjunctive concepts, the consistent concepts and the most general consistent concepts of Table 1

The 45 conjunctive concepts (extension, intension):
({1,2,3,4,5,6,7,8,9,10,11,12}, ⊤), ({1,2,3,4}, a1), ({5,6,7,8}, a2), ({9,10,11,12}, a3), ({1,2,5,6,9,10}, b1), ({3,4,7,8,11,12}, b2), ({1,3,5,7,9,11}, c1), ({2,4,6,8,10,12}, c2), ({3,4,6,8,10,11,12}, d1), ({1,2,5,7,9}, d2), ({1,2}, a1∧b1∧d2), ({5,6}, a2∧b1), ({9,10}, a3∧b1), ({3,4}, a1∧b2∧d1), ({7,8}, a2∧b2), ({11,12}, a3∧b2∧d1), ({1,3}, a1∧c1), ({5,7}, a2∧c1∧d2), ({9,11}, a3∧c1), ({2,4}, a1∧c2), ({6,8}, a2∧c2∧d1), ({10,12}, a3∧c2∧d1), ({10,11,12}, a3∧d1), ({1,5,9}, b1∧c1∧d2), ({2,6,10}, b1∧c2), ({6,10}, b1∧c2∧d1), ({1,2,5,9}, b1∧d2), ({3,7,11}, b2∧c1), ({3,11}, b2∧c1∧d1), ({4,8,12}, b2∧c2∧d1), ({3,4,8,11,12}, b2∧d1), ({4,6,8,10,12}, c2∧d1), ({1,5,7,9}, c1∧d2), ({1}, a1∧b1∧c1∧d2), ({2}, a1∧b1∧c2∧d2), ({3}, a1∧b2∧c1∧d1), ({4}, a1∧b2∧c2∧d1), ({5}, a2∧b1∧c1∧d2), ({6}, a2∧b1∧c2∧d1), ({7}, a2∧b2∧c1∧d2), ({8}, a2∧b2∧c2∧d1), ({9}, a3∧b1∧c1∧d2), ({10}, a3∧b1∧c2∧d1), ({11}, a3∧b2∧c1∧d1), ({12}, a3∧b2∧c2∧d1).

The consistent concepts, with the class each implies:
({1,2}, a1∧b1∧d2) ⇒ −; ({5,6}, a2∧b1) ⇒ −; ({5,7}, a2∧c1∧d2) ⇒ −; ({2,6,10}, b1∧c2) ⇒ −; ({6,10}, b1∧c2∧d1) ⇒ −; ({3,4}, a1∧b2∧d1) ⇒ +; ({11,12}, a3∧b2∧d1) ⇒ +; ({9,11}, a3∧c1) ⇒ +; ({3,11}, b2∧c1∧d1) ⇒ +; ({4,8,12}, b2∧c2∧d1) ⇒ +; ({3,4,8,11,12}, b2∧d1) ⇒ +; ({1}, a1∧b1∧c1∧d2) ⇒ −; ({2}, a1∧b1∧c2∧d2) ⇒ −; ({5}, a2∧b1∧c1∧d2) ⇒ −; ({6}, a2∧b1∧c2∧d1) ⇒ −; ({7}, a2∧b2∧c1∧d2) ⇒ −; ({10}, a3∧b1∧c2∧d1) ⇒ −; ({3}, a1∧b2∧c1∧d1) ⇒ +; ({4}, a1∧b2∧c2∧d1) ⇒ +; ({8}, a2∧b2∧c2∧d1) ⇒ +; ({9}, a3∧b1∧c1∧d2) ⇒ +; ({11}, a3∧b2∧c1∧d1) ⇒ +; ({12}, a3∧b2∧c2∧d1) ⇒ +.

The six most general consistent concepts:
({1,2}, a1∧b1∧d2) ⇒ −; ({5,6}, a2∧b1) ⇒ −; ({5,7}, a2∧c1∧d2) ⇒ −; ({2,6,10}, b1∧c2) ⇒ −; ({9,11}, a3∧c1) ⇒ +; ({3,4,8,11,12}, b2∧d1) ⇒ +.

Table 3. Comparing the ID3 CDPs and the PRISM CDPs with the most general consistent concepts of Table 1

- ID3: ({1,2}, a1∧b1); PRISM: ({1,2}, a1∧b1); most general consistent concept: ({1,2}, a1∧b1∧d2) ⇒ −.
- ID3: ({5,6}, a2∧b1); PRISM: ({5,6}, a2∧b1); most general consistent concept: ({5,6}, a2∧b1) ⇒ −.
- ID3: ({10}, a3∧b1∧c2); PRISM: ({6,10}, b1∧d1), ({2,6,10}, b1∧c2); most general consistent concept: ({2,6,10}, b1∧c2) ⇒ −.
- ID3: ({7}, b2∧d2); PRISM: ({5,7}, a2∧c1), ({5,7}, a2∧d2); most general consistent concept: ({5,7}, a2∧c1∧d2) ⇒ −.
- ID3: ({9}, a3∧b1∧c1); PRISM: ({9}, a3∧d2), ({9,11}, a3∧c1); most general consistent concept: ({9,11}, a3∧c1) ⇒ +.
- ID3: ({3,4,8,11,12}, b2∧d1); PRISM: ({3,4}, a1∧b2), ({11,12}, a3∧b2), ({4,8,12}, b2∧c2), ({3,4,8,11,12}, b2∧d1); most general consistent concept: ({3,4,8,11,12}, b2∧d1) ⇒ +.
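As an illustrative check (ours, not part of the paper), Table 1 can be fed to the brute_force_mgcc sketch above; the enumeration recovers exactly the six most general consistent concepts listed in Table 2:

```python
rows = [
    ("o1",  "a1", "b1", "c1", "d2", "-"), ("o2",  "a1", "b1", "c2", "d2", "-"),
    ("o3",  "a1", "b2", "c1", "d1", "+"), ("o4",  "a1", "b2", "c2", "d1", "+"),
    ("o5",  "a2", "b1", "c1", "d2", "-"), ("o6",  "a2", "b1", "c2", "d1", "-"),
    ("o7",  "a2", "b2", "c1", "d2", "-"), ("o8",  "a2", "b2", "c2", "d1", "+"),
    ("o9",  "a3", "b1", "c1", "d2", "+"), ("o10", "a3", "b1", "c2", "d1", "-"),
    ("o11", "a3", "b2", "c1", "d1", "+"), ("o12", "a3", "b2", "c2", "d1", "+"),
]
I = {x: dict(zip("ABCD", vals)) for x, *vals, _ in rows}
labels = {x: cls for x, *_, cls in rows}

mgcc = brute_force_mgcc(I, labels)
assert len(mgcc) == 6          # the six most general consistent concepts
assert frozenset({"o3", "o4", "o8", "o11", "o12"}) in mgcc   # (b2 ∧ d1) => +
```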

Input: a decision table
Output: a set of consistent classification rules

For each c_i ∈ V_class, do the following:
1. Calculate the probability p(c_i | φ) of the class c_i given each atomic formula φ ∈ 𝔽.
2. Select the first φ_t for which p(c_i | φ_t) is the maximum. Create a subset of the training set comprising all the instances that contain the selected φ_t.
3. Repeat Steps 1 and 2 until the local p(c_i | φ_t) reaches 1, or stop if no more subsets can be extracted. At this point, check whether there is any other condition φ_s such that the local p(c_i | φ_s) also reaches 1.
4. Remove all the objects covered by the rule(s) from the table.
5. Repeat Steps 1-4 until all the objects of class c_i have been removed.

Fig. 1. The PRISM algorithm

Input: a decision table
Output: a set of the most general consistent concepts

1. Apply the PRISM algorithm to generate a set of consistent classification rules: {φ ⇒ c_i | c_i ∈ V_class}.
2. Construct a CDP for each consistent rule: {(φ′, φ) ⇒ c_i}.
3. Construct a conjunctively consistent concept for each CDP: {(φ′, φ′′) ⇒ c_i}.
4. For each conjunctively consistent concept (φ′, φ′′), if there exists another conjunctively consistent concept (φ_t′, φ_t′′) such that φ′ ⊂ φ_t′, then (φ′, φ′′) is not a most general consistent concept, and is eliminated.

Fig. 2. PRISM-concept: An algorithm for finding the most general consistent concepts

Figure 2 describes the procedure for finding the set of the most general consistent concepts based on the PRISM algorithm; the procedure is thus called the PRISM-concept algorithm. PRISM-concept has a higher computational complexity than PRISM because of the concept construction process. It prunes the rule set to its kernel by considering the subset relation of concept extensions. The set of the most general consistent concepts cannot be simplified further: no extension can be bigger, and no intension can be coarser.

4.5 Experiments

To evaluate the proposed PRISM-concept algorithm, we choose three sample datasets from the UCI machine learning repository [10], and use SGI's MLC++ utilities 2.0 to generate categorical data [9]. We use 5-fold cross-validation to divide the data into training and testing sets, upon which the partition-based ID3, the covering-based PRISM, and the LCA-based PRISM-concept are compared. We record the number of rules and the accuracy of both description and prediction for the three datasets. The experimental results are reported in Figure 3.
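A compact reading of Figure 1 in Python follows. It is a sketch only: it assumes a consistent decision table, simplifies Cendrowska's tie-breaking, and folds the extra p = 1 check of Step 3 into the main loop, so it approximates rather than reproduces the exact PRISM procedure:

```python
def prism(I, labels):
    # Sketch of Figure 1. Assumes a consistent table; `I` holds descriptive
    # attributes only, `labels` the class of each object.
    rules = []
    for c in sorted(set(labels.values())):
        table = dict(I)                          # working copy for this class
        while any(labels[x] == c for x in table):
            pool, phi = dict(table), set()
            while True:
                # Steps 1-2: descriptor maximizing p(c | d) on the current pool
                best, best_p, best_cov = None, -1.0, ()
                for d in sorted({d for r in pool.values() for d in r.items()}):
                    cov = [y for y, r in pool.items() if r[d[0]] == d[1]]
                    if d in phi or not cov:
                        continue
                    p = sum(labels[y] == c for y in cov) / len(cov)
                    if p > best_p:
                        best, best_p, best_cov = d, p, cov
                if best is None:
                    break
                phi.add(best)
                pool = {y: table[y] for y in best_cov}
                if best_p >= 1.0:                # Step 3: local p reaches 1
                    break
            rules.append((frozenset(phi), c))
            for y in pool:                       # Step 4: remove covered
                del table[y]
    return rules
```

On Table 1, closing and pruning the rules from this run (close_rule plus an extension-containment filter, per Figure 2) yields a covering set of concepts; because the tie-breaking is simplified, the intermediate rules may differ slightly from the paper's.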

The number of PRISM-concept rules lies between those of ID3 and PRISM, due to the difference between the partition-based and the covering-based algorithms, and to the generalization capability of the concept-based rules. Since the PRISM-concepts are generated from the set of PRISM rules, PRISM and PRISM-concept have the same description accuracy, which is normally higher than what ID3 can reach. This indicates that, while keeping a higher description accuracy, we can greatly simplify the set of covering-based consistent rules by using the set of the most general consistent concept-rules.

The prediction accuracy of the PRISM-concept rules is also between those of ID3 and PRISM. That is because PRISM has the greatest number of rules, which makes it more flexible for testing, especially when errors or missing records occur in the testing datasets. Since the intension of a most general consistent concept is the conjunction of all the properties possessed by its extension, it precisely describes the given training set, and might even overfit it. Overfitting rules are good for description, but not good for testing.

Cleve: 303 records (training/testing: 243/60), 13 conditional attributes, 2 decision classes

                             1st     2nd     3rd     4th     5th
# of rules
  ID3                        69      78      71      68      73
  PRISM                      192     171     184     172     169
  PRISM-concept              117     104     96      93      111
Acc. of description (%)
  ID3                        98.35   97.94   98.77   98.35   97.94
  PRISM / PRISM-concept      98.35   98.35   99.18   98.77   98.35
Acc. of prediction (%)
  ID3                        80.00   81.67   74.69   68.33   73.33
  PRISM                      91.67   91.67   90.00   88.33   95.00
  PRISM-concept              81.67   81.67   81.67   78.03   86.67

Vote: 435 records (training/testing: 348/87), 16 conditional attributes, 2 decision classes

                             1st     2nd     3rd     4th     5th
# of rules
  ID3                        31      28      28      28      34
  PRISM                      167     152     167     141     154
  PRISM-concept              107     89      110     82      97
Acc. of description (%)
  ID3                        100     100     100     100     100
  PRISM / PRISM-concept      100     100     100     100     100
Acc. of prediction (%)
  ID3                        94.25   88.51   90.80   89.66   91.95
  PRISM                      97.70   96.55   96.55   97.70   100
  PRISM-concept              97.70   95.40   96.55   97.70   98.85

Iris: 150 records (training/testing: 120/30), 4 conditional attributes, 3 decision classes

                             1st     2nd     3rd     4th     5th
# of rules
  ID3                        5       6       6       6       6
  PRISM                      14      13      13      12      13
  PRISM-concept              9       9       9       9       9
Acc. of description (%)
  ID3                        96.67   97.50   97.50   98.33   97.50
  PRISM / PRISM-concept      96.67   97.50   97.50   98.33   97.50
Acc. of prediction (%)
  ID3                        76.67   73.33   73.33   70.00   73.33
  PRISM                      90.00   86.67   86.67   83.33   86.67
  PRISM-concept              90.00   86.67   86.67   83.33   86.67

Fig. 3. Comparing ID3, PRISM and PRISM-concept on three sample datasets

Although we use consistent classification tasks (i.e., φ ⇒ c_i with 100% confidence) as the running example of this paper, it does not mean that conjunctive concepts cannot cope with approximate classification tasks in general (i.e., conf(φ ⇒ c_i) ≤ 1). Suppose conf(φ ⇒ c_i) = α < 1, where c_i ∈ V_class is the class label satisfied by the majority of the object set φ′. If φ is a conjunction, then the CDP (φ′, φ) has conf(φ ⇒ c_i) = α, and the conjunctive concept (φ′, φ′′) has conf(φ′′ ⇒ c_i) = α, so it is not a consistent concept. However, a super-concept of (φ′, φ′′), denoted (φ′, φ′′)↑, might or might not indicate the same class label c_i. If conf((φ′, φ′′)↑ ⇒ c_i) is satisfactory to the user, then the sub-concept (φ′, φ′′) can be pruned.

5 Conclusion

Logical concept analysis provides an alternative way to study classification tasks. For consistent classification problems, a consistent classification rule corresponds

to a conjunctively definable pair, each conjunctively definable pair corresponds to a conjunctively consistent concept, and each conjunctively consistent concept corresponds to a most general consistent concept. All the most general consistent concepts form a special set of consistent classification rules, which by its nature describes the given universe precisely and concisely. There are two approaches to finding the set of the most general consistent concepts: one works from the concept lattice; the other works from a set of heuristically generated covering-based rules. The study shows that these two approaches find a unique and complete set of the most general consistent concepts.

Logical concept analysis can also be applied to probabilistic classification problems. However, to generalize an (inconsistent) concept to its super-concept, in order to simplify the concept rule set, one needs more heuristics and thresholds. That is a research topic we have set up for the next stage.

References

1. Cendrowska, J., PRISM: an algorithm for inducing modular rules, International Journal of Man-Machine Studies, 27, 349-370, 1987.
2. Clark, P. and Matwin, S., Using qualitative models to guide induction learning, Proceedings of the International Conference on Machine Learning, 49-56, 1993.
3. Demri, S. and Orlowska, E., Logical analysis of indiscernibility, in: Incomplete Information: Rough Set Analysis, Orlowska, E. (Ed.), Physica-Verlag, Heidelberg, 347-380, 1998.
4. Ferré, S. and Ridoux, O., A logical generalization of formal concept analysis, Proceedings of the International Conference on Conceptual Structures, 371-384, 2000.
5. Khan, M., Ding, Q. and Perrizo, W., K-nearest neighbor classification on spatial data streams using P-trees, Proceedings of PAKDD'02, 517-528, 2002.
6. Mitchell, T., Version Spaces: An Approach to Concept Learning, PhD thesis, Computer Science Department, Stanford University, Stanford, California, 1978.
7. Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
8. Quinlan, J.R., Learning efficient classification procedures and their application to chess end-games, in: Machine Learning: An Artificial Intelligence Approach, 1, 463-482, 1983.
9. SGI's MLC++ utilities 2.0: the discretize utility. http://www.sgi.com/tech/mlc
10. UCI Machine Learning Repository. http://www1.ics.uci.edu/~mlearn/mlrepository.html
11. Wille, R., Concept lattices and conceptual knowledge systems, Computers & Mathematics with Applications, 23, 493-515, 1992.
12. Yao, Y.Y., On modeling data mining with granular computing, Proceedings of COMPSAC'01, 638-643, 2001.
13. Yao, Y.Y. and Yao, J.T., Granular computing as a basis for consistent classification problems, Special Issue of the PAKDD'02 Workshop on Toward the Foundation of Data Mining, 5(2), 101-106, 2002.
14. Zhao, Y. and Yao, Y.Y., Interactive user-driven classification using a granule network, Proceedings of the Fifth International Conference on Cognitive Informatics (ICCI'05), 250-259, 2005.