Classification Based on Logical Concept Analysis


Yan Zhao and Yiyu Yao
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
{yanzhao,

Abstract. This paper studies the problem of classification by using a concept lattice as a search space of classification rules. The left hand side of a classification rule is composed of a concept, including its extension and its intension, and the right hand side is the class label that the concept implies. In particular, we show that logical concepts of the given universe are naturally associated with the consistent classification rules generated by any partition-based or covering-based algorithm, and can be characterized as a special set of consistent classification rules. An algorithm is proposed to find a set of the most general consistent concepts.

1 Introduction

The objectives of classification tasks can be divided into description and prediction. Description focuses on the discovery of rules that describe data, and prediction involves the use of discovered rules to make predictions. A classification rule is normally expressed in the form of "if φ then ψ", or symbolically, φ → ψ. The left hand side is a formula that characterizes a subset of the objects, and the right hand side is a label that indicates the class of this set of objects.

Generally, a classification task can be understood as a search in a particular space of possible solutions. The features of a search space determine the properties of the rules to be constructed; the structure and the complexity of the search space are primary measures of the difficulty of a classification problem. Given a particular search space, different search strategies, such as depth-first, breadth-first and best-first methods, together with some heuristics, can be used to explore the normally very large space [6, 14].

Many search spaces for classification tasks have been intensively studied. For example, a version space [6] has a most specific bound and a most general bound, such that the most specific bound contains the set of maximally specific formulas with respect to the training data, and the most general bound contains the set of maximally general formulas with respect to the training data. It allows general-to-specific and specific-to-general breadth-first search at the same time. The left hand sides of classification rules are all possible generalizations that can be created from these two bounding sets. As another example, a granule network [14] systematically organizes all the granules and formulas with respect to the given universe.

Each node of a granule network consists of a granule, which is a subset of objects in the universe, and each arc leading from a granule to its child is labelled by an atomic formula. A path from a coarse granule to a fine granule indicates a conjunctive relation. The left hand side of a classification rule is a disjunction of a conjunctive set of atomic formulas.

A clustering-based classifier presents another search space. For example, for a k-NN classifier [5], based on some pre-selected distance metric, k clusters are constructed, and each is assigned a particular class. The left hand side of a classification rule is a disjunction of a set of clusters. The problem of this search space is that using a relatively large k may include some not so similar pixels, while using a very small k may exclude some potentially accurate rules. The optimal value of k depends on the size and the nature of the data.

This paper intends to introduce another search space, a concept lattice, for classification tasks. As a result, the left hand side of a classification rule is a concept, including a set of objects (an extension) and a set of properties (an intension). There are several advantages of using concept analysis for classification. Concepts are extremely precise in the sense that an intension and an extension are two-way definable. This ensures that the constructed concept-based rules are most descriptive and accurate. All the concepts are naturally organized into a concept hierarchy. Once concepts are constructed and described, one can study relationships between concepts in terms of their intensions and extensions, such as sub-concepts and super-concepts, disjoint and overlapping concepts, and partial sub-concepts. These relationships can be conveniently expressed in the form of rules and associated with quantitative measures indicating the strength of the rules. Knowledge discovery and data mining, especially rule mining, can be viewed as a process of forming concepts and finding relationships between concepts in terms of intensions and extensions [12, 13].

The rest of the paper is organized as follows. Section 2 formalizes the basic settings of information tables and a decision logic language. After that, the notion of formal concepts and one of its logical transformations are discussed in Section 3. Section 4 studies the relationship between consistent classification rules and consistent concepts, and proposes a heuristic method to explore the most general consistent concepts. Conclusions are made in Section 5.

2 Information Tables and a Decision Logic Language

An information table provides a convenient way to describe a finite set of objects by a finite set of attributes.

Definition 1. An information table S is the tuple S = (U, At, {V_a | a ∈ At}, {I_a | a ∈ At}), where U is a finite nonempty set of objects, At is a finite nonempty set of attributes, V_a is a nonempty set of values for attribute a ∈ At, and I_a : U → V_a is an information function.
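To make the notation of Definition 1 concrete, a minimal Python sketch of an information table is given below; the class name, the attribute names and the toy values are illustrative assumptions only and do not come from the paper.

```python
# A minimal, illustrative encoding of an information table S = (U, At, {V_a}, {I_a}).
# Objects are identified by integers; the information functions I_a are stored row-wise.

class InformationTable:
    def __init__(self, rows):
        self.rows = rows                                  # rows[x][a] = I_a(x)
        self.U = set(rows)                                # finite nonempty set of objects
        self.At = set(next(iter(rows.values())))          # finite nonempty set of attributes
        self.V = {a: {rows[x][a] for x in self.U} for a in self.At}  # value sets V_a

    def I(self, x, a):
        """The information function I_a: U -> V_a."""
        return self.rows[x][a]

# A toy table with two descriptive attributes and a class attribute (hypothetical data).
S = InformationTable({
    1: {"a": 1, "b": "x", "class": "+"},
    2: {"a": 1, "b": "y", "class": "-"},
    3: {"a": 2, "b": "x", "class": "+"},
})
```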

To describe the information in an information table, we adopt the decision logic language L that was discussed in [7].

Definition 2. A decision logic language L consists of a set of formulas, which are defined by the following two rules: (i) an atomic formula of L is a descriptor a = v, where a ∈ At and v ∈ V_a; (ii) the well-formed formulas (wffs) of L form the smallest set that contains the atomic formulas and is closed under ∧ and ∨.

In an information table S, the satisfiability of a formula φ ∈ L by an object x is written as x ⊨_S φ, or in short x ⊨ φ if S is understood. With the notion of satisfiability, one may obtain a set-theoretic interpretation of formulas of L. That is, if φ is a formula, the set m_S(φ), defined by m_S(φ) = {x ∈ U | x ⊨_S φ}, is called the meaning of the formula φ in S. If S is understood, we simply write m(φ). The meaning of a formula φ is the set of all objects having the properties expressed by the formula φ. If m_S(φ) ≠ ∅, then φ is meaningful in S. With φ and m(φ), a connection between formulas of L and subsets of U is thus established.

A subset X ⊆ U is called a definable granule in an information table S if there exists at least one formula φ such that m_S(φ) = X. The notion of definability of subsets in an information table is essential to data analysis. In fact, definable subsets are the basic units that can be described and discussed, upon which other notions can be developed.

A formula φ_i may be a refinement of another formula φ_j, or equivalently, φ_j a coarsening of φ_i. The refinement relation can be denoted by logical implication, written as φ_i → φ_j. In the context of an information table S, φ_i →_S φ_j if and only if m(φ_i) ⊆ m(φ_j). Given two formulas φ_i and φ_j, the meet φ_i ⊓ φ_j defines the largest intersection of the granules m(φ_i) and m(φ_j), and the join φ_i ⊔ φ_j defines the smallest union of the granules m(φ_i) and m(φ_j).

3 Formal Concept Analysis and Logical Concept Analysis

Formal concept analysis (FCA) deals with the characterization of a concept consisting of its intension and extension [3, 11]. By considering the decision logic language, we can transform formal concepts into a logical setting, and perform logical concept analysis (LCA) [4]. LCA extends an intension from a set of properties to a logical formula defined by these properties. By extending FCA to LCA, we enhance the flexibility for description, management, updating, querying and navigation in the concepts.

3.1 Formal concept analysis

Denote by 𝓕 the set of all atomic formulas in the decision logic language L, i.e., 𝓕 = {a = v | a ∈ At, v ∈ V_a}. For O ⊆ U and F ⊆ 𝓕, define

O′ = {f ∈ 𝓕 | ∀x ∈ O : x ⊨_S f},   (1)
F′ = {x ∈ U | ∀f ∈ F : x ⊨_S f}.   (2)
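The satisfiability relation, the meaning m_S(φ) of a conjunctive formula, and the derivation operators of Equations (1) and (2) translate almost directly into code. The following sketch builds on the illustrative InformationTable class above and represents a conjunctive formula simply as a set of descriptors (a, v); it is an assumption-laden illustration rather than the paper's implementation.

```python
# Atomic formulas are pairs (a, v) standing for the descriptor "a = v";
# a conjunctive formula is represented as a frozenset of such pairs.

def satisfies(S, x, f):
    """x |= (a = v)  iff  I_a(x) = v."""
    a, v = f
    return S.I(x, a) == v

def meaning(S, phi):
    """m_S(phi) = {x in U | x |= phi} for a conjunction phi of atomic formulas."""
    return {x for x in S.U if all(satisfies(S, x, f) for f in phi)}

def derive_objects(S, O):
    """Equation (1): O' = the set of atomic formulas satisfied by every object in O."""
    atoms = {(a, v) for a in S.At for v in S.V[a]}
    return frozenset(f for f in atoms if all(satisfies(S, x, f) for x in O))

def derive_formulas(S, F):
    """Equation (2): F' = the set of objects satisfying every atomic formula in F."""
    return {x for x in S.U if all(satisfies(S, x, f) for f in F)}
```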

So O′ is the set of atomic formulas common to all the objects in O, and F′ is the set of objects possessing all the atomic formulas in F.

Lemma 1. [11] Let an information table S be a formal context, O_i, O_j, O ⊆ U and F_i, F_j, F ⊆ 𝓕. Then
(1) O_i ⊆ O_j ⟹ O_j′ ⊆ O_i′;   (1′) F_i ⊆ F_j ⟹ F_j′ ⊆ F_i′;
(2) O ⊆ O″;   (2′) F ⊆ F″;
(3) O′ = O‴;   (3′) F′ = F‴;
(4) (O_i ∪ O_j)′ = O_i′ ∩ O_j′;   (4′) (F_i ∪ F_j)′ = F_i′ ∩ F_j′.

Definition 3. [11] A formal concept of an information table S is defined as a pair (O, F), where O ⊆ U, F ⊆ 𝓕, O′ = F and F′ = O. The extension of the concept (O, F) is O, and the intension is F.

3.2 Logical concept analysis limited to conjunction

FCA, discussed above, deals with both intensions and extensions in the set-theoretic setting, and does not consider the relationships between the elements of intensions. By involving the decision logic language L, we move to a logical setting for LCA. Intuitively, the set-based intensions imply a conjunctive relation on the included atomic formulas. In this paper, we focus our attention on logical conjunction only. Thus, we can define two logically conjunctive dual functions as follows:

O* = ⋀ O′ = ⋀{f ∈ 𝓕 | ∀x ∈ O : x ⊨_S f}   (3)
   = ⊔_t O_t*, where ⋃_t O_t = O;   (4)
φ* = m_S(φ) = {x ∈ U | x ⊨_S φ}   (5)
   = ⋂_t φ_t*, where ⊓_t φ_t ≡ φ.   (6)

Here, we give two equivalent formulations of O*. Equation (3) conjoins the common properties of all the objects in O by using the logic-based conjunctor; Equation (4) computes the least upper bound of the conjunctively definable formulas of subsets of O by using the context-based disjunctor. Note that the context-based conjunctive operator ⊓ and disjunctive operator ⊔ are different from the logic-based conjunctor ∧ and disjunctor ∨. For two formulas φ, ψ ∈ L in the context of an information table, φ ⊓ ψ returns the greatest lower bound of φ and ψ (more specific), and φ ⊔ ψ returns the least upper bound of φ and ψ (more general), with respect to the given universe.

Transposition from a set F ⊆ 𝓕 to a conjunctive formula needs to replace ∩, ∪ and ⊆ by ⊔, ∧ and →, respectively.

Thus, Lemma 1 can be transformed as:

Lemma 2. Let an information table S be a context, O_i, O_j, O ⊆ U and φ_i, φ_j, φ ∈ L. Then
(1) O_i ⊆ O_j ⟹ O_i* → O_j*;   (1′) φ_i → φ_j ⟹ φ_i* ⊆ φ_j*;
(2) O ⊆ O**;   (2′) φ** → φ;
(3) O* ≡ O***;   (3′) φ* = φ***;
(4) (O_i ∪ O_j)* ≡ O_i* ⊔ O_j*;   (4′) (φ_i ∧ φ_j)* = φ_i* ∩ φ_j*.

Definition 4. A conjunctive concept of an information table S is defined as a pair (O, φ), where O ⊆ U, φ is a conjunctive formula, O* ≡ φ and φ* = O. The extension of the conjunctive concept (O, φ) is O, and the intension is φ.

All the conjunctive concepts form a complete concept lattice, which possesses the following two properties:

⋀_t (O_t, φ_t) = (⋂_t O_t, (⊓_t φ_t)**),
⋁_t (O_t, φ_t) = ((⋃_t O_t)**, ⊔_t φ_t).

For concepts (O_i, φ_i) and (O_j, φ_j) in the concept lattice, we write (O_i, φ_i) ⪯ (O_j, φ_j), and say that (O_i, φ_i) is a sub-concept of (O_j, φ_j), or (O_j, φ_j) is a super-concept of (O_i, φ_i), if O_i ⊆ O_j or, equivalently, φ_i → φ_j.

4 Classification Based on Conjunctive Concept Analysis

Without loss of generality, we assume that there is a unique attribute class taking class labels as its values. The set of attributes in an information table is expressed as At = D ∪ {class}, where D is the set of attributes used to describe the objects, also called the set of descriptive attributes. An information table for classification is also called a decision table.

4.1 Classification rules

Each classification rule, in the form of φ → (class = c_i), or simply φ → c_i, is derived from, and associated with, a definable granule X, such that φ describes X and c_i labels X. Therefore, each classification rule φ → c_i can be expressed by a decision relation between a definable pair, consisting of a granule X and its formula φ, and a class label, i.e., (X, φ) → c_i. It is clear that all the objects that satisfy the formula φ are in the granule X. However, φ might not contain all the properties that X possesses; it only defines X and distinguishes X from the other granules. In this case, a definable pair (X, φ) possesses only one-way definability, and is not a concept, which is two-way definable.

Two well-studied rule measures, confidence and generality, are defined as:

Confidence: conf(φ → c_i) = |m(φ ∧ (class = c_i))| / |m(φ)|;   (7)
Generality: generality(φ → c_i) = |m(φ)| / |U|.   (8)
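As a small illustration, the two measures of Equations (7) and (8) can be computed directly from the meaning function of the earlier sketch; treating the decision attribute as the literal name "class" is an assumption of the sketch, not something fixed by the paper.

```python
def confidence(S, phi, c):
    """Equation (7): conf(phi -> class = c_i) = |m(phi and (class = c_i))| / |m(phi)|."""
    covered = meaning(S, phi)                      # m(phi)
    if not covered:
        return 0.0
    return sum(S.I(x, "class") == c for x in covered) / len(covered)

def generality(S, phi):
    """Equation (8): generality(phi -> c_i) = |m(phi)| / |U|."""
    return len(meaning(S, phi)) / len(S.U)

# On the toy table above, the hypothetical rule (a = 1) -> "+" has
# confidence(S, frozenset({("a", 1)}), "+") == 0.5 and generality 2/3.
```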

The higher the confidence value is, the more accurate the rule is. When the confidence of a rule is 100%, we say the rule is consistent, or certain; otherwise, it is approximate, or probabilistic. The higher the generality value is, the more applicable the rule is.

Suppose a set R of consistent classification rules is discovered from an information table. Partition the universe U into a training set U_training and a testing set U_testing; the description accuracy can then be defined as:

description_accu(U_training) = |⋃_{φ→c_i ∈ R} m(φ)| / |U_training|.   (9)

When the description accuracy reaches 1, we say that the rule set R covers the entire training set. We say an object x ∈ U_testing is accurately classified if there exists a learned rule φ → c_i in the set R such that x ⊨ φ and I_class(x) = c_i; we simply denote this as x ⊨ R. The prediction accuracy is defined as:

prediction_accu(U_testing) = |{x ∈ U_testing | x ⊨ R}| / |U_testing|.   (10)

Classification rule mining does not find all possible rules that exist in the information table, but only a subset that forms an accurate classifier [2]. Different classification algorithms discover different subsets based on different heuristics.

4.2 Consistent classification rules and consistent concepts

Definition 5. Let an information table S be a context. A conjunctive concept (X, φ) is called a consistent concept of S if it implies a unique label c_i ∈ V_class, and conf(φ → c_i) = 100%.

Suppose (X, φ) is a conjunctively definable pair (CDP), i.e., φ is defined by a conjunction of a set of atomic formulas (so that X = φ*). We can obtain the following inductions.

For a CDP (X, φ), if conf(φ → c_i) = 100%, then
- the conjunctively consistent concept (X, X*) → c_i, and X* → φ,
- the conjunctively consistent concept (φ*, φ**) → c_i, and φ** → φ,
- there might exist a subset Y ⊆ U, with X ⊆ Y, such that the conjunctively consistent concept (Y, Y*) → c_i.

Suppose (X, φ) is a conjunctive concept. If (X, φ) consistently implies class c_i, then for any ψ → φ,
- the CDP (ψ*, ψ) has conf(ψ → c_i) = 100%,
- the conjunctively consistent concept (ψ*, ψ**) → c_i,
- there might exist a subset Y ⊆ U, with X ⊆ Y, such that the conjunctively consistent concept (Y, Y*) → c_i.

Definition 6. A most general consistent concept is a consistent concept of the information table whose super-concepts are not consistent concepts.
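Definitions 4-6 suggest a handful of predicates: map a conjunctive formula to the concept it generates, test whether that concept is consistent, and test whether it is most general. The hedged sketch below reuses meaning() and the toy conventions introduced earlier, including the assumption that the decision attribute is named "class".

```python
def common_description(S, O, descriptive_attrs):
    """O*: the atomic formulas (over the descriptive attributes) shared by every object in O."""
    return frozenset((a, v) for a in descriptive_attrs for v in S.V[a]
                     if all(S.I(x, a) == v for x in O))

def concept_of(S, phi):
    """Map a conjunctive formula phi to the conjunctive concept (phi*, phi**) it generates."""
    D = S.At - {"class"}
    O = meaning(S, phi)
    return O, common_description(S, O, D)

def is_consistent(S, O):
    """Definition 5: the extension carries a unique class label, i.e. confidence 100%."""
    return len({S.I(x, "class") for x in O}) == 1

def is_most_general(O, consistent_extents):
    """Definition 6: no consistent concept with a strictly larger extension exists."""
    return not any(set(O) < set(O2) for O2 in consistent_extents)
```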

If a super-concept of the concept (X, φ) is a most general consistent concept, it is denoted as ⌈(X, φ)⌉.

Each consistent classification rule is associated with a granule, and is defined by some conditional attributes in the information table. If the conditions are conjunctive, the formula and the granule form a CDP. Further, each CDP is associated with a conjunctively consistent concept. If this concept is not a most general consistent concept, then there must exist a most general consistent super-concept corresponding to it. The underlying logic can be illustrated by the following flow:

1. Given conf(φ → c_i) = 100%, where c_i ∈ V_class;
2. if φ is a conjunctor, then the CDP (φ*, φ) → c_i;
3. then the conjunctively consistent concept (φ*, φ**) → c_i;
4. then the most general consistent concept ⌈(φ*, φ**)⌉ → c_i.

As a result, instead of finding all the consistent classification rules, we can find the set of all the most general consistent concepts, which characterizes the complete consistent rule set.

All the most general consistent concepts in an information table compose a covering of the universe, i.e., two most general consistent concepts may overlap, and together they cover the entire universe. This can be easily proved by letting the given decision table satisfy the first normal form, which requires that all attribute values in the table are atomic. In this case, for any object x ∈ U, the conjunctive formula ⋀_{a ∈ At}(a = I_a(x)) forms an intension φ, and the pair (φ*, φ) forms a concept. Clearly, the family of all such concepts (φ*, φ) covers the universe. Since a most general consistent concept covers one or more of these concepts, the set of all most general consistent concepts is a covering of the universe.

A covering-based algorithm tends to generate a set of rules that cover the objects of the given information table. For some covering-based algorithms, the granule covered by a rule is biased to be as large as possible. Suppose R is a set of conjunctively consistent rules generated by a covering-based classification algorithm, and B is the set of rules defined by the most general consistent concepts. Then (1) |R| ≥ |B|; (2) for each rule φ ∈ R there is a rule b ∈ B with m(lhs(φ)) ⊆ m(lhs(b)), while in general neither lhs(φ) → lhs(b) nor lhs(b) → lhs(φ) holds, where lhs stands for the left hand side of a rule.

A partition-based algorithm is normally biased to generate a shorter tree, and each CDP that maps to a rule is often not a most general consistent concept. Suppose R is the set of consistent rules generated by any partition-based algorithm; then the second property still holds, and |R| ≥ |B|. Limited by the bias of partition-based algorithms, finding the most general consistent concept from a corresponding CDP is not easy.

4.3 A simple example

The decision table shown in Table 1 has four descriptive attributes A, B, C and D, and a decision attribute class. The entire search space, the concept lattice, includes 45 conjunctive concepts. For the purpose of classification, we are only interested in the consistent concepts defined by subsets of descriptive attributes, such that a concept (X, φ) → c_i. Based on these consistent concepts, we further need to find the most general consistent concepts, such that for each of them there does not exist a more general concept that implies the same class. As listed, there are six most general consistent concepts. The conjunctive concepts, the consistent concepts and the most general consistent concepts are summarized in Table 2.

The ID3 algorithm [8] produces a set of six consistent rules that partition the universe. The PRISM algorithm [1] (shown in Figure 1) generates another set of consistent rules that cover the universe. Each rule can be mapped to one of the most general consistent concepts. For example, the left hand side of a PRISM rule a3 ∧ b1 ∧ c2 → - can be described by a CDP ({10}, a3 ∧ b1 ∧ c2), which maps to a concept ({10}, a3 ∧ b1 ∧ c2 ∧ d1), which is more specific than the most general consistent concept ({2, 6, 10}, b1 ∧ c2). The left hand side of a PRISM rule a1 ∧ b2 → + can be described by a CDP ({3, 4}, a1 ∧ b2), which maps to a concept ({3, 4}, a1 ∧ b2 ∧ d1), which is more specific than the most general consistent concept ({3, 4, 8, 11, 12}, b2 ∧ d1). The comparison among the ID3 CDPs, the PRISM CDPs and the most general consistent concepts is illustrated in Table 3.

4.4 An algorithm for finding the most general consistent concepts

There are two approaches for finding the most general consistent concepts. One approach is to find them in the concept lattice by brute force. First, construct the concept lattice of the given information table. Then, find the consistent concepts. Thirdly, eliminate all the non-most-general concepts. This approach encounters a complexity problem in the first step. If we search the universe U for definable granules, the search space is 2^|U|. If we search the set 𝓕 of all atomic formulas for subsets of atomic formulas, the search space is ∏_{a ∈ D}(|V_a| + 1). In most cases, ∏_{a ∈ D}(|V_a| + 1) ≪ 2^|U|. This means we need to test at least ∏_{a ∈ D}(|V_a| + 1) conjunctively definable formulas, and their conjunctively definable granules, in order to verify whether each CDP is a conjunctive concept or not.

The other approach is to generate the most general consistent concepts heuristically. First, apply a heuristic covering algorithm to produce a set of consistent classification rules. Then, find the concept corresponding to each rule. Finally, eliminate the non-most-general concepts. If the set of classification rules covers the universe, the corresponding set of conjunctive concepts and the set of most general consistent concepts also do. Since the number of classification rules is limited, this approach must be much more efficient than the first one. The PRISM algorithm [1], illustrated in Figure 1, is a good candidate for generating the most general consistent concepts.
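The brute-force approach can be sketched directly from the counting argument above: enumerate the ∏_{a∈D}(|V_a|+1) conjunctive formulas over the descriptive attributes, keep the consistent concepts they generate, and discard every concept that has a consistent super-concept. The code below reuses the helpers from the previous sketches and is only an illustration of the search space, not an optimized procedure.

```python
from itertools import product

def most_general_consistent_concepts(S):
    """Brute-force search over all conjunctive formulas on the descriptive attributes D."""
    D = sorted(S.At - {"class"})
    choices = [[None] + list(S.V[a]) for a in D]      # None = the attribute is not used
    consistent = {}                                   # extension -> intension
    for combo in product(*choices):                   # prod_a (|V_a| + 1) candidate formulas
        phi = frozenset((a, v) for a, v in zip(D, combo) if v is not None)
        O = meaning(S, phi)
        if O and is_consistent(S, O):
            consistent[frozenset(O)] = common_description(S, O, D)
    # Definition 6: keep only concepts whose extension is not strictly inside another one.
    return [(set(O), consistent[O]) for O in consistent
            if not any(O < O2 for O2 in consistent)]
```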

Table 1. A decision table

      A   B   C   D   class
o1    a1  b1  c1  d2  -
o2    a1  b1  c2  d2  -
o3    a1  b2  c1  d1  +
o4    a1  b2  c2  d1  +
o5    a2  b1  c1  d2  -
o6    a2  b1  c2  d1  -
o7    a2  b2  c1  d2  -
o8    a2  b2  c2  d1  +
o9    a3  b1  c1  d2  +
o10   a3  b1  c2  d1  -
o11   a3  b2  c1  d1  +
o12   a3  b2  c2  d1  +

Table 2. The conjunctive concepts, the consistent concepts and the most general consistent concepts of Table 1

Conjunctive concepts (extension, intension):
({1,...,12}, ∅), ({1,2,3,4}, a1), ({5,6,7,8}, a2), ({9,10,11,12}, a3),
({1,2,5,6,9,10}, b1), ({3,4,7,8,11,12}, b2), ({1,3,5,7,9,11}, c1), ({2,4,6,8,10,12}, c2),
({3,4,6,8,10,11,12}, d1), ({1,2,5,7,9}, d2),
({1,2}, a1∧b1∧d2), ({5,6}, a2∧b1), ({9,10}, a3∧b1), ({3,4}, a1∧b2∧d1), ({7,8}, a2∧b2),
({11,12}, a3∧b2∧d1), ({1,3}, a1∧c1), ({5,7}, a2∧c1∧d2), ({9,11}, a3∧c1), ({2,4}, a1∧c2),
({6,8}, a2∧c2∧d1), ({10,12}, a3∧c2∧d1), ({10,11,12}, a3∧d1), ({1,2,5,9}, b1∧d2),
({2,6,10}, b1∧c2), ({1,5,9}, b1∧c1∧d2), ({6,10}, b1∧c2∧d1), ({3,7,11}, b2∧c1),
({4,8,12}, b2∧c2∧d1), ({3,4,8,11,12}, b2∧d1), ({3,11}, b2∧c1∧d1), ({1,5,7,9}, c1∧d2),
({4,6,8,10,12}, c2∧d1), and the twelve single-object concepts ({x}, the full description of o_x) for x = 1, ..., 12.

Consistent concepts (extension, intension) → c_i ∈ V_class:
({1,2}, a1∧b1∧d2) → -, ({5,6}, a2∧b1) → -, ({5,7}, a2∧c1∧d2) → -, ({2,6,10}, b1∧c2) → -,
({6,10}, b1∧c2∧d1) → -, ({3,4}, a1∧b2∧d1) → +, ({11,12}, a3∧b2∧d1) → +, ({9,11}, a3∧c1) → +,
({4,8,12}, b2∧c2∧d1) → +, ({3,4,8,11,12}, b2∧d1) → +, ({3,11}, b2∧c1∧d1) → +,
and the twelve single-object concepts, each implying the class of its object.

Most general consistent concepts (extension, intension) → c_i ∈ V_class:
({1,2}, a1∧b1∧d2) → -, ({5,6}, a2∧b1) → -, ({5,7}, a2∧c1∧d2) → -, ({2,6,10}, b1∧c2) → -,
({9,11}, a3∧c1) → +, ({3,4,8,11,12}, b2∧d1) → +.

Table 3. Comparison of the ID3 CDPs and the PRISM CDPs with the most general consistent concepts of Table 1

- ID3: ({1,2}, a1∧b1); PRISM: ({1,2}, a1∧b1); most general consistent concept: ({1,2}, a1∧b1∧d2); class -.
- ID3: ({5,6}, a2∧b1); PRISM: ({5,6}, a2∧b1); most general consistent concept: ({5,6}, a2∧b1); class -.
- ID3: ({10}, a3∧b1∧c2); PRISM: ({6,10}, b1∧d1), ({2,6,10}, b1∧c2); most general consistent concept: ({2,6,10}, b1∧c2); class -.
- ID3: ({7}, b2∧d2); PRISM: ({5,7}, a2∧c1), ({5,7}, a2∧d2); most general consistent concept: ({5,7}, a2∧c1∧d2); class -.
- ID3: ({9}, a3∧b1∧c1); PRISM: ({9}, a3∧d2), ({9,11}, a3∧c1); most general consistent concept: ({9,11}, a3∧c1); class +.
- ID3: ({3,4,8,11,12}, b2∧d1); PRISM: ({3,4}, a1∧b2), ({11,12}, a3∧b2), ({4,8,12}, b2∧c2), ({3,4,8,11,12}, b2∧d1); most general consistent concept: ({3,4,8,11,12}, b2∧d1); class +.
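For what it is worth, running the brute-force sketch on an encoding of Table 1 reproduces the six most general consistent concepts of Table 2; the snippet below only re-encodes the decision table with the same object numbers and value names.

```python
S = InformationTable({
    1:  {"A": "a1", "B": "b1", "C": "c1", "D": "d2", "class": "-"},
    2:  {"A": "a1", "B": "b1", "C": "c2", "D": "d2", "class": "-"},
    3:  {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "class": "+"},
    4:  {"A": "a1", "B": "b2", "C": "c2", "D": "d1", "class": "+"},
    5:  {"A": "a2", "B": "b1", "C": "c1", "D": "d2", "class": "-"},
    6:  {"A": "a2", "B": "b1", "C": "c2", "D": "d1", "class": "-"},
    7:  {"A": "a2", "B": "b2", "C": "c1", "D": "d2", "class": "-"},
    8:  {"A": "a2", "B": "b2", "C": "c2", "D": "d1", "class": "+"},
    9:  {"A": "a3", "B": "b1", "C": "c1", "D": "d2", "class": "+"},
    10: {"A": "a3", "B": "b1", "C": "c2", "D": "d1", "class": "-"},
    11: {"A": "a3", "B": "b2", "C": "c1", "D": "d1", "class": "+"},
    12: {"A": "a3", "B": "b2", "C": "c2", "D": "d1", "class": "+"},
})
for extent, intent in most_general_consistent_concepts(S):
    print(sorted(extent), sorted(intent))
# Expected, as in Table 2: ({1,2}, a1 b1 d2), ({5,6}, a2 b1), ({5,7}, a2 c1 d2),
# ({2,6,10}, b1 c2), ({9,11}, a3 c1) and ({3,4,8,11,12}, b2 d1).
```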

Input: a decision table
Output: a set of consistent classification rules

For each c_i ∈ V_class, do the following:
1. Calculate the probability p(c_i | φ) of the class c_i given each atomic formula φ ∈ 𝓕.
2. Select the first φ_t for which p(c_i | φ_t) is the maximum. Create a subset of the training set comprising all the instances that contain the selected φ_t.
3. Repeat Steps 1 and 2 until the local p(c_i | φ_t) reaches 1, or stop if no more subsets can be extracted. At this time, check if there is any other condition φ_s such that the local p(c_i | φ_s) also reaches 1.
4. Remove all the objects covered by the rule(s) from the table.
5. Repeat Steps 1-4 until all the objects of class c_i have been removed.

Fig. 1. The PRISM algorithm

Input: a decision table
Output: a set of the most general consistent concepts

1. Apply the PRISM algorithm to generate a set of consistent classification rules: {φ → c_i | c_i ∈ V_class}.
2. Construct a CDP for each consistent rule: {(φ*, φ) → c_i}.
3. Construct a conjunctively consistent concept for each CDP: {(φ*, φ**) → c_i}.
4. For each conjunctively consistent concept (φ*, φ**), if there exists another conjunctively consistent concept (φ_t*, φ_t**) such that φ* ⊂ φ_t*, then (φ*, φ**) is not a most general consistent concept, and is eliminated.

Fig. 2. PRISM-concept: an algorithm for finding the most general consistent concepts

Figure 2 describes the procedure for finding a set of the most general consistent concepts based on the PRISM algorithm. This algorithm is thus called the PRISM-concept algorithm. PRISM-concept has a higher computational complexity than PRISM because of the concept construction process. It prunes the rule set to its kernel by considering the subset relation of concept extensions. The set of the most general consistent concepts cannot be simplified any further, since the extension of each concept cannot be larger and its intension cannot be coarser.
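A compact Python sketch of the PRISM-concept procedure is given below. The PRISM step is a simplified reading of Fig. 1 (ties are broken arbitrarily, the sibling conditions of Step 3 are not collected, and a consistent decision table is assumed), so it is an approximation of the algorithm rather than a faithful re-implementation; it reuses the helpers from the earlier sketches.

```python
def prism_rules(S):
    """Simplified PRISM (Fig. 1): for each class, greedily grow a conjunction until it is pure,
    remove the covered objects, and repeat until every object of that class is covered.
    Assumes a consistent decision table (no identical descriptions with different classes)."""
    D = sorted(S.At - {"class"})
    rules = []
    for c in sorted({S.I(x, "class") for x in S.U}):
        uncovered = {x for x in S.U if S.I(x, "class") == c}
        while uncovered:
            phi, covered = frozenset(), set(S.U)
            while {S.I(x, "class") for x in covered} != {c}:
                # choose the descriptor with the highest local p(c | phi and descriptor)
                best = max(((a, v) for a in D for v in S.V[a] if (a, v) not in phi),
                           key=lambda f: local_precision(S, covered, f, c))
                phi = phi | {best}
                covered = {x for x in covered if satisfies(S, x, best)}
            rules.append((phi, c))
            uncovered -= covered
    return rules

def local_precision(S, covered, f, c):
    """p(c | f) restricted to the currently covered subset of the table."""
    sub = [x for x in covered if satisfies(S, x, f)]
    return sum(S.I(x, "class") == c for x in sub) / len(sub) if sub else 0.0

def prism_concept(S, rules):
    """Fig. 2, Steps 2-4: map each PRISM rule to its conjunctively consistent concept
    and eliminate every concept whose extension lies strictly inside another one."""
    D = S.At - {"class"}
    concepts = {}
    for phi, c in rules:
        O = frozenset(meaning(S, phi))
        concepts[O] = (common_description(S, O, D), c)
    return [(set(O),) + concepts[O] for O in concepts
            if not any(O < O2 for O2 in concepts)]
```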

4.5 Experiments

In order to evaluate the proposed PRISM-concept algorithm, we choose three sample datasets from the UCI machine learning repository [10], and use SGI's MLC++ utilities 2.0 to generate categorical data [9]. We use 5-fold cross-validation to divide the training sets and testing sets, upon which the partition-based ID3, the covering-based PRISM, and the LCA-based PRISM-concept are tested for comparison. We keep track of the number of rules and of the accuracy of both description and prediction for the three datasets. The experimental results are reported in Figure 3.

The number of PRISM-concept rules lies between that of ID3 and that of PRISM, due to the difference between the partition-based and the covering-based algorithms, and the generalization capability of the concept-based rules. Since the PRISM-concepts are generated from a set of PRISM rules, PRISM and PRISM-concept have the same description accuracy, and it is normally higher than what ID3 can reach. This indicates that, for the purpose of keeping a higher description accuracy, we can greatly simplify the set of covering-based consistent rules by using the set of the most general consistent concept-rules. The prediction accuracy of PRISM-concept rules also lies between that of ID3 and that of PRISM. That is because PRISM has the greatest number of rules, which makes it more flexible for testing, especially when errors or missing records occur in the testing datasets. Since the intension of a most general consistent concept is the conjunction of all properties possessed by its extension, it precisely describes the given training set, and might even overfit the training set. Overfitting rules are good for description, but not good for testing.

Fig. 3. Comparison of ID3, PRISM and PRISM-concept on three sample datasets: Cleve (303 records, training/testing 243/60, 13 conditional attributes, 2 decision classes), Vote (435 records, training/testing 348/87, 16 conditional attributes, 2 decision classes) and Iris (150 records, training/testing 120/30, 4 conditional attributes, 3 decision classes). For each dataset and each of the five folds, the number of rules, the accuracy of description (%) and the accuracy of prediction (%) are reported for the three algorithms.

Although we use consistent classification tasks (i.e., φ ⇒ c_i) as the running example throughout the paper, this does not mean that conjunctive concepts cannot cope with approximate classification tasks in general (i.e., conf(φ → c_i) ≤ 1). Suppose conf(φ → c_i) = α < 1, where c_i ∈ V_class is the class label satisfied by the majority of the object set φ*. If φ is a conjunctor, then the CDP (φ*, φ) has conf(φ → c_i) = α, and the conjunctive concept (φ*, φ**) has conf(φ** → c_i) = α, which means it is not a consistent concept. However, a super-concept of (φ*, φ**), denoted as ⌈(φ*, φ**)⌉, might or might not indicate the same class label c_i. If conf(⌈(φ*, φ**)⌉ → c_i) is acceptable to the user, then the sub-concept (φ*, φ**) can be pruned.

5 Conclusion

Logical concept analysis provides an alternative way to study classification tasks.

For consistent classification problems, a consistent classification rule corresponds to a conjunctively definable pair, each conjunctively definable pair corresponds to a conjunctively consistent concept, and each conjunctively consistent concept corresponds to a most general consistent concept. All the most general consistent concepts form a special set of consistent classification rules, which by its nature describes the given universe precisely and concisely. There are two approaches to finding the set of the most general consistent concepts: one searches the concept lattice, and the other starts from a set of heuristically generated covering-based rules. The study shows that these two approaches can find a unique and complete set of the most general consistent concepts.

Logical concept analysis can also be applied to probabilistic classification problems. However, to generalize an (inconsistent) concept to its super-concept, in order to simplify the concept rule set, one needs more heuristics and thresholds. That is a research topic we set up for the next stage.

References

1. Cendrowska, J., PRISM: an algorithm for inducing modular rules, International Journal of Man-Machine Studies, 27, 349-370, 1987.
2. Clark, P. and Matwin, S., Using qualitative models to guide induction learning, Proceedings of the International Conference on Machine Learning, 49-56, 1993.
3. Demri, S. and Orlowska, E., Logical analysis of indiscernibility, in: Incomplete Information: Rough Set Analysis, Orlowska, E. (Ed.), Physica-Verlag, Heidelberg, 1998.
4. Ferré, S. and Ridoux, O., A logical generalization of formal concept analysis, Proceedings of the International Conference on Conceptual Structures, 2000.
5. Khan, M., Ding, Q. and Perrizo, W., K-nearest neighbor classification on spatial data streams using P-trees, Proceedings of PAKDD'02, 2002.
6. Mitchell, T., Version Spaces: An Approach to Concept Learning, PhD thesis, Computer Science Department, Stanford University, Stanford, California, 1978.
7. Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
8. Quinlan, J.R., Learning efficient classification procedures and their application to chess end-games, in: Machine Learning: An Artificial Intelligence Approach, 1, 1983.
9. SGI's MLC++ utilities 2.0: the discretize utility.
10. UCI Machine Learning Repository, mlearn/mlrepository.html.
11. Wille, R., Concept lattices and conceptual knowledge systems, Computers & Mathematics with Applications, 23, 493-515, 1992.
12. Yao, Y.Y., On modeling data mining with granular computing, Proceedings of COMPSAC'01, 2001.
13. Yao, Y.Y. and Yao, J.T., Granular computing as a basis for consistent classification problems, Special Issue of the PAKDD'02 Workshop on Toward the Foundation of Data Mining, 5(2), 2002.
14. Zhao, Y. and Yao, Y.Y., Interactive user-driven classification using a granule network, Proceedings of the Fifth International Conference on Cognitive Informatics (ICCI'05), 2005.


More information

A PRIMER ON ROUGH SETS:

A PRIMER ON ROUGH SETS: A PRIMER ON ROUGH SETS: A NEW APPROACH TO DRAWING CONCLUSIONS FROM DATA Zdzisław Pawlak ABSTRACT Rough set theory is a new mathematical approach to vague and uncertain data analysis. This Article explains

More information

Introduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees

Introduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Introduction to ML Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Why Bayesian learning? Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical

More information

Data classification (II)

Data classification (II) Lecture 4: Data classification (II) Data Mining - Lecture 4 (2016) 1 Outline Decision trees Choice of the splitting attribute ID3 C4.5 Classification rules Covering algorithms Naïve Bayes Classification

More information

43400 Serdang Selangor, Malaysia Serdang Selangor, Malaysia 4

43400 Serdang Selangor, Malaysia Serdang Selangor, Malaysia 4 An Extended ID3 Decision Tree Algorithm for Spatial Data Imas Sukaesih Sitanggang # 1, Razali Yaakob #2, Norwati Mustapha #3, Ahmad Ainuddin B Nuruddin *4 # Faculty of Computer Science and Information

More information

Policies Generalization in Reinforcement Learning using Galois Partitions Lattices

Policies Generalization in Reinforcement Learning using Galois Partitions Lattices Policies Generalization in Reinforcement Learning using Galois Partitions Lattices Marc Ricordeau and Michel Liquière mricorde@wanadoo.fr, liquiere@lirmm.fr Laboratoire d Informatique, de Robotique et

More information

Machine Learning Recitation 8 Oct 21, Oznur Tastan

Machine Learning Recitation 8 Oct 21, Oznur Tastan Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy

More information

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data

More information

Decision Tree Learning

Decision Tree Learning Topics Decision Tree Learning Sattiraju Prabhakar CS898O: DTL Wichita State University What are decision trees? How do we use them? New Learning Task ID3 Algorithm Weka Demo C4.5 Algorithm Weka Demo Implementation

More information

AIBO experiences change of surface incline.

AIBO experiences change of surface incline. NW Computational Intelligence Laboratory AIBO experiences change of surface incline. Decisions based only on kinesthetic experience vector ICNC 07, China 8/27/07 # 51 NW Computational Intelligence Laboratory

More information

An algorithm for induction of decision rules consistent with the dominance principle

An algorithm for induction of decision rules consistent with the dominance principle An algorithm for induction of decision rules consistent with the dominance principle Salvatore Greco 1, Benedetto Matarazzo 1, Roman Slowinski 2, Jerzy Stefanowski 2 1 Faculty of Economics, University

More information

Decision Trees Entropy, Information Gain, Gain Ratio

Decision Trees Entropy, Information Gain, Gain Ratio Changelog: 14 Oct, 30 Oct Decision Trees Entropy, Information Gain, Gain Ratio Lecture 3: Part 2 Outline Entropy Information gain Gain ratio Marina Santini Acknowledgements Slides borrowed and adapted

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Semantic Rendering of Data Tables: Multivalued Information Systems Revisited

Semantic Rendering of Data Tables: Multivalued Information Systems Revisited Semantic Rendering of Data Tables: Multivalued Information Systems Revisited Marcin Wolski 1 and Anna Gomolińska 2 1 Maria Curie-Skłodowska University, Department of Logic and Cognitive Science, Pl. Marii

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information