Minimum Error Classification Clustering


Minimum Error Classification Clustering

Iwan Tri Riyadi Yanto
Department of Mathematics, University of Ahmad Dahlan

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. In this paper we study the problem of clustering categorical data, where data objects are made up of non-numerical attributes. We propose MECC (Minimum Error Classification Clustering), an alternative technique for categorical data clustering that uses the Variable Precision Rough Set (VPRS) model and takes the minimum error classification into account. The technique is implemented in MATLAB. Experimental results on benchmark UCI datasets show that the MECC technique outperforms the baseline categorical data clustering techniques with respect to selecting the clustering attribute.

Keywords: Clustering; Categorical data; Rough set; VPRS; Error classification

1. Introduction

The problem of clustering data arises in many disciplines and has a wide range of applications. Intuitively, clustering is the problem of partitioning a finite set of points in a multi-dimensional space into classes (called clusters) so that (i) the points belonging to the same class are similar and (ii) the points belonging to different classes are dissimilar. The clustering problem has been studied extensively in machine learning, databases and statistics from various perspectives and with various approaches and focuses [1]. The clustering operation is required in a number of data analysis tasks, such as unsupervised classification and data summarization, as well as segmentation of large homogeneous data sets into smaller homogeneous subsets that can be easily managed, separately modeled and analyzed [2].

In this paper we focus our attention on categorical datasets, where data objects are made up of non-numerical attributes. For categorical data clustering, a new trend has emerged of algorithms that can handle uncertainty in the clustering process. One of the well-known techniques is based on rough set theory [3-5]. Mazlack proposed a technique called TR (Total Roughness), based on the accuracy of approximation of a set [3], in which the highest value is the best selection of attribute [6]. One of the successful pioneering rough clustering techniques for categorical data is Minimum-Minimum Roughness (MMR), proposed by Parmar et al. [7]. Its algorithm for selecting a clustering attribute is based on the complement of the accuracy of approximation of a set [3]. To this extent, TR and MMR may provide the same result when selecting a clustering attribute. However, real data clustering problems often face noise, and the data are frequently corrupted, so it is not feasible to deal with noisy data using the classical definition of rough set; MMR accordingly fails to handle noisy data. The classical definition of rough set has drawbacks, in particular losing useful information by demanding absolutely precise inclusion. In order to overcome this drawback, an error parameter β, where 0 ≤ β < 0.5, is introduced. The Variable Precision Rough Set (VPRS) model proposed by Ziarko [8] is defined on a probabilistic space and gives us a new way to deal with noisy data.

VPRS is an effective mathematical tool with an error-tolerance capability for handling uncertainty problems. Basically, it is an extension of Pawlak's rough set theory [3-5] that allows for partial classification. By setting a confidence threshold value β, VPRS can not only solve classification problems with uncertain data and no functional relationship between attributes, but also relax the rigid boundary definition of Pawlak's rough set model to improve the model's suitability. Owing to the existence of β, the VPRS model can resist data noise and remove data errors [9]. A rational interval of variation for β can be determined [10], giving a new way to deal with noisy data [11]. Inspired by the ability of VPRS to handle noisy data, in this paper we propose an alternative technique for categorical data clustering that addresses the above issues. The selection of the clustering attribute is based on the minimum error classification, in order to obtain a better accuracy of approximation.

2. Rough Set Theory

2.1. Information System and Set Approximations

An information system is a 4-tuple (quadruple) S = (U, A, V, f), where U = {u_1, u_2, ..., u_|U|} is a non-empty finite set of objects, A = {a_1, a_2, ..., a_|A|} is a non-empty finite set of attributes, V = ∪_{a∈A} V_a, where V_a is the domain (value set) of attribute a, and f : U × A → V is an information (knowledge) function such that f(u, a) ∈ V_a for every (u, a) ∈ U × A.

Definition 1. Two elements x, y ∈ U are said to be B-indiscernible (indiscernible by the set of attributes B ⊆ A in S) if and only if f(x, a) = f(y, a) for every a ∈ B.

Obviously, every subset of A induces a unique indiscernibility relation. Notice that the indiscernibility relation induced by the set of attributes B, denoted by IND(B), is an equivalence relation. The partition of U induced by IND(B) is denoted by U/B, and the equivalence class in the partition U/B containing x ∈ U is denoted by [x]_B. The notions of lower and upper approximations of a set are defined as follows.

Definition 2. The B-lower approximation of X, denoted by $\underline{B}(X)$, and the B-upper approximation of X, denoted by $\overline{B}(X)$, are defined respectively by

$$\underline{B}(X) = \{ x \in U : [x]_B \subseteq X \}, \qquad \overline{B}(X) = \{ x \in U : [x]_B \cap X \neq \emptyset \}.$$

The accuracy of approximation (accuracy of roughness) of any subset X ⊆ U with respect to B ⊆ A is measured by

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|},$$

where |X| denotes the cardinality of X. Obviously, 0 ≤ α_B(X) ≤ 1. If α_B(X) = 1, X is crisp with respect to B (X is precise with respect to B); otherwise X is rough with respect to B (X is vague with respect to B).
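For illustration, the set approximations above can be sketched in a few lines of Python (a sketch of our own; the paper's implementation is in MATLAB, and all names here, such as equivalence_classes, are ours):

```python
def equivalence_classes(U, f, B):
    """Partition U by the B-indiscernibility relation:
    x ~ y iff f(x, a) == f(y, a) for every attribute a in B."""
    classes = {}
    for x in U:
        key = tuple(f[x][a] for a in B)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def approximations(U, f, B, X):
    """B-lower and B-upper approximations of X (Definition 2)."""
    X = set(X)
    lower, upper = set(), set()
    for cls in equivalence_classes(U, f, B):
        if cls <= X:      # [x]_B entirely contained in X
            lower |= cls
        if cls & X:       # [x]_B intersects X
            upper |= cls
    return lower, upper

def accuracy(U, f, B, X):
    """Accuracy of roughness: |lower| / |upper|."""
    lower, upper = approximations(U, f, B, X)
    return len(lower) / len(upper) if upper else 1.0

# Toy usage: one attribute 'a' splitting four objects into {1,2} and {3,4}.
U = [1, 2, 3, 4]
f = {1: {'a': 'x'}, 2: {'a': 'x'}, 3: {'a': 'y'}, 4: {'a': 'y'}}
print(accuracy(U, f, ['a'], {1, 2, 3}))  # 0.5: lower = {1,2}, upper = {1,2,3,4}
```

Representing the information system as a dictionary mapping each object to its attribute-value map keeps the indiscernibility computation a single pass over U.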

2.2. Variable Precision Rough Set

The variable precision rough set (VPRS) model extends rough set theory by relaxing the subset operator [8]. It was proposed to analyze and identify data patterns that represent statistical trends rather than functional ones. The main idea of VPRS is to allow objects to be classified with an error rate smaller than a certain pre-defined level. The introduced threshold relaxes the rough set requirement of using no information outside the dataset itself.

Definition 4. Let U be a universe and X, Y ⊆ U. The error classification rate of X relative to Y, denoted e(X, Y), is defined by

$$e(X, Y) = \begin{cases} 1 - \dfrac{|X \cap Y|}{|X|}, & |X| > 0, \\ 0, & |X| = 0. \end{cases}$$

Definition 5. Let U be a finite set and X ⊆ U. Given a real number β within the range 0 ≤ β < 0.5, the β-lower approximation of X, denoted $\underline{B}_\beta(X)$, and the β-upper approximation of X, denoted $\overline{B}_\beta(X)$, are defined respectively by

$$\underline{B}_\beta(X) = \{ x \in U : e([x]_B, X) \le \beta \}, \qquad \overline{B}_\beta(X) = \{ x \in U : e([x]_B, X) < 1 - \beta \}.$$

The set $\underline{B}_\beta(X)$ is called the β-positive region of X: it is the set of objects of U that can be classified into X with an error classification rate not greater than β. We then have $\underline{B}_\beta(X) \subseteq \overline{B}_\beta(X)$ if and only if 0 ≤ β < 0.5, which means that β must be restricted to the interval [0, 0.5) in order to keep the meaning of the upper and lower approximations.

3. Minimum Error Classification Clustering (MECC) Technique

3.1. The MECC Technique for Selecting a Clustering Attribute

In this section we present the proposed technique, which we refer to as Minimum Error Classification Clustering (MECC). The technique is based on the accuracy of approximation of attributes in variable precision rough set theory, obtained by introducing a threshold with respect to the error classification. Proposition 8 proves that the accuracy of approximation obtained by introducing the threshold is higher, which makes it better suited for selecting a clustering attribute.

Definition 7. The accuracy of approximation of variable precision (accuracy of variable precision roughness) of any subset X ⊆ U with respect to B ⊆ A, denoted α_β(X), is

$$\alpha_\beta(X) = \frac{|\underline{B}_\beta(X)|}{|\overline{B}_\beta(X)|},$$

where |X| denotes the cardinality of X. If β = 0, it reduces to the traditional rough set model of Pawlak.
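The VPRS notions translate just as directly; a minimal sketch under the same representation (again our own Python, reusing equivalence_classes from the previous sketch):

```python
def error_rate(X, Y):
    """e(X, Y) = 1 - |X ∩ Y| / |X| if |X| > 0, else 0 (Definition 4)."""
    X, Y = set(X), set(Y)
    return 1.0 - len(X & Y) / len(X) if X else 0.0

def beta_approximations(U, f, B, X, beta):
    """Beta-lower and beta-upper approximations of X (Definition 5)."""
    assert 0.0 <= beta < 0.5, "beta must lie in [0, 0.5)"
    X = set(X)
    lower, upper = set(), set()
    for cls in equivalence_classes(U, f, B):
        e = error_rate(cls, X)
        if e <= beta:         # classified into X with error at most beta
            lower |= cls
        if e < 1.0 - beta:    # not confidently outside X
            upper |= cls
    return lower, upper

def vp_accuracy(U, f, B, X, beta):
    """Accuracy of variable precision roughness (Definition 7)."""
    lower, upper = beta_approximations(U, f, B, X, beta)
    return len(lower) / len(upper) if upper else 1.0
```

Setting beta = 0 recovers the classical approximations, which is the equality case of Proposition 8 below.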

Proposition 8. Let S = (U, A, V, f) be an information system, let α_B(X) be the accuracy of roughness, and let α_β(X) be the accuracy of variable precision roughness given the error factor β (0 ≤ β < 0.5). Then α_B(X) ≤ α_β(X).

Proof. Based on Definition 5, for 0 ≤ β < 0.5 we have $\underline{B}(X) \subseteq \underline{B}_\beta(X)$ and $\overline{B}_\beta(X) \subseteq \overline{B}(X)$. Consequently, $0 \le |\underline{B}(X)| \le |\underline{B}_\beta(X)|$ and $0 \le |\overline{B}_\beta(X)| \le |\overline{B}(X)|$. For β = 0, Definition 5 gives α_β(X) = α_B(X). For 0 < β < 0.5, since the numerator can only grow and the denominator can only shrink,

$$\frac{|\underline{B}(X)|}{|\overline{B}(X)|} \le \frac{|\underline{B}_\beta(X)|}{|\overline{B}_\beta(X)|}.$$

Therefore α_B(X) ≤ α_β(X). ∎

Definition 9. Let S = (U, A, V, f) be an information system and suppose attribute a_i ∈ A has n different values, say y_k, k = 1, 2, ..., n. Let X(a_i = y_k), k = 1, 2, ..., n, be the subsets of objects having the different values of attribute a_i. The error classification rate of X(a_i = y_k) relative to X(a_j = y_l), where i ≠ j, can be defined as follows:

$$e\big(X(a_i = y_k), X(a_j = y_l)\big) = 1 - \frac{|X(a_i = y_k) \cap X(a_j = y_l)|}{|X(a_i = y_k)|}.$$

The problem is the choice of the threshold β such that the accuracy of approximation is as high as possible while the error classification is as low as possible. Based on Proposition 8, there are three cases for β.

Case 1. If β ≥ 0.5, then the meaning of the upper and lower approximations breaks down, since $\underline{B}_\beta(X) \subseteq \overline{B}_\beta(X)$ no longer holds.

Case 2. If β = 0, then α_β(X) = α_B(X), so the accuracy does not increase.

Case 3. If 0 < β < 0.5, then α_B(X) ≤ α_β(X), so the accuracy will be better than with the traditional rough set.

From the three cases above, the threshold can be taken as a positive number less than 0.5. Then, from Definition 9, the threshold β > 0 can be chosen as the minimum of the error classification as follows:

$$\beta = \arg\min \big[ \operatorname{mean} \{ e(X(a_i = y_k), X(a_j = y_l)) \} \big],$$

where the mean is taken over the value pairs (y_k, y_l) and over all attributes a_j with j ≠ i. The attribute with minimum β > 0 is selected as the clustering decision.
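As a sketch of this selection rule (our own Python, reusing error_rate from the VPRS sketch; treating the mean as running over all value pairs of each candidate attribute against every other attribute is our reading of the formula):

```python
from statistics import mean

def value_classes(U, f, a):
    """The subsets X(a = y) of objects taking each value y of attribute a."""
    classes = {}
    for x in U:
        classes.setdefault(f[x][a], set()).add(x)
    return classes

def select_clustering_attribute(U, f, attributes):
    """MECC selection: for each candidate attribute a_i, average the error
    classification rates of every other attribute's value classes relative
    to a_i's value classes (as in the worked example below), then choose
    the attribute with the smallest positive mean."""
    betas = {}
    for ai in attributes:
        rates = [error_rate(Xj, Xi)          # e(X(a_j = y), X(a_i = z))
                 for aj in attributes if aj != ai
                 for Xj in value_classes(U, f, aj).values()
                 for Xi in value_classes(U, f, ai).values()]
        betas[ai] = mean(rates)
    positive = {a: b for a, b in betas.items() if b > 0}  # beta must be > 0
    return min(positive, key=positive.get), betas
```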

Algorithm: MECC
Input: Data set without clustering attribute
Output: Clustering attribute
Begin
Step 1. Compute the equivalence classes using the indiscernibility relation on each attribute.
Step 2. Determine the error classification of each attribute a_i with respect to all a_j, where i ≠ j.
Step 3. Compute the mean error classification β_i of each attribute a_i from Step 2.
Step 4. Select a clustering attribute based on the minimum β_i.
End

Figure 1. The Pseudo-code of MECC for Selecting a Clustering Attribute

3.2. Example

The following table is a student information system containing 15 students with 5 categorical-valued attributes: Programming, Mathematics, Statistics, English and French. There is no pre-defined clustering (decision) attribute, so we select a clustering attribute among all the candidates.

Table 1. A Student Information System

     Prog    Math          Stat  Eng     French
1    bad     low           no    fluent  poor
2    bad     intermediate  yes   poor    fluent
3    bad     intermediate  yes   fluent  fluent
4    bad     low           yes   fluent  fluent
5    bad     advance       no    fluent  poor
6    medium  low           yes   poor    poor
7    medium  intermediate  yes   fluent  poor
8    medium  advance       no    poor    poor
9    medium  intermediate  no    fluent  poor
10   good    low           yes   poor    fluent
11   good    advance       no    poor    fluent
12   good    low           yes   fluent  poor
13   good    advance       yes   fluent  poor
14   good    low           yes   fluent  poor
15   medium  advance       yes   fluent  fluent

The procedure to find the MECC value is described here. To obtain the values of MECC, we first obtain the equivalence classes induced by the indiscernibility relation of each singleton attribute:

X(Prog = bad) = {1, 2, 3, 4, 5},
X(Prog = medium) = {6, 7, 8, 9, 15},
X(Prog = good) = {10, 11, 12, 13, 14},
U/Prog = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 15}, {10, 11, 12, 13, 14}}.

X(Math = low) = {1, 4, 6, 10, 12, 14},
X(Math = intermediate) = {2, 3, 7, 9},
X(Math = advance) = {5, 8, 11, 13, 15},
U/Math = {{1, 4, 6, 10, 12, 14}, {2, 3, 7, 9}, {5, 8, 11, 13, 15}}.

X(Stat = no) = {1, 5, 8, 9, 11},
X(Stat = yes) = {2, 3, 4, 6, 7, 10, 12, 13, 14, 15},
U/Stat = {{1, 5, 8, 9, 11}, {2, 3, 4, 6, 7, 10, 12, 13, 14, 15}}.

X(Eng = fluent) = {1, 3, 4, 5, 7, 9, 12, 13, 14, 15},
X(Eng = poor) = {2, 6, 8, 10, 11},
U/Eng = {{1, 3, 4, 5, 7, 9, 12, 13, 14, 15}, {2, 6, 8, 10, 11}}.

X(French = poor) = {1, 5, 6, 7, 8, 9, 12, 13, 14},
X(French = fluent) = {2, 3, 4, 10, 11, 15},
U/French = {{1, 5, 6, 7, 8, 9, 12, 13, 14}, {2, 3, 4, 10, 11, 15}}.

Based on Definition 9, the error classification of attribute Statistics with respect to Math is calculated as follows:

e(low, no) = 1 − |{1, 4, 6, 10, 12, 14} ∩ {1, 5, 8, 9, 11}| / 6 = 1 − 1/6 = 5/6,
e(intermediate, no) = 1 − |{2, 3, 7, 9} ∩ {1, 5, 8, 9, 11}| / 4 = 1 − 1/4 = 3/4,
e(advance, no) = 1 − |{5, 8, 11, 13, 15} ∩ {1, 5, 8, 9, 11}| / 5 = 1 − 3/5 = 2/5,
e(low, yes) = 1 − 5/6 = 1/6,
e(intermediate, yes) = 1 − 3/4 = 1/4,
e(advance, yes) = 1 − 2/5 = 3/5.

Following the same procedure, the error classifications of all attributes with respect to each of the others are computed. These calculations are summarized in Table 2.
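Before turning to Table 2, these worked numbers can be checked mechanically with the sketches above; the encoding of Table 1 below is our own:

```python
rows = [  # (Prog, Math, Stat, Eng, French) for students 1..15, from Table 1
    ('bad', 'low', 'no', 'fluent', 'poor'),
    ('bad', 'intermediate', 'yes', 'poor', 'fluent'),
    ('bad', 'intermediate', 'yes', 'fluent', 'fluent'),
    ('bad', 'low', 'yes', 'fluent', 'fluent'),
    ('bad', 'advance', 'no', 'fluent', 'poor'),
    ('medium', 'low', 'yes', 'poor', 'poor'),
    ('medium', 'intermediate', 'yes', 'fluent', 'poor'),
    ('medium', 'advance', 'no', 'poor', 'poor'),
    ('medium', 'intermediate', 'no', 'fluent', 'poor'),
    ('good', 'low', 'yes', 'poor', 'fluent'),
    ('good', 'advance', 'no', 'poor', 'fluent'),
    ('good', 'low', 'yes', 'fluent', 'poor'),
    ('good', 'advance', 'yes', 'fluent', 'poor'),
    ('good', 'low', 'yes', 'fluent', 'poor'),
    ('medium', 'advance', 'yes', 'fluent', 'fluent'),
]
attrs = ['Prog', 'Math', 'Stat', 'Eng', 'French']
U = list(range(1, 16))
f = {i: dict(zip(attrs, row)) for i, row in zip(U, rows)}

# Error classification of Statistics with respect to Math (Definition 9):
for ym, Xm in value_classes(U, f, 'Math').items():
    for ys, Xs in value_classes(U, f, 'Stat').items():
        print(f"e({ym}, {ys}) = {error_rate(Xm, Xs):.4f}")
# e(low, no) = 0.8333, e(low, yes) = 0.1667, e(intermediate, no) = 0.7500,
# e(intermediate, yes) = 0.2500, e(advance, no) = 0.4000, e(advance, yes) = 0.6000
```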

Table 2. The Minimum Error Classification

Attribute   With respect to
Prog        Math, Stat, Eng, French
Math        Prog, Stat, Eng, French
Stat        Prog, Math, Eng, French
Eng         Prog, Math, Stat, French
French      Prog, Math, Stat, Eng

For each attribute, Table 2 lists the error classifications with respect to each of the other attributes, together with their mean. With the MECC technique, from Table 2 the minimum mean error classification is attained by attribute Statistics. Thus attribute Statistics is selected as the clustering attribute.

3.3. Objects Splitting

For objects splitting we use a divide-and-conquer method. For example, in Table 1 we can cluster (partition) the objects based on the selected decision attribute, i.e., Statistics. Notice that the partition of the set of students induced by attribute Statistics is

U/Stat = {{1, 5, 8, 9, 11}, {2, 3, 4, 6, 7, 10, 12, 13, 14, 15}}.

From this, we can split the objects using a hierarchical tree: the objects {1, 5, 8, 9, 11} and {2, 3, 4, 6, 7, 10, 12, 13, 14, 15} form the first possible clusters.

Figure 2. The Objects Splitting

The technique is applied recursively to obtain further clusters. At subsequent iterations, the leaf node having more objects is selected for further splitting. The algorithm terminates when it reaches a pre-defined number of clusters. This number is subjective and is decided in advance, based either on user requirements or on domain knowledge.

4. Experimental Results

4.1. Selecting the Clustering Attribute

We elaborate the proposed technique on three UCI benchmark datasets taken from the UCI Machine Learning Repository [12-14]. The Balloon dataset contains 16 instances and 4 categorical attributes: Color, Size, Act and Age. The Tic-Tac-Toe Endgame dataset contains 958 instances and 9 categorical attributes, plus a class attribute.

The nine attributes are top left square (TLS), top middle square (TMS), top right square (TRS), middle left square (MLS), middle middle square (MMS), middle right square (MRS), bottom left square (BLS), bottom middle square (BMS) and bottom right square (BRS). The Hayes-Roth dataset contains 132 training instances, 28 test instances and 4 attributes: hobby, age, educational level and marital status.

The algorithms of TR, MMR and MECC are implemented in MATLAB (R2008a). They are executed sequentially on an Intel Core 2 Duo processor. The total main memory is 2 GB and the operating system is Windows 7. The experimental results are summarized in Table 3.

Table 3. The Experimental Results

Technique                    Balloon      Tic-tac-toe  Hayes-Roth
TR    Attribute Selected     All          All          All
MMR   Attribute Selected     All          All          All
MECC  Attribute Selected     Act and Age  MMS          F3

TR, MMR and MECC use different criteria for selecting the clustering attribute. TR uses the total average of the mean roughness, MMR uses the minimum of the mean roughness, and MECC uses the error classification quality of the Variable Precision Rough Set to select a clustering attribute. Based on Table 3, no decision can be obtained using TR or MMR, because the TR and MMR values of the attributes are the same in all datasets (0 for TR and 1 for MMR, respectively). But a clustering attribute can be selected based on the minimum values using MECC. The accuracies on the three datasets are given in Figure 3.

Figure 3. The Accuracy of TR, MMR and MECC Techniques

4.2. Clustering Objects and Validity

In this sub-section we present the results of object partitioning. The purity of the clusters was used as a measure to test their quality.

The purity of a cluster and the overall purity are defined as

Purity(i) = (the number of data occurring in both the i-th cluster and its corresponding class) / (the number of data in the i-th cluster),

Overall Purity = ( Σ_{i=1}^{#clusters} Purity(i) ) / #clusters.

Balloon Dataset. Based on Table 3, the selected attributes are Act and Age, the MECC value of both attributes being the same. For attribute Act, the cluster purities give an overall purity of 0.83. For attribute Age, the cluster purities likewise give an overall purity of 0.83.

Tic-Tac-Toe Endgame Dataset. Based on Table 3, the selected attribute is MMS. For attribute MMS, the cluster purities give an overall purity of 0.69.

Hayes-Roth Dataset. Based on Table 3, the selected attribute is F3. For attribute F3, the cluster purities over the three classes give an overall purity of 0.63.
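A small sketch of this purity computation (our own Python; we read "its corresponding class" as the cluster's best-matching, i.e., majority, class):

```python
from collections import Counter

def overall_purity(clusters, classes):
    """clusters[i] / classes[i]: cluster label and true class of object i.
    A cluster's purity is the fraction of its members that fall in its
    best-matching (majority) class; overall purity is the mean over clusters."""
    purities = []
    for c in set(clusters):
        members = [cls for k, cls in zip(clusters, classes) if k == c]
        purities.append(Counter(members).most_common(1)[0][1] / len(members))
    return sum(purities) / len(purities)

# Example: two clusters over six objects.
print(overall_purity([0, 0, 0, 1, 1, 1],
                     ['a', 'a', 'b', 'b', 'b', 'b']))  # (2/3 + 3/3) / 2 = 0.8333
```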

4.3. Accuracy and Response Time

Benchmark data from the UCI Machine Learning Repository, i.e., Acute Inflammations, Balance Scale, Car Evaluation, Chess, Flag, Lenses, Lung Cancer, MONK's Problems, Mushroom, Soybean, Statlog (Landsat Satellite) and Zoo [15], are used to test MECC and compare it with MMR and TR in terms of response time and accuracy. The datasets are described in Table 4.

Table 4. The Benchmark Datasets

No  Data Set             Number of Instances  Number of Attributes
1   Lenses
2   Balance Scale
3   Car Evaluation
4   MONK's Problems
5   Acute Inflammations
6   Zoo
7   Soybean
8   Chess
9   Lung Cancer
10  Mushroom
11  Flag

The accuracy of selecting a clustering attribute refers to Definition 7, and the results are given in Figure 4. Meanwhile, the execution times on all datasets are given in Figures 5 and 6.

Figure 4. The Accuracy of TR, MMR and MECC Techniques

Figure 5. The Response Times of TR, MMR and MECC Techniques (Zoo, Balance Scale, Car, Chess, Lenses, Flag)

Figure 6. The Response Times of TR, MMR and MECC Techniques (Mushroom, MONK's, Soybean, Lung Cancer, Acute Inflammations)

Figure 4 illustrates the accuracy of selecting the clustering attribute. The accuracy of selecting the clustering attribute with the TR, MMR and MECC techniques is the same in almost all cases. However, the MECC technique has a lower execution time, due to the smaller amount of computation required, as shown in Figures 5 and 6. For example, on the Lung Cancer dataset, MECC improves on the execution times of TR and MMR.

5. Conclusion

In this paper we have proposed an alternative technique for categorical data clustering using the error classification in the VPRS model. We have shown that the proposed technique is able to handle noisy data, and we have presented an example of how it does so. Further, we have compared our technique on benchmark datasets taken from the UCI ML repository.

The results show that our technique provides better performance in selecting the clustering attribute. Since TR and MMR are based on the traditional definition of rough set theory, our technique is distinct from both.

Acknowledgements

This work was supported by Grant No. Vote PM-67/LPP-UAD/III/2012, Universitas Ahmad Dahlan, Indonesia.

References

[1] T. Li, "A General Model for Clustering Binary Data", Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD '05), Chicago, Illinois, USA, (2005) August.
[2] Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values", Data Mining and Knowledge Discovery, vol. 2, no. 3, (1998), pp. 283-304.
[3] Z. Pawlak, "Rough sets", International Journal of Computer and Information Sciences, vol. 11, (1982), pp. 341-356.
[4] Z. Pawlak, "Rough Sets: A Theoretical Aspect of Reasoning About Data", Kluwer Academic Publishers, (1991).
[5] Z. Pawlak and A. Skowron, "Rudiments of rough sets", Information Sciences, vol. 177, no. 1, (2007), pp. 3-27.
[6] L. J. Mazlack, A. He, Y. Zhu and S. Coppock, "A rough set approach in choosing partitioning attributes", Proceedings of the ISCA 13th International Conference (CAINE-2000), Honolulu, Hawaii, USA, (2000) November 1-3.
[7] D. Parmar, T. Wu and J. Blackhurst, "MMR: An algorithm for clustering categorical data using rough set theory", Data and Knowledge Engineering, vol. 63, (2007), pp. 879-893.
[8] W. Ziarko, "Variable precision rough set model", Journal of Computer and System Sciences, vol. 46, (1993), pp. 39-59.
[9] D. Slezak and W. Ziarko, "The investigation of the Bayesian rough set model", International Journal of Approximate Reasoning, (2005).
[10] G. Xie, J. Zhang, K. K. Lai and L. Yu, "Variable precision rough set for group decision-making: An application", International Journal of Approximate Reasoning, (2008).
[11] I. T. R. Yanto, P. Vitasari, T. Herawan and M. Mat Deris, "Applying variable precision rough set model for clustering student suffering study's anxiety", Expert Systems with Applications, Elsevier, (2012).
[12] UCI Machine Learning Repository: Balloons Data Set.
[13] UCI Machine Learning Repository: Tic-Tac-Toe Endgame Data Set.
[14] UCI Machine Learning Repository: Hayes-Roth Data Set.
[15] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml.

Author

Iwan Tri Riyadi Yanto received his B.Sc degree in Mathematics from Universitas Ahmad Dahlan, Yogyakarta, Indonesia. He obtained his MIT from Universiti Tun Hussein Onn Malaysia. Currently he is a lecturer at the Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Ahmad Dahlan (UAD). He has published more than 15 research papers in journals and conferences. His research areas include numerical optimization, data mining and KDD.
