An asymmetric entropy measure for decision trees
Simon Marcellin
Laboratoire ERIC, Université Lumière Lyon 2
5 av. Pierre Mendès-France, Bron Cedex, France
simon.marcellin@univ-lyon2.fr

Djamel A. Zighed
Laboratoire ERIC, Université Lumière Lyon 2
5 av. Pierre Mendès-France, Bron Cedex, France
zighed@univ-lyon2.fr

Gilbert Ritschard
THEMES, Department of Econometrics
Uni-Mail, 40 bd du Pont-d'Arve, room 5232, CH-1211 Geneva 4, Switzerland
Gilbert.Ritschard@themes.unige.ch

Abstract

In this paper we present a new entropy measure for growing decision trees. Its particularity is to be asymmetric, allowing the user to grow trees that better match his expectations in terms of recall and precision on each class. We then propose decision rules adapted to such trees. Experiments were carried out on real medical data from breast cancer screening units.

Keywords: decision trees, entropy, class imbalance, asymmetric learning, decision rules.

1 Introduction

Classical decision trees verify two properties linked to each other: on the one hand, the symmetry of the splitting criterion (Shannon entropy for ID3 [12] or the Gini measure for CART [4]), meaning that maximal uncertainty is reached in equiprobability situations; on the other hand, the application of the majority rule to assign a class to a leaf. But in many problems the distributions are not balanced, and the different classes do not have the same importance for the user [8]. This happens for example in marketing [5], medical computer-aided diagnosis or fraud detection [], where some classes are much more important than others. The users' requirements towards a classification model are thus different. To illustrate this problem, consider the example of computer-aided diagnosis of breast cancer. The aim is to label regions of digitized films as cancer or non-cancer. In this domain the cancer class is less represented in the datasets, but it is imperative to classify all of its examples correctly.
A standard decision tree considers that maximum uncertainty is reached in leaves containing 50% of cancer and 50% of non-cancer. Likewise, a leaf is classified as cancer as soon as the cancer proportion exceeds 50%, and as non-cancer otherwise. This example shows that such behaviour is not suitable for this kind of problem. Indeed, as a non-cancer decision may have serious consequences, we could decide to classify in this category only the leaves whose non-cancer proportion exceeds 90%. On the other hand, finding all cancers is so important that we could decide to classify leaves as cancer from a proportion of 20%. Several approaches deal with this problem; they can be grouped into four categories []. First, the cost-sensitive approaches penalize some types of errors by reweighting the examples of the concerned class in the confusion matrix. Sampling methods over-represent the minority class or under-represent the majority class [, 4]. Wrapper methods such as MetaCost, which produces several instances of a classifier by bootstrap, re-labels each example by vote and builds another model using the new labels, have also been proposed [7]. Finally, other methods, closer to ours, include a bias directly in the classifier, in particular by using a cost function instead of an entropy criterion [9, 6].

Figure 1: Examples of standard entropy measures (Shannon and quadratic) for a two-class problem

Concerning decision trees more specifically, it is also possible to modify the decision rules at the leaves. This does not change the produced tree, only the decision made at each leaf. In order to obtain an entirely coherent model, we propose to adapt the splitting criterion to these rules, so that the situation of maximum uncertainty corresponds to the situation of indecision in the decision rules. For example, in our two-class problem, if we classify a leaf as cancer when the frequency of cancers exceeds 20%, and as non-cancer when their frequency exceeds 90%, maximal uncertainty must be reached when the proportion of the cancer class lies between 10% and 20%. Section 2 shows how we determined an asymmetric uncertainty measure. Section 3 presents our asymmetric entropy. Section 4 explains how to adapt the decision rules to this asymmetric entropy measure. Section 5 details the results achieved on a real medical dataset, within the framework of a computer-aided breast cancer diagnosis system, and on two standard datasets from the UCI repository [9]. Finally, Section 6 concludes and proposes extensions of our work.

2 Asymmetric uncertainty measure

The different splitting criteria that have been proposed are symmetric, meaning that maximal uncertainty is reached in equiprobability situations [5]. Yet we need maximal uncertainty to be reached at a given probability, noted w. This criterion should verify the properties of an entropy, especially concavity. More formally, we look for a function h_w of p (the frequency of the considered class in a leaf) with a parameter w (the value of p for which the uncertainty is maximal), where p ∈ [0, 1] and w ∈ [0, 1]. This function should respect the entropy properties, except that the maximum should be reached at p = w instead of p = 1/2.
So the constraints are [5]:

Minimality:
  h_w(p = 0) = 0    (1)
  h_w(p = 1) = 0    (2)

Strict concavity:
  d²h_w(p)/dp² < 0    (3)

Maximality:
  dh_w/dp (p = w) = 0    (4)

We assume that there exists a rational function verifying those conditions:

  h_w(p) = (ap² + bp + c) / (dp + e)

To remove a degree of freedom we may add a constraint on the maximum value:

  h_w(p = w) = 1    (5)

Those constraints lead to the following function:

  h_w(p) = (−p² + p) / ((−2w + 1)p + w²) = p(1 − p) / ((1 − 2w)p + w²)

With this function we can represent the contribution of one class to the global uncertainty of a node. It verifies hypotheses (1) to (5).
Figure 2: Asymmetric uncertainty measure for a given w
Figure 3: Two-class asymmetric entropy

3 Asymmetric entropy

We have thus determined an uncertainty function. To obtain the asymmetric entropy of a node in a k-class problem, we propose to use the sum

  H_W(P) = Σ_{i=1..k} h_{w_i}(p_i),  with P = [p_1, ..., p_k] and W = [w_1, ..., w_k],

and then to normalize it between 0 and 1. This function has the following properties:

Minimality: H_W(P) = 0 if ∃i such that p_i = 1: the leaf is pure.

Strict concavity: as a sum of strictly concave functions, H_W(P) is strictly concave.

Maximality: the unique solution P* = [p*_1, ..., p*_k] maximizing H_W(P) under the constraint Σ_i p_i = 1 exists. We only give here an approximation of its position. The maximum of the asymmetric entropy function is located in the region defined as follows:

  p*_i > w_i ∀i  if Σ_{i=1..k} w_i < 1
  p*_i < w_i ∀i  if Σ_{i=1..k} w_i > 1
  p*_i = w_i ∀i  if Σ_{i=1..k} w_i = 1

Figure 4 shows the value of the entropy for a three-class problem, as a function of p_1 and p_2 (the third class probability is implicit because p_3 = 1 − p_1 − p_2). We can now grow a decision tree taking into account the user's requirements in terms of recall and precision for each class. At each new split of the decision tree, we try to get rid of the uncertainty situations represented by our asymmetric entropy measure. The aim of the asymmetric entropy is to produce a tree tailored as well as possible to the decision rules provided by the user. To be coherent, the parameters should be set so as to reflect those rules. The next section describes the way those rules are managed.

4 Impact on decision rules

How should we classify a leaf having the distribution P = [p_1, ..., p_k]? Standard trees set the class C with the majority rule:

  (R1) C = i if p_i > p_j ∀j ≠ i

According to the construction of our entropy measure, we propose the rule:

  (R2) C = i if p_i > w_i
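The node entropy can be sketched in a few lines of Python. The paper states that the sum is normalized between 0 and 1 without specifying the scheme; dividing by k is one simple assumption, since each term h_{w_i}(p_i) is at most 1 (the names below are ours):

```python
def h_w(p, w):
    # Per-class asymmetric uncertainty: p(1-p) / ((1-2w)p + w^2).
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)

def asymmetric_entropy(P, W):
    """H_W(P): sum of h_{w_i}(p_i), brought back into [0, 1] by dividing by k."""
    assert abs(sum(P) - 1.0) < 1e-9, "P must be a probability distribution"
    return sum(h_w(p, w) for p, w in zip(P, W)) / len(P)

# Minimality: a pure leaf has zero entropy.
print(asymmetric_entropy([1.0, 0.0], [0.2, 0.8]))   # 0.0
# When sum(W) = 1, the normalized maximum 1 is reached exactly at P = W.
print(asymmetric_entropy([0.2, 0.8], [0.2, 0.8]))   # 1.0
```

With w_i = 1/2 for every class, the criterion behaves like a standard symmetric impurity, maximal at equiprobability.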
Figure 4: Three-class asymmetric entropy
Figure 5: Decision rules for a two-class problem

Nevertheless, this rule is not sufficient: conflicts may occur when more than one class satisfies rule R2 at the same time (see Figure 6). Moreover, if Σ_{i=1..k} w_i > 1, we face situations where rule R2 is satisfied for no class at all (see Figure 5). When R2 is satisfied for several classes, we must determine the winning class. Different rules may be proposed, according to the considered problem: assign the leaf to the class i having the smallest constraint (min(W) = w_i, to privilege the recall rate on the minority class), or to the class having the largest ratio p_i/w_i. Nevertheless, it seems important to us that the rules and the entropy be coherent, i.e. that the frontier point between the choice of one class or another corresponds to maximal entropy. We therefore propose the following method. A leaf is litigious if more than one class satisfies rule R2: p_i > w_i. Those classes form the set of candidates S. To decide, we compare all pairs of distinct classes a and b of S. For each pair, we determine the winning class. As we will see, the choice is based on the entropy. Its concavity implies that, except in indeterminate cases, one class dominates all the others.

Figure 6: Decision rules for a three-class problem
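Rule R2 and its two failure modes (no candidate when Σw_i > 1, several candidates otherwise) can be sketched as follows; the function names are ours, and litigious leaves are left to the crest-based tie-break described in the text:

```python
def candidates(P, W):
    """Classes satisfying rule R2: p_i > w_i."""
    return [i for i, (p, w) in enumerate(zip(P, W)) if p > w]

def classify_leaf(P, W):
    S = candidates(P, W)
    if not S:
        return None          # no class satisfies R2: leave unclassified
    if len(S) == 1:
        return S[0]          # a single candidate: assign it
    return S                 # litigious leaf: needs tie-breaking

# With sum(W) > 1, some distributions satisfy R2 for no class at all.
W = [0.2, 0.9]
print(classify_leaf([0.15, 0.85], W))  # None (0.15 <= 0.2 and 0.85 <= 0.9)
print(classify_leaf([0.30, 0.70], W))  # 0

# With sum(W) < 1, several classes can satisfy R2 simultaneously.
print(classify_leaf([0.40, 0.60], [0.2, 0.3]))  # [0, 1]
```

The `None` case is exactly the abstention region of Figure 5, and the list case corresponds to the conflict regions of Figure 6.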
We look for the distribution over p_a and p_b which provides maximal entropy, setting all other probabilities to their observed values p_i = α_i:

  H = Σ_{i≠a,b} h_{w_i}(α_i) + h_{w_a}(p_a) + h_{w_b}(p_b)

If we note δ = Σ_{i≠a,b} p_i, so that p_b = 1 − p_a − δ, we obtain:

  H = Σ_{i≠a,b} h_{w_i}(α_i) + h_{w_a}(p_a) + h_{w_b}(1 − p_a − δ)

This expression is a function of p_a only. We look for its maximum, i.e.:

  dH/dp_a = d[h_{w_a}(p_a) + h_{w_b}(1 − p_a − δ)]/dp_a = 0

The single solution p*_a exists. We can prove that p*_a ∈ (w_a, 1 − δ − w_b) (see the maximality property of the asymmetric entropy). Then we consider three situations:

  If p_a > p*_a, class a wins.
  If p_a < p*_a, class b wins.
  If p_a = p*_a: indeterminate.

Unless we are in an indeterminate situation, one and only one class dominates all the others within the set S. This class is assigned to the leaf. Figures 7 and 8 illustrate this rule. In Figure 8, to decide in the areas where R2 is satisfied for several classes, we consider the indecision curves given by the crests of the entropy as defined above. The choice depends on the side of the crest where the considered distribution is located. Thus, using the asymmetric entropy can produce leaves for which no decision can be made, and hence leave some examples unclassified. This asymmetric entropy may be compared to cost measures [], but our measure respects the expected properties of an entropy function. Moreover, its parameters are set intelligibly.

Figure 7: Decision rules for a three-class problem
Figure 8: Indecision curves ("crests") of the entropy for a three-class problem
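The pairwise tie-break can be implemented numerically: locate the maximizer p*_a of h_{w_a}(p_a) + h_{w_b}(1 − p_a − δ) and compare it with the observed p_a. The sketch below (names are ours) uses a ternary search as a numerical stand-in for solving dH/dp_a = 0, which is valid because the restricted objective is strictly concave:

```python
def h_w(p, w):
    # Per-class asymmetric uncertainty: p(1-p) / ((1-2w)p + w^2).
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)

def pairwise_winner(P, W, a, b):
    """Winner among classes a and b of a litigious leaf (entropy-crest rule)."""
    delta = sum(p for i, p in enumerate(P) if i not in (a, b))

    def f(pa):  # entropy restricted to the (a, b) pair
        return h_w(pa, W[a]) + h_w(1.0 - pa - delta, W[b])

    # Ternary search for the maximizer p*_a on [0, 1 - delta];
    # f is strictly concave, so it converges to the unique maximum.
    lo, hi = 0.0, 1.0 - delta
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    pa_star = (lo + hi) / 2

    if abs(P[a] - pa_star) < 1e-9:
        return None                     # indeterminate: exactly on the crest
    return a if P[a] > pa_star else b

# Both classes satisfy R2 (0.4 > 0.2 and 0.6 > 0.3); the crest decides.
print(pairwise_winner([0.4, 0.6], [0.2, 0.3], 0, 1))  # 1
```

For a litigious leaf with more than two candidates, this comparison is applied to every pair of classes in S; by concavity, a single class dominates unless an indeterminate case occurs.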
5 Experiments

We experimented our method on real data from breast cancer screening units. The aim is to detect tumours on digitized mammograms. Several hundred films were annotated by radiologists. Those films were then segmented with imaging methods in order to obtain regions of interest. Those zones were labelled as cancer if they correspond to a radiologist's annotation, and as non-cancer otherwise. Finally, a large number of features (based on grey-level histogram, shape and texture) were computed. The aim is to build a model able to separate cancers from non-cancers (see the description of the mammo dataset in Table 1).

Table 1: Description of the mammo dataset
  Dataset   # of features   # of examples   % of cancer
  Mammo                                     %

To allow comparisons with other approaches, we also tested our method on two standard machine learning datasets [9, 6]. On these imbalanced datasets we tried to obtain the best recall on the minority class while keeping a correct precision on this class.

Table 2: Description of the UCI datasets
  Dataset       # of feat.   # of ex.   % of min. class
  Hypothyroid                           %
  Satimage                              %

With each dataset we built three series of classifiers using 10-fold cross-validation: standard C4.5 [13] and Random Forest [3, 2]; C4.5 and Random Forest using a cost matrix with the WEKA software [14]; and a tree and a Random Forest with the asymmetric entropy criterion we propose (on each dataset, the different Random Forests are built with the same number of trees and features). We also compare our results with those obtained by Chen and Liaw [6]. For each test we report the recall and precision rates on the minority class. As we only present two-class problems, the results on the majority class are implicit, so we do not give them here.

Table 3: Parameters for mammo
  Method               Parameters
  C4.5                 -
  Random Forest        trees, 28 features
  C4.5 with cost (1)   cost :
  C4.5 with cost (2)   cost :2
  RF with cost         cost :35
  Asymmetric tree      w_1 = , w_2 =
  Asymmetric RF        w_1 = , w_2 = .
Table 4: Results for mammo
  Method               Recall / Precision
  C4.5                 3
  Random Forest        . .
  C4.5 with cost (1)   .8
  C4.5 with cost (2)   . 5
  RF with cost         7 .4
  Asymmetric tree      .
  Asymmetric RF        8 .

For the asymmetric methods and the cost matrix we used several sets of parameters. Tables 4, 5 and 6 present the best results for each method. The best results are always obtained with an asymmetric Random Forest. For the real-data mammo dataset, asymmetric entropy with a single decision tree does not improve the results (cost-sensitive C4.5 is better), but it brings a real enhancement with the Random Forest.

Table 5: Parameters for hypothyroid
  Method            Parameters
  C4.5              -
  Random Forest     3 trees, 4 features
  C4.5 with cost    cost :5
  RF with cost      cost :5
  Asymmetric tree   w_1 = , w_2 = .
  Asymmetric RF     w_1 = , w_2 = .
  BRF               cutoff =
  WRF               weight = :5

For the hypothyroid dataset, we also present the results obtained with Balanced Random Forest and Weighted Random Forest [6].

Table 6: Results for hypothyroid
  Method            Recall / Precision
  C4.5
  Random Forest     6 6
  C4.5 with cost    8 6
  RF with cost      9 6
  Asymmetric tree   8 7
  Asymmetric RF     . 8
  BRF               5 3
  WRF               3 3

On this dataset, the asymmetric Random Forest dominates all other methods, whatever the parameters.

Table 7: Parameters for satimage
  Method                 Parameters
  C4.5                   -
  Random Forest          3 trees, 6 features
  C4.5 with cost (1)     cost :5
  C4.5 with cost (2)     cost :5
  RF with cost (1)       cost :2
  RF with cost (2)       cost :5
  Asymmetric tree (1)    w_1 = 8, w_2 = .
  Asymmetric tree (2)    w_1 = , w_2 =
  Asymmetric RF (1)      w_1 = 8, w_2 = .
  Asymmetric RF (2)      w_1 = , w_2 =
  BRF (1)                cutoff =
  BRF (2)                cutoff =
  WRF (1)                (not specified)
  WRF (2)                (not specified)

Table 8: Results for satimage
  Method                 Recall / Precision
  C4.5
  Random Forest          4 4
  C4.5 with cost (1)     2 9
  C4.5 with cost (2)     2
  RF with cost (1)       5 3
  RF with cost (2)       4 3
  Asymmetric tree (1)    3
  Asymmetric tree (2)    9
  Asymmetric RF (1)      9 9
  Asymmetric RF (2)      6
  BRF (1)                7 4
  BRF (2)                7 6
  WRF (1)                9 9
  WRF (2)                7

With the satimage dataset, we chose the minority class and merged the five others to create the majority class. We also present BRF and WRF results. We can observe in Figure 9 that, for the satimage dataset, the asymmetric Random Forest again dominates the other methods. The Random Forest with a cost matrix is very close to WRF and BRF. Finally, we notice that on this dataset the asymmetric tree is slightly better than C4.5 with a cost matrix.

Figure 9: Recall and precision for satimage with different parameters

6 Conclusion and future works

We propose an asymmetric entropy measure for decision trees. By using it with adapted decision rules, we can grow trees that take into account the user's specifications in terms of recall and precision for each class, and thus obtain better results on imbalanced datasets. Note that this method entails no additional computational complexity, and that its parameters can be set intelligibly by the user. In the future we plan to improve our work on several points.
In particular, the entropy that we propose may be enhanced, notably the aggregation of the per-class uncertainties (we currently use a sum). Experiments on problems with more than two classes will be carried out. We will also conduct a theoretical comparison between this approach and those that introduce costs into the splitting criterion. Indeed, since those two approaches can be expressed in the same way, it seems to us that using costs introduces ruptures in the concavity of the splitting criterion, which could produce sub-optimal trees.

Acknowledgements

This work was carried out within a thesis co-financed by the French Ministry of Research and Industry. We would like to thank the ARDOC breast cancer screening unit and the senology unit of Centre-République (Clermont-Ferrand, France) for their expertise and the mammography data. We also thank Pierre-Emmanuel Jouve, Julien Thomas and Jérémy Clech (Fenics company, Lyon, France) for their help and advice.

References

[1] R. Barandela, J. S. Sanchez, et al. (2003). Strategies for learning in class imbalance problems. Pattern Recognition, 36(3).
[2] L. Breiman (2001). Random Forests. Machine Learning, 45(1).
[3] L. Breiman (2002). Looking inside the black box. WALD Lectures, 277th meeting of the Institute of Mathematical Statistics, Banff, Alberta, Canada.
[4] L. Breiman, J. H. Friedman, et al. (1984). Classification and Regression Trees. Belmont, Wadsworth.
[5] J.-H. Chauchat, R. Rakotomalala, et al. (2001). Targeting Customer Groups using Gain and Cost Matrix: a Marketing Application. In Proceedings of the Data Mining and Marketing Workshop, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01).
[6] C. Chen, A. Liaw, et al. (2004). Using Random Forest to Learn Imbalanced Data. Technical Report, Department of Statistics, University of California, Berkeley.
[7] P. Domingos (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99).
[8] C. Elkan (2001). The Foundations of Cost-Sensitive Learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01).
[9] S. Hettich and S. D. Bay (1999). The UCI KDD Archive. University of California, Department of Information and Computer Science, Irvine, California, USA.
[10] C. X. Ling, Q. Yang, et al. (2004). Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada.
[11] F. Provost (2000). Learning with Imbalanced Data Sets. Invited paper for the AAAI 2000 Workshop on Imbalanced Data Sets.
[12] J. R. Quinlan (1986). Induction of Decision Trees. Machine Learning, 1(1), pages 81-106.
[13] J. R. Quinlan (1993). C4.5: Programs for Machine Learning. San Francisco, Morgan Kaufmann.
[14] I. H. Witten and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, Morgan Kaufmann.
[15] D. Zighed and R. Rakotomalala (2000). Graphes d'induction: Apprentissage et Data Mining. Paris, Hermès.
More informationA Bayesian approach to estimate probabilities in classification trees
A Bayesian approach to estimate probabilities in classification trees Andrés Cano and Andrés R. Masegosa and Serafín Moral Department of Computer Science and Artificial Intelligence University of Granada
More informationRandom Forests: Finding Quasars
This is page i Printer: Opaque this Random Forests: Finding Quasars Leo Breiman Michael Last John Rice Department of Statistics University of California, Berkeley 0.1 Introduction The automatic classification
More informationLecture 3: Decision Trees
Lecture 3: Decision Trees Cognitive Systems - Machine Learning Part I: Basic Approaches of Concept Learning ID3, Information Gain, Overfitting, Pruning last change November 26, 2014 Ute Schmid (CogSys,
More informationMachine Learning on temporal data
Machine Learning on temporal data Classification rees for ime Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan ime Series classification approaches
More informationEVALUATING RISK FACTORS OF BEING OBESE, BY USING ID3 ALGORITHM IN WEKA SOFTWARE
EVALUATING RISK FACTORS OF BEING OBESE, BY USING ID3 ALGORITHM IN WEKA SOFTWARE Msc. Daniela Qendraj (Halidini) Msc. Evgjeni Xhafaj Department of Mathematics, Faculty of Information Technology, University
More informationLBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules
LBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules Zhipeng Xie School of Computer Science Fudan University 220 Handan Road, Shanghai 200433, PR. China xiezp@fudan.edu.cn Abstract LBR is a highly
More informationDyadic Classification Trees via Structural Risk Minimization
Dyadic Classification Trees via Structural Risk Minimization Clayton Scott and Robert Nowak Department of Electrical and Computer Engineering Rice University Houston, TX 77005 cscott,nowak @rice.edu Abstract
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 23. Decision Trees Barnabás Póczos Contents Decision Trees: Definition + Motivation Algorithm for Learning Decision Trees Entropy, Mutual Information, Information
More informationAn Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers
An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers Fernando Berzal, Juan-Carlos Cubero, Nicolás Marín, and José-Luis Polo Department of Computer Science
More informationPattern-Based Decision Tree Construction
Pattern-Based Decision Tree Construction Dominique Gay, Nazha Selmaoui ERIM - University of New Caledonia BP R4 F-98851 Nouméa cedex, France {dominique.gay, nazha.selmaoui}@univ-nc.nc Jean-François Boulicaut
More informationCS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING Santiago Ontañón so367@drexel.edu Summary so far: Rational Agents Problem Solving Systematic Search: Uninformed Informed Local Search Adversarial Search
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationSelected Algorithms of Machine Learning from Examples
Fundamenta Informaticae 18 (1993), 193 207 Selected Algorithms of Machine Learning from Examples Jerzy W. GRZYMALA-BUSSE Department of Computer Science, University of Kansas Lawrence, KS 66045, U. S. A.
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationIndex of Balanced Accuracy: A Performance Measure for Skewed Class Distributions
Index of Balanced Accuracy: A Performance Measure for Skewed Class Distributions V. García 1,2, R.A. Mollineda 2, and J.S. Sánchez 2 1 Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca Av.
More informationAbstract. 1 Introduction. Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand
Í Ò È ÖÑÙØ Ø ÓÒ Ì Ø ÓÖ ØØÖ ÙØ Ë Ð Ø ÓÒ Ò ÓÒ ÌÖ Eibe Frank Department of Computer Science University of Waikato Hamilton, New Zealand eibe@cs.waikato.ac.nz Ian H. Witten Department of Computer Science University
More informationFrom statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu
From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom
More informationEnsembles of Classifiers.
Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting
More informationUnsupervised Classification via Convex Absolute Value Inequalities
Unsupervised Classification via Convex Absolute Value Inequalities Olvi L. Mangasarian Abstract We consider the problem of classifying completely unlabeled data by using convex inequalities that contain
More informationClassification: Decision Trees
Classification: Decision Trees Outline Top-Down Decision Tree Construction Choosing the Splitting Attribute Information Gain and Gain Ratio 2 DECISION TREE An internal node is a test on an attribute. A
More informationFinite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier
Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier Kaizhu Huang, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationMachine Learning Recitation 8 Oct 21, Oznur Tastan
Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy
More informationEnsemble Methods and Random Forests
Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Intelligent Data Analysis. Decision Trees
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Intelligent Data Analysis Decision Trees Paul Prasse, Niels Landwehr, Tobias Scheffer Decision Trees One of many applications:
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes dr. Petra Kralj Novak Petra.Kralj.Novak@ijs.si 7.11.2017 1 Course Prof. Bojan Cestnik Data preparation Prof. Nada Lavrač: Data mining overview Advanced
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationP leiades: Subspace Clustering and Evaluation
P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de
More informationVoting Massive Collections of Bayesian Network Classifiers for Data Streams
Voting Massive Collections of Bayesian Network Classifiers for Data Streams Remco R. Bouckaert Computer Science Department, University of Waikato, New Zealand remco@cs.waikato.ac.nz Abstract. We present
More informationMachine Learning 2nd Edition
ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://wwwcstauacil/~apartzin/machinelearning/ and wwwcsprincetonedu/courses/archive/fall01/cs302 /notes/11/emppt The MIT Press, 2010
More informationML techniques. symbolic techniques different types of representation value attribute representation representation of the first order
MACHINE LEARNING Definition 1: Learning is constructing or modifying representations of what is being experienced [Michalski 1986], p. 10 Definition 2: Learning denotes changes in the system That are adaptive
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationTheoretical comparison between the Gini Index and Information Gain criteria
Annals of Mathematics and Artificial Intelligence 41: 77 93, 2004. 2004 Kluwer Academic Publishers. Printed in the Netherlands. Theoretical comparison between the Gini Index and Information Gain criteria
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit
More informationClassification using stochastic ensembles
July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationDecision Tree. Decision Tree Learning. c4.5. Example
Decision ree Decision ree Learning s of systems that learn decision trees: c4., CLS, IDR, ASSISA, ID, CAR, ID. Suitable problems: Instances are described by attribute-value couples he target function has
More informationDecision Trees. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. February 5 th, Carlos Guestrin 1
Decision Trees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 5 th, 2007 2005-2007 Carlos Guestrin 1 Linear separability A dataset is linearly separable iff 9 a separating
More informationDecision Tree Learning
Decision Tree Learning Goals for the lecture you should understand the following concepts the decision tree representation the standard top-down approach to learning a tree Occam s razor entropy and information
More informationDecision Tree Learning
Decision Tree Learning Goals for the lecture you should understand the following concepts the decision tree representation the standard top-down approach to learning a tree Occam s razor entropy and information
More informationDepartment of Computer Science, University of Waikato, Hamilton, New Zealand. 2
Predicting Apple Bruising using Machine Learning G.Holmes 1, S.J.Cunningham 1, B.T. Dela Rue 2 and A.F. Bollen 2 1 Department of Computer Science, University of Waikato, Hamilton, New Zealand. 2 Lincoln
More informationModels, Data, Learning Problems
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Models, Data, Learning Problems Tobias Scheffer Overview Types of learning problems: Supervised Learning (Classification, Regression,
More information