An asymmetric entropy measure for decision trees

Simon Marcellin, Laboratoire ERIC, Université Lumière Lyon 2, 5 av. Pierre Mendès-France, BRON Cedex, France, simon.marcellin@univ-lyon2.fr

Djamel A. Zighed, Laboratoire ERIC, Université Lumière Lyon 2, 5 av. Pierre Mendès-France, BRON Cedex, France, zighed@univ-lyon2.fr

Gilbert Ritschard, THEMES, Department of Econometrics, Uni-Mail, 4 bd du Pont-d'Arve, room 5232, CH-1211 Geneva 4, Switzerland, Gilbert.Ritschard@themes.unige.ch

Abstract

In this paper we present a new entropy measure for growing decision trees. Its particularity is to be asymmetric, allowing the user to grow trees that better match his expectations in terms of recall and precision on each class. We then propose decision rules adapted to such trees. Experiments have been carried out on real medical data from breast cancer screening units.

Keywords: decision trees, entropy, class imbalance, asymmetric learning, decision rules.

1 Introduction

Classical decision trees verify two properties linked to each other: on the one hand, the symmetry of the splitting criterion (Shannon entropy for ID3 [12] or the Gini measure for CART [4]), meaning that maximal uncertainty is reached in equiprobability situations; on the other hand, the application of the majority rule to assign a class to a leaf. But in many problems the distributions are not balanced and the different classes do not have the same importance [8] for the user. This happens for example in marketing [5], medical computer-aided diagnosis or fraud detection [], where some classes are much more important than others. Users' requirements towards a classification model thus differ.

To illustrate this problem, consider the example of computer-aided diagnosis of breast cancer. The aim is to label regions of digitized films as cancer or non-cancer. In this domain the cancer class is less represented in the datasets, but it is imperative to classify its examples correctly. A standard decision tree considers that the maximum uncertainty is reached in leaves having 50% of cancer and 50% of non-cancer. Likewise, a leaf is classified as cancer as soon as the cancer proportion exceeds 50%, and as non-cancer otherwise. We can see in this example that this default behaviour is not suitable for this kind of problem. Indeed, as a wrong non-cancer decision may have serious consequences, we could decide to assign this category only to the leaves with a non-cancer proportion exceeding 90%. On the other hand, it is so important to find all cancers that we could decide to classify leaves as cancer from a proportion of 20%.

Several approaches have been proposed to deal with this problem. We may group them into four categories []. First, cost-sensitive approaches penalize some types of errors, balancing the number of examples of the concerned class in the confusion matrix. Then, sampling methods over-represent the minority class or under-represent the majority class [, 4]. Wrapper methods such as MetaCost, which produce several instances of a classifier by bootstrap, re-label each example by vote and build another model using the new labels, have also been proposed [7]. Finally, other methods, closer to ours, try to include a bias directly in the classifier, in particular by using a cost function instead of an entropy criterion [9, 6].

Figure 1: Examples of standard entropy measures (Shannon and quadratic) for a two-class problem, plotting uncertainty against the probability of class 1.

Concerning decision trees more specifically, it is also possible to modify the decision rules of the leaves. This does not change the produced tree, only the decision taken in each leaf. In order to obtain an entirely coherent model, we propose to adapt the splitting criterion to these rules, so that the situation of maximal uncertainty corresponds to the indecision situation of the decision rules. For example, in our two-class problem, if we classify a leaf as cancer when the frequency of cancers exceeds 20%, and as non-cancer when their frequency exceeds 90%, the maximal uncertainty must be reached when the proportion of the cancer class is between 10% and 20%.

We will see in section 2 how we have determined an asymmetric uncertainty measure. Section 3 presents our asymmetric entropy. In section 4 we show how to adapt the decision rules to this asymmetric entropy measure. Section 5 details the results achieved on a real medical dataset within the framework of a breast cancer computer-aided diagnosis system, and on two standard datasets from the UCI repository [9]. Finally, section 6 concludes and proposes extensions of our work.

2 Asymmetric uncertainty measure

The different splitting criteria that have been proposed are symmetric, meaning that maximal uncertainty is reached in equiprobability situations [15]. Yet we need the maximal uncertainty to be reached for a given probability, noted w. This criterion should verify the properties of an entropy, especially concavity. More formally, we search for a function h of p (frequency of the considered class in a leaf) with a parameter w (the value of p where the uncertainty is maximal), with p in [0, 1] and w in [0, 1]. This function should respect the entropy properties, except that the maximum should be reached at w instead of at 1/2. The constraints are thus [15]:

Minimality:
  h_w(p = 0) = 0    (1)
  h_w(p = 1) = 0    (2)

Strict concavity:
  d^2 h_w(p) / dp^2 < 0    (3)

Maximality:
  dh_w(p)/dp evaluated at p = w is 0    (4)

We assume that there exists a rational function verifying these conditions:

  h_w(p) = (a p^2 + b p + c) / (d p + e)

To remove a degree of freedom we add a constraint on the maximum value:

  h_w(p = w) = 1    (5)

These constraints lead to the following function:

  h_w(p) = (-p^2 + p) / ((-2w + 1) p + w^2) = p (1 - p) / ((1 - 2w) p + w^2)

With this function we may represent the contribution of a class to the global uncertainty of a node. It verifies hypotheses (1) to (5).
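As a quick check of this construction, here is a minimal sketch (Python, our own illustration rather than the authors' code) of h_w; the value w = 0.2 below simply echoes the 20% cancer threshold of the introductory example.

def h(p, w):
    """Asymmetric uncertainty contribution h_w(p) = p(1-p) / ((1-2w)p + w^2).
    Zero at p = 0 and p = 1, strictly concave, maximal (= 1) at p = w,
    provided 0 < w < 1 (the denominator is then always positive)."""
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)

if __name__ == "__main__":
    w = 0.2                                   # maximal doubt at 20% of the class
    for p in (0.0, 0.1, 0.2, 0.5, 0.9, 1.0):
        print(f"p = {p:.1f}   h_w(p) = {h(p, w):.3f}")
    # h(0) = h(1) = 0 and h(w) = 1, matching constraints (1)-(5)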

Figure 2: Asymmetric uncertainty measure for a given w. Figure 3: Two-class asymmetric entropy for a given W (contributions of class 1 and class 2, and their normalized sum, against the probability of class 1).

3 Asymmetric entropy

We have thus determined an uncertainty function. To obtain the asymmetric entropy of a node in a k-class problem, we propose to use the sum

  H_W(P) = sum_{i=1}^{k} h_{w_i}(p_i),   with P = [p_1, ..., p_k] and W = [w_1, ..., w_k],

and then to normalize it between 0 and 1. This function has the following properties:

Minimality: H_W(P) = 0 if there exists i such that p_i = 1: the leaf is pure.

Strict concavity: as a sum of strictly concave functions, H_W(P) is strictly concave.

Maximality: the unique analytic solution P* = [p*_1, ..., p*_k] of dH_W(P) = 0 on the simplex sum_i p_i = 1 (the point where the derivatives dh_{w_i}(p_i)/dp_i are all equal) exists. We just give here an approximation of its position. The maximum of the asymmetric entropy function is located in the region defined as follows:

  p*_i > w_i for all i   if sum_{i=1}^{k} w_i < 1
  p*_i < w_i for all i   if sum_{i=1}^{k} w_i > 1
  p*_i = w_i for all i   if sum_{i=1}^{k} w_i = 1

Figure 4 shows the value of the entropy for a three-class problem, as a function of p_1 and p_2 (the third class probability is implicit since p_3 = 1 - p_1 - p_2).

We can now grow a decision tree taking into account the user's requirements in terms of recall and precision for each class. At each new split of the decision tree, we try to get rid of the uncertainty situations represented by our asymmetric entropy measure. The aim of the asymmetric entropy is to produce a tree tailored as well as possible to the decision rules provided by the user. To be coherent, the parameters should be set so as to reflect those rules. The next section shows how those rules are managed.

4 Impact on decision rules

How should we classify a leaf having the distribution P = [p_1, ..., p_k]? Standard trees assign the class C with the majority rule:

  (R0) C = i if p_i > p_j for all j different from i

According to the construction of our entropy measure, we propose the rule:

  (R1) C = i if p_i > w_i
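Before turning to the conflicts that rule R1 can raise, here is a minimal sketch (an illustration under our own assumptions, not the paper's implementation) of the asymmetric node entropy and of its use as a splitting criterion. Dividing the sum by k is one simple way of keeping the value in [0, 1]; any constant normalization leaves the ranking of candidate splits unchanged.

from collections import Counter

def h(p, w):
    # asymmetric uncertainty contribution of one class (see section 2)
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)

def asymmetric_entropy(labels, W):
    """H_W of a node, computed from its class labels (0 .. k-1) and the
    vector W of target frequencies; normalized here by k = len(W)."""
    n = len(labels)
    counts = Counter(labels)
    return sum(h(counts.get(i, 0) / n, w) for i, w in enumerate(W)) / len(W)

def split_gain(parent, children, W):
    """Decrease of asymmetric entropy obtained by splitting `parent`
    into the list of `children` (the usual weighted-average criterion)."""
    n = len(parent)
    weighted = sum(len(c) / n * asymmetric_entropy(c, W) for c in children)
    return asymmetric_entropy(parent, W) - weighted

# Toy example: class 1 ("cancer"-like) peaks at 20%, class 0 at 90%.
W = [0.9, 0.2]
parent = [0] * 85 + [1] * 15
left, right = [0] * 60 + [1] * 2, [0] * 25 + [1] * 13
print(split_gain(parent, [left, right]))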

Figure 4: Three-class asymmetric entropy for a given W. Figure 5: Decision rules for a two-class problem, for a given W (decision regions C = 1, C = 2 and the abstention region Ø, against the probability of class 1).

Nevertheless, this rule is not sufficient: conflicts may occur when more than one class satisfies rule R1 at the same time (see figure 6). Moreover, if sum_{i=1}^{k} w_i > 1, we will face situations where rule R1 is satisfied for no class (see figure 5). If R1 is satisfied for several classes, we must determine the winning class. Different rules may be proposed, according to the considered problem: assign the leaf to the class i having the smallest constraint (min(W) = w_i, to privilege the recall rate on the minority class), or to the class having the largest ratio p_i / w_i. Nevertheless, it seems important to us that the rules and the entropy be coherent, i.e. that the frontier point between the choice of one class or another corresponds to a maximal entropy. So we propose the following method.

A leaf is litigious if more than one class satisfies rule R1: p_i > w_i. Those classes form the set of candidates S. To decide, we compare all pairs of distinct classes a and b of S. For each pair, we determine the winning class. As we will see, the choice is based on the entropy. Its concavity property implies that, except in indeterminate cases, one class dominates all the others.

Figure 6: Decision rules for a three-class problem, for a given W.

We search for the distribution between p_a and p_b which provides the maximal entropy, setting all the other probabilities to their observed values p_i = alpha_i:

  H = h_{w_1}(alpha_1) + ... + h_{w_a}(p_a) + ... + h_{w_b}(p_b) + ... + h_{w_k}(alpha_k)

If we note delta = sum_{i != a, i != b} p_i, we obtain:

  H = h_{w_1}(alpha_1) + ... + h_{w_a}(p_a) + ... + h_{w_b}(1 - p_a - delta) + ... + h_{w_k}(alpha_k)

This expression is a function of p_a only. We search for its maximum, i.e.

  dH/dp_a = d[ h_{w_a}(p_a) + h_{w_b}(1 - p_a - delta) ] / dp_a = 0

Figure 7: Decision rules for a three-class problem, for a given W.

The single solution p*_a exists. We can prove that this solution is such that p*_a lies in (w_a, 1 - delta - w_b) (see the maximality property of the asymmetric entropy). Then we consider three situations:

  If p_a > p*_a, class a wins.
  If p_a < p*_a, class b wins.
  If p_a = p*_a: indeterminate.

Unless we are in an indeterminate situation, one and only one class dominates all the others within the set S. This class is assigned to the leaf. Figures 7 and 8 illustrate this rule. In figure 8, to decide in the areas where R1 is satisfied for several classes, we consider the indecision curves given by the crests of the entropy as defined above. The choice is made according to the side of the crest on which the considered distribution is located. So, using the asymmetric entropy may produce leaves for which we cannot decide, and thus leave some examples unclassified. This asymmetric entropy may be compared to cost measures ([]), but our measure respects the expected properties of an entropy function. Moreover, its parameters are set intelligibly.

Figure 8: Indecision curves ("crests") of the entropy for a three-class problem, for a given W.
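To make the rule concrete, here is a minimal sketch of the whole leaf-labelling procedure (again our own illustration, not the authors' code): keep the classes satisfying R1, then settle conflicts pairwise at the entropy crest, computed here by a simple grid search instead of solving dH/dp_a = 0 analytically. Because concavity guarantees that a single class dominates all the others (outside indeterminate cases), a sequential tournament over the candidates is enough.

def h(p, w):
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)

def crest(delta, wa, wb, steps=10000):
    """Approximate argmax over p_a of h_{wa}(p_a) + h_{wb}(1 - delta - p_a)."""
    best_x, best_v = 0.0, float("-inf")
    for k in range(steps + 1):
        x = (1.0 - delta) * k / steps
        v = h(x, wa) + h(1.0 - delta - x, wb)
        if v > best_v:
            best_x, best_v = x, v
    return best_x

def decide(P, W, tol=1e-9):
    """Label a leaf with distribution P given the thresholds W.
    Returns the winning class index, or None (abstention / indeterminate)."""
    S = [i for i, (p, w) in enumerate(zip(P, W)) if p > w]   # rule R1
    if not S:
        return None                       # no class qualifies: abstain
    winner = S[0]
    for b in S[1:]:
        a = winner
        delta = sum(p for i, p in enumerate(P) if i not in (a, b))
        pa_star = crest(delta, W[a], W[b])
        if abs(P[a] - pa_star) < tol:
            return None                   # exactly on the crest: indeterminate
        winner = a if P[a] > pa_star else b
    return winner

print(decide([0.75, 0.25], W=[0.9, 0.2]))   # -> 1: the minority class wins
print(decide([0.95, 0.05], W=[0.9, 0.2]))   # -> 0
print(decide([0.88, 0.12], W=[0.9, 0.2]))   # -> None: neither rule is satisfied
print(decide([0.60, 0.40], W=[0.3, 0.2]))   # conflict, settled at the crest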

5 Experiments

We have tested our method on real data from breast cancer screening units. The aim is to detect tumours on digitized mammograms. Several hundred films have been annotated by radiologists. Those films have then been segmented with imaging methods in order to obtain regions of interest. Those zones have been labelled as cancer if they correspond to a radiologist's annotation, and as non-cancer otherwise. Finally, a large number of features (based on the grey-level histogram, shape and texture) have been computed. The aim is to build a model able to separate cancers from non-cancers (see the description of the mammo dataset in Table 1).

Table 1: Description of the mammo dataset
  Dataset   # of features   # of examples   % of cancer
  Mammo

To allow comparisons with other approaches, we have also tested our method on two standard machine learning datasets [9, 6]. On these imbalanced datasets we tried to obtain the best recall on the minority class while keeping a correct precision on this class.

Table 2: Description of the UCI datasets
  Dataset       # of feat.   # of ex.   % of min. class
  Hypothyroid
  Satimage

With each dataset we have built three series of classifiers using 10-fold cross-validation: standard C4.5 [13] and Random Forest [3, 2]; C4.5 and Random Forest using a cost matrix with the WEKA software [14]; and a tree and a Random Forest with the asymmetric entropy criterion we propose (on each dataset, the different Random Forests are built with the same number of trees and features). We also compare our results with those obtained by Chen and Liaw [6]. For each test we present the recall and precision rates on the minority class. As we present only two-class problems, the results on the majority class are implicit, so we do not give them here.

Table 3: Parameters for mammo
  Method               Parameters
  C4.5                 -
  Random Forest        trees, 28 features
  C4.5 with cost (1)   cost :
  C4.5 with cost (2)   cost :2
  RF with cost         cost :35
  Asymmetric tree      w1 = , w2 =
  Asymmetric RF        w1 = , w2 = .

Table 4: Results for mammo
  Method               Recall   Precision
  C4.5                 3
  Random Forest        .        .
  C4.5 with cost (1)   .8
  C4.5 with cost (2)   .        5
  RF with cost         7        .4
  Asymmetric tree      .
  Asymmetric RF        8        .

For the asymmetric methods and the cost matrix we used several sets of parameters. Tables 4, 6 and 8 present the best results for each method. We may see that the best results are always obtained with an asymmetric Random Forest. For the real-data mammo dataset, the asymmetric entropy with a single decision tree does not improve the results (cost-sensitive C4.5 is better), but it brings a real improvement with the Random Forest.

Table 5: Parameters for hypothyroid
  Method            Parameters
  C4.5              -
  Random Forest     3 trees, 4 features
  C4.5 with cost    cost :5
  RF with cost      cost :5
  Asymmetric tree   w1 = , w2 = .
  Asymmetric RF     w1 = , w2 = .
  BRF               cutoff =
  WRF               weight = :5

For the hypothyroid dataset, we also present the results obtained with Balanced Random Forest (BRF) and Weighted Random Forest (WRF) [6].
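As an aside, here is a minimal sketch of this kind of evaluation protocol: stratified 10-fold cross-validation reporting recall and precision on the minority class. This is not the authors' WEKA/C4.5 setup: scikit-learn's class-weighted trees stand in for the cost-matrix baselines, the 1:35 weight is purely illustrative, and a synthetic imbalanced dataset stands in for mammo, hypothyroid and satimage.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# synthetic two-class problem with roughly 3% of minority examples
X, y = make_classification(n_samples=3000, n_features=20,
                           weights=[0.97, 0.03], random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "tree":               DecisionTreeClassifier(random_state=0),
    "tree, weight 1:35":  DecisionTreeClassifier(class_weight={0: 1, 1: 35},
                                                 random_state=0),
    "forest, weight 1:35": RandomForestClassifier(n_estimators=100,
                                                  class_weight={0: 1, 1: 35},
                                                  random_state=0),
}
for name, clf in models.items():
    scores = cross_validate(clf, X, y, cv=cv,
                            scoring={"recall": "recall",
                                     "precision": "precision"})
    print(f"{name:20s} recall={scores['test_recall'].mean():.2f} "
          f"precision={scores['test_precision'].mean():.2f}")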

Table 6: Results for hypothyroid
  Method            Recall   Precision
  C4.5
  Random Forest     6        6
  C4.5 with cost    8        6
  RF with cost      9        6
  Asymmetric tree   8        7
  Asymmetric RF     .        8
  BRF               5        3
  WRF               3        3

On this dataset, the asymmetric Random Forest dominates all the other methods, whatever the parameters.

Table 7: Parameters for satimage
  Method                Parameters
  C4.5                  -
  Random Forest         3 trees, 6 features
  C4.5 with cost (1)    cost :5
  C4.5 with cost (2)    cost :5
  RF with cost (1)      cost :2
  RF with cost (2)      cost :5
  Asymmetric tree (1)   w1 = 8, w2 = .
  Asymmetric tree (2)   w1 = , w2 =
  Asymmetric RF (1)     w1 = 8, w2 = .
  Asymmetric RF (2)     w1 = , w2 =
  BRF (1)               cutoff =
  BRF (2)               cutoff =
  WRF (1)               (not specified)
  WRF (2)               (not specified)

Table 8: Results for satimage
  Method                Recall   Precision
  C4.5
  Random Forest         4        4
  C4.5 with cost (1)    2        9
  C4.5 with cost (2)    2
  RF with cost (1)      5        3
  RF with cost (2)      4        3
  Asymmetric tree (1)   3
  Asymmetric tree (2)   9
  Asymmetric RF (1)     9        9
  Asymmetric RF (2)     6
  BRF (1)               7        4
  BRF (2)               7        6
  WRF (1)               9        9
  WRF (2)               7

With the satimage dataset, we have chosen the minority class and merged the five other classes to create the majority class. We also present the BRF and WRF results. We may observe in figure 9 that for the satimage dataset the asymmetric Random Forest again dominates the other methods. The Random Forest with a cost matrix is very close to WRF and BRF. Finally, we may notice that for this dataset the asymmetric tree is slightly better than C4.5 using a cost matrix.

Figure 9: Recall and precision for satimage with different parameters.

6 Conclusion and future works

We have proposed an asymmetric entropy measure for decision trees. By using it with adapted decision rules, we can grow trees that take into account the user's specifications in terms of recall and precision for each class, and thus obtain better results on imbalanced datasets. We may notice that this method entails no additional computational complexity, and that its parameters can be set intelligibly by the user.

In the future we plan to improve our work on several points. In particular, the entropy we propose may be enhanced, notably the aggregation (we use a sum) of the per-class uncertainties. Experiments with more than two classes will be conducted.

We will also carry out a theoretical comparison between this approach and those that introduce costs in the splitting criterion. Indeed, as these two approaches may be expressed in the same way, it seems to us that using costs introduces breaks in the concavity of the splitting criterion, which could produce sub-optimal trees.

Acknowledgements

This work has been carried out within a thesis co-financed by the French Ministry of Research and Industry. We would like to thank the ARDOC breast cancer screening unit and the senology unit of Centre-République (Clermont-Ferrand, France) for their expertise and the mammography data. We also thank Pierre-Emmanuel Jouve, Julien Thomas and Jérémy Clech (Fenics company, Lyon, France) for their help and advice.

References

[1] R. Barandela, J. S. Sanchez, et al. (2003). Strategies for learning in class imbalance problems. Pattern Recognition, volume 36(3).
[2] L. Breiman (2001). Random Forests. Machine Learning, volume 45(1).
[3] L. Breiman (2002). Looking inside the black box. WALD Lectures, 277th meeting of the Institute of Mathematical Statistics, Banff, Alberta, Canada.
[4] L. Breiman, J. H. Friedman, et al. (1984). Classification and Regression Trees. Belmont, Wadsworth.
[5] J.-H. Chauchat, R. Rakotomalala, et al. (2001). Targeting Customer Groups using Gain and Cost Matrix: a Marketing Application. In Proceedings of the Data Mining and Marketing Workshop, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2001).
[6] C. Chen, A. Liaw, et al. (2004). Using Random Forest to Learn Imbalanced Data. Technical Report, Department of Statistics, University of California, Berkeley.
[7] P. Domingos (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99).
[8] C. Elkan (2001). The Foundations of Cost-Sensitive Learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI 2001).
[9] S. Hettich and S. D. Bay (1999). The UCI KDD Archive. University of California, Department of Information and Computer Science, Irvine, California, USA.
[10] C. X. Ling, Q. Yang, et al. (2004). Decision trees with minimal costs. In Proceedings of the Twenty-First International Conference on Machine Learning, 69, Banff, Alberta, Canada.
[11] F. Provost (2000). Learning with Imbalanced Data Sets. Invited paper for the AAAI 2000 Workshop on Imbalanced Data Sets.
[12] J. R. Quinlan (1986). Induction of Decision Trees. Machine Learning, volume 1(1), pages 81-106.
[13] J. R. Quinlan (1993). C4.5: Programs for Machine Learning. San Francisco, Morgan Kaufmann Publishers Inc.
[14] I. H. Witten and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, Morgan Kaufmann.
[15] D. Zighed and R. Rakotomalala (2000). Graphes d'induction: Apprentissage et Data Mining. Paris, Hermès.
