Data Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms
dataminingbook.info

Mohammed J. Zaki (1)   Wagner Meira Jr. (2)

(1) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
(2) Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 19: Decision Tree Classifier

Zaki & Meira Jr. (RPI and UFMG), Data Mining and Analysis, Chapter 19: Decision Tree Classifier
Decision Tree Classifier

Let the training dataset D = {(x_i, y_i)}_{i=1}^n consist of n points in a d-dimensional space, with y_i the class label for point x_i. A decision tree classifier is a recursive, partition-based tree model that predicts the class ŷ_i for each point x_i.

Let R denote the data space that encompasses the set of input points D. A decision tree uses an axis-parallel hyperplane to split the data space R into two resulting half-spaces or regions, say R_1 and R_2, which also induces a partition of the input points into D_1 and D_2, respectively. Each of these regions is recursively split via axis-parallel hyperplanes until most of the points in a region belong to the same class.

To classify a new test point we recursively evaluate which half-space it belongs to until we reach a leaf node of the decision tree, at which point we predict its class as the label of the leaf.
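The recursive evaluation described above can be sketched in a few lines of Python. This is an illustrative hand-built tree, not the one from the slides; the `Node` class and its field names are my own.

```python
# Minimal sketch of decision-tree prediction: route a point down the tree by
# testing one axis-parallel decision X_j <= v at each internal node.

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index j of the split attribute X_j
        self.threshold = threshold  # split value v in the decision X_j <= v
        self.left = left            # child for points satisfying X_j <= v
        self.right = right          # child for points with X_j > v
        self.label = label          # class label, set only at leaf nodes

def predict(node, x):
    """Recursively route point x to a leaf and return the leaf's label."""
    if node.label is not None:
        return node.label
    if x[node.feature] <= node.threshold:
        return predict(node.left, x)
    return predict(node.right, x)

# A small two-level tree: split on X_0 <= 5.45, then on X_1 <= 2.8.
tree = Node(feature=0, threshold=5.45,
            left=Node(feature=1, threshold=2.8,
                      left=Node(label="c2"), right=Node(label="c1")),
            right=Node(label="c2"))

print(predict(tree, [5.0, 3.4]))  # follows X_0 <= 5.45, then X_1 > 2.8
```

Prediction cost is proportional to the depth of the tree, since exactly one decision is evaluated per level.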
Decision Tree: Recursive Splits

[Figure: Recursive axis-parallel splits of the data space over attributes X_1 and X_2. Hyperplanes h_0 through h_5 partition the space into regions R_1 through R_6; a test point z falls in one of the regions.]
Decision Tree

[Figure: The corresponding decision tree, with yes/no decision nodes on X_1 and X_2 and six leaves for regions R_1 through R_6. Leaf class counts: R_1 (c_1: 44, c_2: 1), R_2 (c_1: 0, c_2: 90), R_3 (c_1: 1, c_2: 0), R_4 (c_1: 0, c_2: 6), R_5 (c_1: 5, c_2: 0), R_6 (c_1: 0, c_2: 3).]
Decision Trees: Axis-Parallel Hyperplanes

A hyperplane h(x) is defined as the set of all points x that satisfy the equation

    h(x): wᵀx + b = 0

where w ∈ R^d is a weight vector that is normal to the hyperplane, and b is the offset of the hyperplane from the origin.

A decision tree considers only axis-parallel hyperplanes, that is, the weight vector must be parallel to one of the original dimensions or axes X_j:

    h(x): x_j + b = 0

where the choice of the offset b yields different hyperplanes along dimension X_j.
Decision Trees: Split Points

A hyperplane specifies a decision or split point because it splits the data space R into two half-spaces. All points x such that h(x) ≤ 0 are on the hyperplane or to one side of it, whereas all points such that h(x) > 0 are on the other side.

The split point is written as h(x) ≤ 0, i.e., X_j ≤ v, where v = −b is some value in the domain of attribute X_j. The decision or split point X_j ≤ v thus splits the input data space R into two regions R_Y and R_N, which denote the set of all possible points that satisfy the decision and those that do not.

Categorical Attributes: For a categorical attribute X_j, the split points or decisions are of the form X_j ∈ V, where V ⊂ dom(X_j), and dom(X_j) denotes the domain for X_j.
Decision Trees: Data Partition and Purity

Each split of R into R_Y and R_N also induces a binary partition of the corresponding input data points D. A split point of the form X_j ≤ v induces the data partition

    D_Y = {x | x ∈ D, x_j ≤ v}
    D_N = {x | x ∈ D, x_j > v}

The purity of a region R_j is the fraction of points with the majority label in D_j, that is,

    purity(D_j) = max_i { n_ji / n_j }

where n_j = |D_j| is the total number of data points in the region R_j, and n_ji is the number of points in D_j with class label c_i.
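The purity definition above is a one-liner to compute. A small sketch with made-up labels:

```python
# Purity of a partition D_j: the fraction of its points that carry the
# majority class label, purity(D_j) = max_i { n_ji / n_j }.
from collections import Counter

def purity(labels):
    counts = Counter(labels)          # n_ji for each class c_i
    n_j = len(labels)                 # n_j = |D_j|
    return max(counts.values()) / n_j

labels = ["c1", "c1", "c1", "c2"]     # n_j = 4, majority class c1 with count 3
print(purity(labels))                  # 3/4 = 0.75
```

A pure partition (all points share one label) has purity 1, which is the stopping condition the tree-growing algorithm checks against the threshold π.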
Decision Trees to Rules

A tree is a set of decision rules, each comprising the decisions on the path to a leaf:

    R_3: If X_1 ≤ 5.45 and X_2 ≤ 2.8 and X_1 ≤ 4.7, then class is c_1, or
    R_4: If X_1 ≤ 5.45 and X_2 ≤ 2.8 and X_1 > 4.7, then class is c_2, or
    R_1: If X_1 ≤ 5.45 and X_2 > 2.8, then class is c_1, or
    R_2: If X_1 > 5.45 and X_2 ≤ 3.45, then class is c_2, or
    R_5: If X_1 > 5.45 and X_2 > 3.45 and X_1 ≤ 6.5, then class is c_1, or
    R_6: If X_1 > 5.45 and X_2 > 3.45 and X_1 > 6.5, then class is c_2
Decision Tree Algorithm

The method takes as input a training dataset D, and two parameters η and π, where η is the leaf size and π the leaf purity threshold.

Different split points are evaluated for each attribute in D. Numeric decisions are of the form X_j ≤ v for some value v in the value range for attribute X_j, and categorical decisions are of the form X_j ∈ V for some subset V of values in the domain of X_j.

The best split point is chosen to partition the data into two subsets, D_Y and D_N, where D_Y corresponds to all points x ∈ D that satisfy the split decision, and D_N corresponds to all points that do not satisfy it. The decision tree method is then called recursively on D_Y and D_N. We stop the process if the partition size drops to η or below, or if the purity is at least π.
Decision Tree Algorithm

DECISIONTREE(D, η, π):
    n ← |D|  // partition size
    n_i ← |{x_j | x_j ∈ D, y_j = c_i}|  // size of class c_i
    purity(D) ← max_i { n_i / n }
    if n ≤ η or purity(D) ≥ π then  // stopping condition
        c* ← arg max_{c_i} { n_i / n }  // majority class
        create leaf node, and label it with class c*
        return
    (split-point*, score*) ← (∅, 0)  // initialize best split point
    foreach attribute X_j do
        if X_j is numeric then
            (v, score) ← EVALUATE-NUMERIC-ATTRIBUTE(D, X_j)
            if score > score* then (split-point*, score*) ← (X_j ≤ v, score)
        else if X_j is categorical then
            (V, score) ← EVALUATE-CATEGORICAL-ATTRIBUTE(D, X_j)
            if score > score* then (split-point*, score*) ← (X_j ∈ V, score)
    D_Y ← {x ∈ D | x satisfies split-point*}
    D_N ← {x ∈ D | x does not satisfy split-point*}
    create internal node split-point*, with two child nodes, D_Y and D_N
    DECISIONTREE(D_Y, η, π); DECISIONTREE(D_N, η, π)
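The pseudocode above can be sketched concretely for numeric attributes only, using information gain (defined on a later slide) as the split score. The helper names (`best_split`, the tuple-based tree encoding) and the tiny dataset are my own illustrations, not from the slides; `eta` and `pi` play the roles of η and π.

```python
# Compact sketch of DECISIONTREE for numeric attributes, scored by
# information gain over candidate midpoints.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(D, y):
    """Scan all midpoints of all attributes; return (gain, j, v)."""
    n, d = len(D), len(D[0])
    base = entropy(y)
    best = (0.0, None, None)
    for j in range(d):
        values = sorted(set(x[j] for x in D))
        for lo, hi in zip(values, values[1:]):
            v = (lo + hi) / 2                      # midpoint between values
            yY = [yi for x, yi in zip(D, y) if x[j] <= v]
            yN = [yi for x, yi in zip(D, y) if x[j] > v]
            split_H = (len(yY) * entropy(yY) + len(yN) * entropy(yN)) / n
            if base - split_H > best[0]:
                best = (base - split_H, j, v)
    return best

def decision_tree(D, y, eta=1, pi=1.0):
    counts = Counter(y)
    majority, n_maj = counts.most_common(1)[0]
    if len(D) <= eta or n_maj / len(D) >= pi:      # stopping condition
        return ("leaf", majority)
    gain, j, v = best_split(D, y)
    if j is None:                                   # no split improves the score
        return ("leaf", majority)
    DY = [(x, yi) for x, yi in zip(D, y) if x[j] <= v]
    DN = [(x, yi) for x, yi in zip(D, y) if x[j] > v]
    return ("node", j, v,
            decision_tree([x for x, _ in DY], [yi for _, yi in DY], eta, pi),
            decision_tree([x for x, _ in DN], [yi for _, yi in DN], eta, pi))

D = [[4.9, 3.0], [5.0, 3.4], [6.3, 3.3], [6.7, 3.1]]
y = ["c1", "c1", "c2", "c2"]
tree = decision_tree(D, y)
print(tree)
```

On this toy data a single split on the first attribute separates the classes perfectly, so the result is one internal node with two pure leaves.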
Split Point Evaluation Measures: Entropy

Intuitively, we want to select a split point that gives the best separation or discrimination between the different class labels.

Entropy measures the amount of disorder or uncertainty in a system. A partition has lower entropy (low disorder) if it is relatively pure, that is, if most of the points have the same label. On the other hand, a partition has higher entropy (more disorder) if the class labels are mixed, and there is no majority class as such.

The entropy of a set of labeled points D is defined as

    H(D) = − ∑_{i=1}^{k} P(c_i|D) log2 P(c_i|D)

where P(c_i|D) is the probability of class c_i in D, and k is the number of classes.

If a region is pure, that is, has points from only one class, then the entropy is zero. On the other hand, if the classes are all mixed up, and each appears with equal probability P(c_i|D) = 1/k, then the entropy has its highest value, H(D) = log2 k.
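The two extremes noted above are easy to verify numerically. A quick sketch with hypothetical class distributions:

```python
# Entropy of a class distribution: zero for a pure region, log2(k) when all
# k classes are equally likely.
import math

def entropy(probs):
    # skip zero-probability classes, since lim p->0 of p*log2(p) is 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))         # pure region: zero entropy
print(entropy([0.5, 0.5]))         # maximally mixed, k = 2: log2(2) = 1.0
print(entropy([1/3, 1/3, 1/3]))    # maximally mixed, k = 3: log2(3)
```

Guarding against p = 0 matters in practice, since `math.log2(0)` raises an error even though the corresponding entropy term is zero.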
Split Point Evaluation Measures: Entropy

Define the split entropy as the weighted entropy of the two resulting partitions:

    H(D_Y, D_N) = (n_Y/n) H(D_Y) + (n_N/n) H(D_N)

where n = |D| is the number of points in D, and n_Y = |D_Y| and n_N = |D_N| are the numbers of points in D_Y and D_N.

Define the information gain for a split point as

    Gain(D, D_Y, D_N) = H(D) − H(D_Y, D_N)

The higher the information gain, the greater the reduction in entropy, and the better the split point. We score each split point and choose the one that gives the highest information gain.
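The weighted split entropy and gain can be computed directly from class counts. The counts below are made up for illustration (they are not a table from the slides); `H` takes raw per-class counts and normalizes internally.

```python
# Split entropy H(D_Y, D_N) and information gain for one candidate split,
# computed from per-class counts in each partition.
import math

def H(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

nY_counts = [44, 1]    # class counts (c1, c2) in D_Y
nN_counts = [6, 99]    # class counts (c1, c2) in D_N
n = sum(nY_counts) + sum(nN_counts)

split_H = (sum(nY_counts) / n) * H(nY_counts) + (sum(nN_counts) / n) * H(nN_counts)
gain = H([50, 100]) - split_H   # H(D) uses the overall class counts 44+6, 1+99
print(round(gain, 3))
```

Because the split puts almost all of class c_1 on one side and almost all of c_2 on the other, both partitions have low entropy and the gain is large relative to H(D) = log2-entropy of the 50/100 mix.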
Split Point Evaluation Measures: Gini Index and CART Measure

Gini Index: The Gini index is defined as

    G(D) = 1 − ∑_{i=1}^{k} P(c_i|D)²

If the partition is pure, then the Gini index is 0. The weighted Gini index of a split point is

    G(D_Y, D_N) = (n_Y/n) G(D_Y) + (n_N/n) G(D_N)

The lower the Gini index value, the better the split point.

CART: The CART measure is

    CART(D_Y, D_N) = 2 (n_Y/n) (n_N/n) ∑_{i=1}^{k} |P(c_i|D_Y) − P(c_i|D_N)|

This measure thus prefers a split point that maximizes the difference between the class probability mass functions for the two partitions; the higher the CART measure, the better the split point.
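Both measures are simple functions of the class PMFs. A sketch using the same hypothetical per-partition counts as in the entropy example; the helper names are mine:

```python
# Gini index of a region and CART measure of a split, from per-class counts.
def pmf(counts):
    n = sum(counts)
    return [c / n for c in counts]

def gini(counts):
    # G = 1 - sum_i P(c_i)^2; 0 for a pure region
    return 1 - sum(p * p for p in pmf(counts))

def cart(nY_counts, nN_counts):
    # 2 (nY/n)(nN/n) * sum_i |P(c_i|D_Y) - P(c_i|D_N)|
    n = sum(nY_counts) + sum(nN_counts)
    pY, pN = pmf(nY_counts), pmf(nN_counts)
    weight = 2 * (sum(nY_counts) / n) * (sum(nN_counts) / n)
    return weight * sum(abs(a - b) for a, b in zip(pY, pN))

print(gini([50, 50]))                   # maximally mixed two-class region: 0.5
print(round(cart([44, 1], [6, 99]), 3)) # nearly class-separating split scores high
```

Note the opposite orientations: a good split has a low weighted Gini index but a high CART score.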
Evaluating Split Points: Numeric Attributes

All of the split point evaluation measures depend on the class probability mass function (PMF) for D, namely P(c_i|D), and the class PMFs for the resulting partitions D_Y and D_N, namely P(c_i|D_Y) and P(c_i|D_N).

We have to evaluate split points of the form X ≤ v. We consider only the midpoints between two successive distinct values for X in the sample D. Let {v_1, ..., v_m} denote the set of all such midpoints, such that v_1 < v_2 < ... < v_m.

For each split point X ≤ v, we have to estimate the class PMFs:

    P̂(c_i|D_Y) = P̂(c_i | X ≤ v)
    P̂(c_i|D_N) = P̂(c_i | X > v)

Using Bayes theorem, we have

    P̂(c_i | X ≤ v) = P̂(X ≤ v | c_i) P̂(c_i) / P̂(X ≤ v)
                    = P̂(X ≤ v | c_i) P̂(c_i) / ∑_{j=1}^{k} P̂(X ≤ v | c_j) P̂(c_j)

Thus we have to estimate the prior probability and the likelihood for each class in each partition.
Evaluating Split Points: Numeric Attributes

The prior probability for each class in D can be estimated as

    P̂(c_i) = (1/n) ∑_{j=1}^{n} I(y_j = c_i) = n_i / n

where y_j is the class for point x_j, n = |D| is the total number of points, and n_i is the number of points in D with class c_i.

Define N_vi as the number of points x_j ≤ v with class c_i, where x_j is the value of data point x_j for the attribute X:

    N_vi = ∑_{j=1}^{n} I(x_j ≤ v and y_j = c_i)
Evaluating Split Points: Numeric Attributes

We can estimate P̂(X ≤ v | c_i) and P̂(X > v | c_i) as follows:

    P̂(X ≤ v | c_i) = N_vi / n_i
    P̂(X > v | c_i) = 1 − P̂(X ≤ v | c_i) = (n_i − N_vi) / n_i

Finally, we have

    P̂(c_i|D_Y) = P̂(c_i | X ≤ v) = N_vi / ∑_{j=1}^{k} N_vj
    P̂(c_i|D_N) = P̂(c_i | X > v) = (n_i − N_vi) / ∑_{j=1}^{k} (n_j − N_vj)

The total cost of evaluating a numeric attribute is O(n log n + nk), where k is the number of classes, and n is the number of points.
Algorithm EVALUATE-NUMERIC-ATTRIBUTE

EVALUATE-NUMERIC-ATTRIBUTE(D, X):
    sort D on attribute X, so that x_j ≤ x_{j+1} for all j = 1, ..., n−1
    M ← ∅  // set of midpoints
    for i = 1, ..., k do n_i ← 0
    for j = 1, ..., n−1 do
        if y_j = c_i then n_i ← n_i + 1  // running count for class c_i
        if x_{j+1} ≠ x_j then
            v ← (x_{j+1} + x_j) / 2; M ← M ∪ {v}  // midpoints
            for i = 1, ..., k do
                N_vi ← n_i  // number of points such that x_j ≤ v and y_j = c_i
    if y_n = c_i then n_i ← n_i + 1
    v* ← ∅; score* ← 0  // initialize best split point
    forall v ∈ M do
        for i = 1, ..., k do
            P̂(c_i|D_Y) ← N_vi / ∑_{j=1}^{k} N_vj
            P̂(c_i|D_N) ← (n_i − N_vi) / ∑_{j=1}^{k} (n_j − N_vj)
        score(X ≤ v) ← Gain(D, D_Y, D_N)
        if score(X ≤ v) > score* then
            v* ← v; score* ← score(X ≤ v)
    return (v*, score*)
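A Python sketch of the algorithm above: sort on X once, accumulate running class counts to obtain N_vi at each midpoint, and score every split by information gain. The function and variable names are mine, and the six-point dataset is illustrative.

```python
# EVALUATE-NUMERIC-ATTRIBUTE sketch: one sort plus a linear scan over
# midpoints, using running counts N_vi instead of re-partitioning the data.
import math
from collections import Counter

def entropy_from_counts(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def evaluate_numeric_attribute(xs, ys, classes):
    pairs = sorted(zip(xs, ys))                      # sort D on attribute X
    n = len(pairs)
    total = Counter(ys)
    base_H = entropy_from_counts([total[c] for c in classes])
    running = Counter()
    best_v, best_score = None, 0.0
    for j in range(n - 1):
        running[pairs[j][1]] += 1                    # running count per class
        if pairs[j + 1][0] != pairs[j][0]:           # distinct successive values
            v = (pairs[j][0] + pairs[j + 1][0]) / 2  # midpoint
            NY = [running[c] for c in classes]       # N_vi for x <= v
            NN = [total[c] - running[c] for c in classes]
            split_H = (sum(NY) * entropy_from_counts(NY)
                       + sum(NN) * entropy_from_counts(NN)) / n
            if base_H - split_H > best_score:
                best_v, best_score = v, base_H - split_H
    return best_v, best_score

xs = [4.9, 5.0, 5.4, 6.3, 6.5, 6.7]
ys = ["c1", "c1", "c1", "c2", "c2", "c2"]
print(evaluate_numeric_attribute(xs, ys, ["c1", "c2"]))
```

The O(n log n) sort dominates; the scan itself is O(nk), matching the cost stated on the previous slide.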
Iris Data: Class-specific Frequencies N_vi

[Figure: Frequencies N_vi plotted over the midpoints v of attribute sepal length, for classes c_1 (iris-setosa) and c_2 (other); the split point v = 5.45 is marked.]
Iris Data: Information Gain for Different Splits

[Figure: Information gain for the split points X_i ≤ v on the Iris data, for attributes sepal-length (X_1) and sepal-width (X_2).]
Categorical Attributes

For categorical X the split points are of the form X ∈ V, where V ⊂ dom(X) and V ≠ ∅. All distinct partitions of the set of values of X are considered. If m = |dom(X)|, then there are O(2^{m−1}) distinct partitions, which can be too many. One simplification is to restrict V to be of size one, so that there are only m split points of the form X_j ∈ {v}, where v ∈ dom(X_j).

Define n_vi as the number of points x_j ∈ D with value x_j = v for attribute X and class y_j = c_i:

    n_vi = ∑_{j=1}^{n} I(x_j = v and y_j = c_i)

The class-conditional empirical PMF for X is then given as

    P̂(X = v | c_i) = P̂(X = v and c_i) / P̂(c_i) = n_vi / n_i

We then have

    P̂(c_i|D_Y) = ∑_{v∈V} n_vi / ∑_{j=1}^{k} ∑_{v∈V} n_vj
    P̂(c_i|D_N) = ∑_{v∉V} n_vi / ∑_{j=1}^{k} ∑_{v∉V} n_vj
Algorithm EVALUATE-CATEGORICAL-ATTRIBUTE

EVALUATE-CATEGORICAL-ATTRIBUTE(D, X, l):
    for i = 1, ..., k do
        n_i ← 0
        forall v ∈ dom(X) do n_vi ← 0
    for j = 1, ..., n do
        if x_j = v and y_j = c_i then n_vi ← n_vi + 1  // frequency statistics
    // evaluate split points of the form X ∈ V
    V* ← ∅; score* ← 0  // initialize best split point
    forall V ⊂ dom(X), such that 1 ≤ |V| ≤ l do
        for i = 1, ..., k do
            P̂(c_i|D_Y) ← ∑_{v∈V} n_vi / ∑_{j=1}^{k} ∑_{v∈V} n_vj
            P̂(c_i|D_N) ← ∑_{v∉V} n_vi / ∑_{j=1}^{k} ∑_{v∉V} n_vj
        score(X ∈ V) ← Gain(D, D_Y, D_N)
        if score(X ∈ V) > score* then
            V* ← V; score* ← score(X ∈ V)
    return (V*, score*)
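The algorithm above can be sketched in Python by tabulating the frequencies n_vi once and then scoring every subset V with |V| ≤ l by information gain. The function names and the small dataset are my own illustrations.

```python
# EVALUATE-CATEGORICAL-ATTRIBUTE sketch: count n_vi, then enumerate subsets
# V of the domain with |V| <= l and keep the one with the highest gain.
import math
from collections import Counter
from itertools import combinations

def entropy_from_counts(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def evaluate_categorical_attribute(xs, ys, classes, l=1):
    n = len(xs)
    nvi = Counter(zip(xs, ys))          # frequency statistics n_vi
    total = Counter(ys)
    base_H = entropy_from_counts([total[c] for c in classes])
    domain = sorted(set(xs))
    best_V, best_score = None, 0.0
    for size in range(1, l + 1):
        for V in combinations(domain, size):
            # class counts for D_Y (x in V) and D_N (x not in V)
            NY = [sum(nvi[(v, c)] for v in V) for c in classes]
            NN = [total[c] - y for c, y in zip(classes, NY)]
            split_H = (sum(NY) * entropy_from_counts(NY)
                       + sum(NN) * entropy_from_counts(NN)) / n
            if base_H - split_H > best_score:
                best_V, best_score = set(V), base_H - split_H
    return best_V, best_score

xs = ["a1", "a1", "a2", "a3", "a3", "a4"]
ys = ["c1", "c1", "c1", "c2", "c2", "c2"]
print(evaluate_categorical_attribute(xs, ys, ["c1", "c2"], l=2))
```

Restricting |V| ≤ l keeps the enumeration to O(m^l) subsets instead of the O(2^{m−1}) distinct partitions of the full domain.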
Discretized sepal length: Class Frequencies

    Bins          v: values            n_v1 (c_1: iris-setosa)   n_v2 (c_2: other)
    [4.3, 5.2]    Very Short (a_1)     39                        6
    (5.2, 6.1]    Short (a_2)          11                        39
    (6.1, 7.0]    Long (a_3)           0                         43
    (7.0, 7.9]    Very Long (a_4)      0                         12
Categorical Split Points for sepal length

Each nonempty subset V with |V| ≤ 2 is evaluated by its split entropy and information gain: {a_1}, {a_2}, {a_3}, {a_4}, {a_1, a_2}, {a_1, a_3}, {a_1, a_4}, {a_2, a_3}, {a_2, a_4}, {a_3, a_4}.

Best split: X ∈ {a_1}.
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationCHAPTER-17. Decision Tree Induction
CHAPTER-17 Decision Tree Induction 17.1 Introduction 17.2 Attribute selection measure 17.3 Tree Pruning 17.4 Extracting Classification Rules from Decision Trees 17.5 Bayesian Classification 17.6 Bayes
More informationDecision Trees. Each internal node : an attribute Branch: Outcome of the test Leaf node or terminal node: class label.
Decision Trees Supervised approach Used for Classification (Categorical values) or regression (continuous values). The learning of decision trees is from class-labeled training tuples. Flowchart like structure.
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More information15-381: Artificial Intelligence. Decision trees
15-381: Artificial Intelligence Decision trees Bayes classifiers find the label that maximizes: Naïve Bayes models assume independence of the features given the label leading to the following over documents
More informationLinear classifiers: Logistic regression
Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More informationSupervised Learning via Decision Trees
Supervised Learning via Decision Trees Lecture 4 1 Outline 1. Learning via feature splits 2. ID3 Information gain 3. Extensions Continuous features Gain ratio Ensemble learning 2 Sequence of decisions
More informationGeneralization to Multi-Class and Continuous Responses. STA Data Mining I
Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)
More information1 Handling of Continuous Attributes in C4.5. Algorithm
.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous
More informationLecture 3: Decision Trees
Lecture 3: Decision Trees Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning ID3, Information Gain, Overfitting, Pruning Lecture 3: Decision Trees p. Decision
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 23. Decision Trees Barnabás Póczos Contents Decision Trees: Definition + Motivation Algorithm for Learning Decision Trees Entropy, Mutual Information, Information
More informationDecision Trees (Cont.)
Decision Trees (Cont.) R&N Chapter 18.2,18.3 Side example with discrete (categorical) attributes: Predicting age (3 values: less than 30, 30-45, more than 45 yrs old) from census data. Attributes (split
More informationDecision Tree Learning
Decision Tree Learning Goals for the lecture you should understand the following concepts the decision tree representation the standard top-down approach to learning a tree Occam s razor entropy and information
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationClassification. Department Biosysteme Karsten Borgwardt Data Mining Course Basel Fall Semester / 162
Classification Department Biosysteme Karsten Borgwardt Data Mining Course Basel Fall Semester 2015 66 / 162 Department Biosysteme Karsten Borgwardt Data Mining Course Basel Fall Semester 2015 67 / 162
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More informationEmpirical Risk Minimization, Model Selection, and Model Assessment
Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,
More informationMachine Learning on temporal data
Machine Learning on temporal data Classification rees for ime Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan ime Series classification approaches
More information.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label.
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Definitions Data. Consider a set A = {A 1,...,A n } of attributes, and an additional
More informationDecision Trees. Lewis Fishgold. (Material in these slides adapted from Ray Mooney's slides on Decision Trees)
Decision Trees Lewis Fishgold (Material in these slides adapted from Ray Mooney's slides on Decision Trees) Classification using Decision Trees Nodes test features, there is one branch for each value of
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationHarrison B. Prosper. Bari Lectures
Harrison B. Prosper Florida State University Bari Lectures 30, 31 May, 1 June 2016 Lectures on Multivariate Methods Harrison B. Prosper Bari, 2016 1 h Lecture 1 h Introduction h Classification h Grid Searches
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationQ1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)
Q1 (1 points): Chap 4 Exercise 3 (a) to (f) ( points each) Given a table Table 1 Dataset for Exercise 3 Instance a 1 a a 3 Target Class 1 T T 1.0 + T T 6.0 + 3 T F 5.0-4 F F 4.0 + 5 F T 7.0-6 F T 3.0-7
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 3 We are given a set of training examples, consisting of input-output pairs (x,y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some
More informationLecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof. Ganesh Ramakrishnan
Lecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof Ganesh Ramakrishnan October 20, 2016 1 / 25 Decision Trees: Cascade of step
More informationMachine Learning 2nd Edi7on
Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edi7on CHAPTER 9: Decision Trees ETHEM ALPAYDIN The MIT Press, 2010 Edited and expanded for CS 4641 by Chris Simpkins alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e
More information