CSCI 5622 Machine Learning

CSCI 5622 Machine Learning

DATE           READ       DUE
Mon, Aug 31    1, 2 & 3
Wed, Sept 2    3 & 5
Wed, Sept 9    TBA        Prelim Proposal

www.rodneynielsen.com/teaching/csci5622f09/

Instructor: Rodney Nielsen
Assistant Professor Adjunct, CU Dept. of Computer Science
Research Assistant Professor, DU, Dept. of Electrical & Computer Engr.
Research Scientist, Boulder Language Technologies

Supervised Learning
Given a set of training examples X and corresponding outputs Y, we want the function f(x) = y.
- Select a hypothesis space H and a learning algorithm
- Search H for a hypothesis h that approximates f

Decision Trees

Decision Tree
Target question: Are you a skier?
Candidate questions/attributes to split on:
- Run? Bike? # sports?
- Hate the cold? Coordinated?
- Which is closer: Breckenridge, Keystone, or Vail?
- Where is Mary Jane? Where is the Flying Dutchman?

What We Learned
- Evaluate all attributes at each level
- Need a principled means of evaluating decisions to ensure generalization: prefer purity, prefer smaller trees
- Attributes with numerous values must be penalized
- The tree doesn't generalize very well if leaves have very few examples
- Test on data not in the training set
- Early stopping or post-pruning to improve generalization
- Skiers are more coordinated

DT: Hypothesis Space
A decision tree represents a disjunction of conjunctions of constraints on attribute values.
(Tree figure on slide: the root splits on Snow = Good / Poor / Fresh; the Good branch splits on WantToPassML, the Fresh branch on Smart; Poor leads directly to Don't Go; leaves are Go or Don't Go, with the Fresh/Smart=Yes leaf annotated "wait" / "or come back early".)
Reading the Don't Go leaves off the tree as a rule:
IF (Snow=Good AND WantToPassML=Yes) OR (Snow=Poor) OR (Snow=Fresh AND Smart=Yes) THEN Don't Go
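To make the tree-vs-rule correspondence concrete, here is a minimal Python sketch (my own illustration, not from the slides): the same classifier written once as a nested-dict tree and once as the equivalent disjunction of conjunctions. The dict encoding and the test instance are assumptions made for the example.

```python
# Minimal sketch (illustrative encoding): the slide's tree as nested dicts and as the
# equivalent disjunction of conjunctions. Attribute names/values follow the slide.

tree = {
    "Snow": {
        "Poor":  "Don't Go",
        "Good":  {"WantToPassML": {"Yes": "Don't Go", "No": "Go"}},
        "Fresh": {"Smart":        {"Yes": "Don't Go", "No": "Go"}},
    }
}

def classify(tree, x):
    """Walk the tree: interior nodes are {attribute: {value: subtree}}, leaves are labels."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[x[attribute]]
    return tree

def rule(x):
    """The same tree read off as a disjunction of conjunctions (one conjunct per Don't-Go leaf)."""
    dont_go = ((x["Snow"] == "Good" and x["WantToPassML"] == "Yes")
               or x["Snow"] == "Poor"
               or (x["Snow"] == "Fresh" and x["Smart"] == "Yes"))
    return "Don't Go" if dont_go else "Go"

x = {"Snow": "Fresh", "Smart": "Yes", "WantToPassML": "No"}
assert classify(tree, x) == rule(x) == "Don't Go"
```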

DT: Hypothesis Space
Unrestricted hypothesis space

DT: Algorithm
Learning: depth-first greedy search through the state space of possible trees
Classification: run the instance x through the tree according to its attribute values until a leaf is reached, and predict that leaf's class, ŷ = c_k

DT: ID3 Learning Algorithm

ID3(trainingData, attributes)
  If (attributes = {}) OR (all trainingData is in one class)
    return a leaf node predicting the majority class
  x* <- best attribute to split on
  Nd <- create a decision node splitting on x*
  attributesLeft <- attributes - {x*}
  For each possible value v_k of x*:
    Nd.addChild(ID3(trainingData subset with x* = v_k, attributesLeft))
  return Nd

DT: Attribute Selection
Evaluate each attribute; use a heuristic choice (generally based on statistics or information theory):
  x* = argmax_{x_i} utility(x_i)
(Slide figure: three candidate attributes x_1, x_2, x_3, each shown with the split it would induce.)

Entropy
For a binary class variable:
  Entropy(X) = H(X) = -p(y=1) log2 p(y=1) - p(y=0) log2 p(y=0)
and in general, summing over the classes c_k:
  Entropy(X) = -Σ_k p(y=c_k) log2 p(y=c_k)
Examples:
  H([N, 0]) = -1.0 log2(1.0) - 0.0 log2(0.0) = 0.0
  H([N/2, N/2]) = -0.5 log2(0.5) - 0.5 log2(0.5) = 1.0
(Plot on slide: H(X) = Entropy(X) as a function of the class distribution, from [N, 0] through [N/2, N/2] to [0, N].)
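As a quick check on these values, here is a small Python helper (not from the slides) that computes entropy from a list of class counts:

```python
import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # 0 * log2(0) is taken to be 0
    return sum(-p * math.log2(p) for p in probs)

print(entropy([10, 0]))   # 0.0   (a pure node, like H([N, 0]))
print(entropy([5, 5]))    # 1.0   (an even split, like H([N/2, N/2]))
print(entropy([6, 7]))    # ~0.996, the H(X) used on the next slide
```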

DT: Information Gain
The decrease in entropy as a result of partitioning the data:
  InfoGain(X, x_i) = Entropy(X) - Σ_{v ∈ Values(x_i)} (|X_v| / |X|) Entropy(X_v),  where X_v = {x ∈ X : x_i = v}

Example: X = [6, 7], so H(X) = -6/13 log2(6/13) - 7/13 log2(7/13) = 0.996
  x_1: InfoGain = 0.996 - 3/13 (-1 log2 1 - 0 log2 0) - 10/13 (-0.4 log2 0.4 - 0.6 log2 0.6) = 0.249
  x_2: InfoGain = 0.996 - 6/13 (-5/6 log2 5/6 - 1/6 log2 1/6) - 7/13 (-2/7 log2 2/7 - 5/7 log2 5/7) = 0.231
  x_3: InfoGain = 0.996 - 2/13 (-1 log2 1 - 0 log2 0) - 4/13 (-3/4 log2 3/4 - 1/4 log2 1/4) - 7/13 (-2/7 log2 2/7 - 5/7 log2 5/7) = 0.281
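The worked example can be reproduced with a short information-gain helper. Each candidate split is described by the class counts in its child nodes; the specific counts below are a sketch inferred from the proportions on the slide.

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """InfoGain = Entropy(parent) minus the size-weighted entropies of the children."""
    n = sum(parent_counts)
    remainder = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# The three candidate splits from the slide, with X = [6, 7]:
print(round(info_gain([6, 7], [[0, 3], [6, 4]]), 3))           # x_1 -> 0.249
print(round(info_gain([6, 7], [[1, 5], [5, 2]]), 3))           # x_2 -> 0.231
print(round(info_gain([6, 7], [[0, 2], [1, 3], [5, 2]]), 3))   # x_3 -> 0.281
```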

DT: ID3 Learning Algorithm

ID3(trainingData, attributes)
  If (attributes = {}) OR (all trainingData is in one class)
    return a leaf node predicting the majority class
  x* <- best attribute to split on
  Nd <- create a decision node splitting on x*
  attributesLeft <- attributes - {x*}
  For each possible value v_k of x*:
    Nd.addChild(ID3(trainingData subset with x* = v_k, attributesLeft))
  return Nd
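Here is a compact, runnable Python sketch of this pseudocode (an illustration, not the course's code). It chooses x* by the information gain defined above and, as a simplification, only iterates over attribute values actually seen in the data.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-c / total * math.log2(c / total) for c in Counter(labels).values())

def info_gain(data, labels, attr):
    """Entropy(X) minus the weighted entropy of the partition induced by attribute attr."""
    n = len(labels)
    remainder = 0.0
    for v in set(x[attr] for x in data):
        subset = [y for x, y in zip(data, labels) if x[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(data, labels, attributes):
    """data: list of dicts mapping attribute -> value; labels: the corresponding class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes or len(set(labels)) == 1:
        return majority                                        # leaf predicting the majority class
    best = max(attributes, key=lambda a: info_gain(data, labels, a))   # x*
    node = {best: {}}                                          # decision node splitting on x*
    attributes_left = [a for a in attributes if a != best]
    for v in set(x[best] for x in data):                       # one child per value v_k of x* in the data
        pairs = [(x, y) for x, y in zip(data, labels) if x[best] == v]
        sub_data, sub_labels = [p[0] for p in pairs], [p[1] for p in pairs]
        node[best][v] = id3(sub_data, sub_labels, attributes_left)
    return node

data = [{"Snow": "Poor"}, {"Snow": "Good"}, {"Snow": "Good"}]
labels = ["Don't Go", "Go", "Go"]
print(id3(data, labels, ["Snow"]))   # e.g. {'Snow': {'Poor': "Don't Go", 'Good': 'Go'}}
```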

DT: Inductive Bias
- Small vs. large trees
- Occam's Razor

Admin
- Textbook: Tom Mitchell, Machine Learning
- Project: preliminary proposal due Wed, Sept 9; send your idea sooner if you can
- Topics
- Learning types

DT: Overfitting the Data
Grow the tree until it perfectly classifies the training data? Problems: noise, or too few training instances.
Overfitting: h overfits the training data if there is an alternative hypothesis h' such that
  Error(h, trngX) < Error(h', trngX), but Error(h, distX) > Error(h', distX)
(h looks better on the training set, but h' does better over the whole distribution).

DT: Avoiding Overfitting
Approaches to avoiding overfitting:
- Early stopping
- Post-pruning
Pruning criteria:
- A separate dataset (development or validation set): Reduced Error Pruning
- A statistical test
- Encoding size: Minimum Description Length
Rule post-pruning: convert the tree to rules, then prune each rule, e.g.
IF (Snow=Good AND WantToPassML=Yes) OR (Snow=Poor) OR (Snow=Fresh AND Smart=Yes) THEN Don't Go
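Below is a simplified, single-pass sketch of Reduced Error Pruning on the nested-dict trees used earlier (leaves are labels). It is an illustration under assumptions: Mitchell's version repeatedly prunes whichever node most improves validation accuracy, whereas this sketch makes one bottom-up pass and prunes a node whenever validation error does not increase.

```python
from collections import Counter

def classify(tree, x):
    """Interior nodes are {attribute: {value: subtree}}; leaves are class labels."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[x[attr]]
    return tree

def errors(tree, examples):
    return sum(classify(tree, x) != y for x, y in examples)

def reduced_error_prune(tree, train, val):
    """train / val: lists of (x, y) pairs routed to this subtree. Bottom-up, replace a subtree
    by a leaf predicting the majority training class at that node whenever doing so makes
    no more validation errors than keeping the subtree."""
    if not isinstance(tree, dict) or not train:
        return tree
    attr, branches = next(iter(tree.items()))
    for v in list(branches):
        branches[v] = reduced_error_prune(branches[v],
                                          [(x, y) for x, y in train if x[attr] == v],
                                          [(x, y) for x, y in val if x[attr] == v])
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    leaf_errors = sum(y != majority for _, y in val)
    return majority if leaf_errors <= errors(tree, val) else tree
```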

DT: Continuous-Valued Attributes
- Discretization (as preprocessing)
- On-the-fly discretization: choose a threshold for the attribute while building the tree, e.g. the candidate threshold with the highest information gain
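A sketch of the usual on-the-fly discretization: candidate thresholds are placed midway between adjacent sorted values and scored by information gain. The Temperature/PlayTennis numbers are the standard example from Mitchell's Chapter 3; the function name and structure are my own.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-c / total * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try candidate thresholds midway between adjacent sorted values and return the
    binary split (value <= t vs. value > t) with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                   # no boundary between equal values
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Mitchell's Temperature / PlayTennis example (Ch. 3): the best split is Temperature <= 54
print(best_threshold([40, 48, 60, 72, 80, 90], ["No", "No", "Yes", "Yes", "Yes", "No"]))
```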

DT: Inductive Bias
Hypothesis space: disjunctions of conjunctions


DT: Alternative Attribute Selection: Gain Ratio
Split Information:
  SplitInfo(X, a_i) = -Σ_{v ∈ Values(a_i)} (|X_v| / |X|) log2(|X_v| / |X|)
  (this is Entropy(X, V), the entropy with respect to the attribute's values, versus Entropy(X, C), the entropy with respect to the class)
Gain Ratio:
  GainRatio(X, a_i) = InfoGain(X, a_i) / SplitInfo(X, a_i)
For the running example:
  x_1: InfoGain = 0.249, GainRatio = 0.319
  x_2: InfoGain = 0.231, GainRatio = 0.231
  x_3: InfoGain = 0.281, GainRatio = 0.198
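These numbers can be checked the same way as the information gains; SplitInfo is just the entropy of the partition sizes (a sketch, using the same assumed child counts as before):

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(parent_counts, child_counts):
    """GainRatio = InfoGain / SplitInfo; SplitInfo is the entropy of the partition sizes."""
    n = sum(parent_counts)
    sizes = [sum(child) for child in child_counts]
    gain = entropy(parent_counts) - sum(s / n * entropy(child)
                                        for s, child in zip(sizes, child_counts))
    return gain / entropy(sizes)

# Same three splits as above (X = [6, 7]):
print(round(gain_ratio([6, 7], [[0, 3], [6, 4]]), 3))           # x_1 -> 0.319
print(round(gain_ratio([6, 7], [[1, 5], [5, 2]]), 3))           # x_2 -> 0.232 (the slide rounds to 0.231)
print(round(gain_ratio([6, 7], [[0, 2], [1, 3], [5, 2]]), 3))   # x_3 -> 0.198
```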

DT: Missing Attribute Values
Estimate the missing value:
- Most common value at that node
- Most common value among examples of the same class at that node
- Probabilistic value assignment: split the instance into fractions based on the proportion of examples with each value
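Here is a small sketch of the "split the instance into fractions" idea at classification time: when the attribute tested at a node is missing, the instance is sent down every branch with a weight proportional to how common that value was in training, and the class votes are summed at the leaves. The tree, the value frequencies, and the function name below are illustrative assumptions.

```python
from collections import Counter

def classify_missing(tree, x, value_freqs, weight=1.0):
    """If the attribute tested at a node is missing from x, send weighted fractions of the
    instance down every branch (in proportion to how often each value occurred in training,
    given by value_freqs[attr][value]) and sum the class weights at the leaves."""
    if not isinstance(tree, dict):
        return Counter({tree: weight})
    attr, branches = next(iter(tree.items()))
    if attr in x:                                        # value known: follow the matching branch
        return classify_missing(branches[x[attr]], x, value_freqs, weight)
    scores = Counter()                                   # value missing: split into fractions
    for v, subtree in branches.items():
        scores += classify_missing(subtree, x, value_freqs, weight * value_freqs[attr][v])
    return scores

tree = {"Snow": {"Poor": "Don't Go",
                 "Good": {"WantToPassML": {"Yes": "Don't Go", "No": "Go"}},
                 "Fresh": "Go"}}
value_freqs = {"Snow": {"Poor": 0.2, "Good": 0.5, "Fresh": 0.3},
               "WantToPassML": {"Yes": 0.6, "No": 0.4}}
print(classify_missing(tree, {"WantToPassML": "No"}, value_freqs))
# e.g. Counter({'Go': 0.8, "Don't Go": 0.2}) -- the Snow value was missing
```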

DT: Attributes with Different Costs
Example: an expensive test, such as a CAT scan
One approach: divide InfoGain by the cost of the test, so that low-cost attributes are preferred

DT: Other Issues
- Cost of misclassification
- Regression

DT: Key Points
- Practical
- Generally a top-down greedy search
- Unrestricted hypothesis space
- Inductive bias: preference for smaller trees
- Use post-pruning to avoid overfitting
- Numerous ID3 extensions, and numerous other algorithms such as CART (Classification and Regression Trees)