Class #05: Mutual Information & Decision Trees
Machine Learning (CS 419/519): M. Allen, 14 Sept. 2018


Review: Decisions Based on Attributes

} Training set: cases where patrons have decided to wait or not, along with the associated attributes for each case
} We now want to learn a tree that agrees with the decisions already made, in hopes that it will allow us to predict future decisions

Review: Decision Tree Functions

} For the examples given, here is a true tree (one that will lead from the inputs to the same outputs)

[Tree diagram: Patrons? (None / Some / Full); under Full, WaitEstimate? (>60 / 30-60 / 10-30 / 0-10), with further tests on Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?]

Decision Tree Learning Algorithm

function DECISION-TREE-LEARNING(examples, attributes, parent_examples) returns a tree
    if examples is empty then return PLURALITY-VALUE(parent_examples)
    else if all examples have the same classification then return the classification
    else if attributes is empty then return PLURALITY-VALUE(examples)
    else
        A <- argmax_{a in attributes} IMPORTANCE(a, examples)
        tree <- a new decision tree with root test A
        for each value v_k of A do
            exs <- {e : e in examples and e.A = v_k}
            subtree <- DECISION-TREE-LEARNING(exs, attributes - A, examples)
            add a branch to tree with label (A = v_k) and subtree subtree
        return tree

PLURALITY-VALUE(): returns the output decision-value for the majority of the examples
IMPORTANCE(): rates attributes for their importance in making decisions for the given set of examples (the only actually complex part)
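As a rough illustration (not the course's reference code), here is a minimal Python sketch of the same recursion. It assumes each example is a dict with an "Output" key; the helper names plurality_value, importance, and domains are my own, not from the slides.

from collections import Counter

def plurality_value(examples):
    """Return the output value held by the majority of the given examples."""
    return Counter(e["Output"] for e in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, parent_examples, importance, domains):
    """Recursive builder following the slide's pseudocode.

    Returns either a leaf output value, or a tuple (attribute, branches),
    where branches maps each possible attribute value to a subtree.
    `domains` maps each attribute name to its list of possible values;
    `importance` is a function (attribute, examples) -> score.
    """
    if not examples:
        return plurality_value(parent_examples)
    if len({e["Output"] for e in examples}) == 1:
        return examples[0]["Output"]
    if not attributes:
        return plurality_value(examples)
    # Pick the attribute rated most important for this set of examples
    a = max(attributes, key=lambda attr: importance(attr, examples))
    branches = {}
    for v in domains[a]:
        exs = [e for e in examples if e[a] == v]
        branches[v] = decision_tree_learning(
            exs, [x for x in attributes if x != a], examples, importance, domains)
    return (a, branches)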

} The precise tree we build will depend upon the order in which the algorithm chooses attributes and splits up the examples
} Suppose we have the following training set of 6 examples, defined by the boolean attributes A, B, C, with outputs as shown:

[Table: examples 1-6, each listing its values for A, B, and C along with its Output; the individual values are not legible in the source]

} We will consider two possible orders for the attributes when we build our tree: {A, B, C} and {C, B, A}

} Suppose we use the order {A, B, C}: start by dividing up the cases based on variable A

[Diagram: an A? root node; one branch collects examples 1 and 3, the other collects examples 2, 4, 5, and 6. Each listed case is one for which attribute A has that branch's value, along with the appropriate Output value for that case. On the first branch, all Outputs are the same, so we can replace it with a simple leaf node with that value; this is an example of the second base-case stopping condition of the recursive algorithm.]

} Order {A, B, C}: next, divide the un-decided cases based on variable B

[Diagram: under the remaining branch of A?, a B? node splits examples 2 and 6 from examples 4 and 5. Again, all Outputs are the same on one of these branches, so it becomes a leaf.]

} Order {A, B, C}: last, divide the un-decided cases based on variable C

[Diagram: a C? node finally separates example 4 from example 5. Now we can replace these last nodes with the relevant decision Outputs.]
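To see the order effect concretely, we can hand the sketch above an importance function that just follows a fixed attribute order. The six rows below are hypothetical stand-ins (the real table values are not legible in the source), so the exact trees will differ from the slide's, but the point carries: the two orders generally give differently shaped trees that agree on every training example.

# Hypothetical data: the actual slide values are not recoverable.
DOMAINS = {"A": [True, False], "B": [True, False], "C": [True, False]}
DATA = [
    {"A": True,  "B": True,  "C": True,  "Output": True},
    {"A": True,  "B": False, "C": True,  "Output": True},
    {"A": False, "B": True,  "C": True,  "Output": True},
    {"A": False, "B": False, "C": True,  "Output": False},
    {"A": False, "B": False, "C": False, "Output": True},
    {"A": False, "B": True,  "C": False, "Output": False},
]

def fixed_order(order):
    """An 'importance' that simply prefers attributes earlier in the given order."""
    return lambda attr, examples: -order.index(attr)

attrs = ["A", "B", "C"]
tree_abc = decision_tree_learning(DATA, attrs, DATA, fixed_order(["A", "B", "C"]), DOMAINS)
tree_cba = decision_tree_learning(DATA, attrs, DATA, fixed_order(["C", "B", "A"]), DOMAINS)
print(tree_abc)  # the two trees differ in shape,
print(tree_cba)  # but classify every training example the same way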

} Order {A, B, C}: the final decision tree for our data-set

[Diagram: the completed tree for order {A, B, C}, with an A? root, a B? node below it, and a C? node at the bottom]

} If we reverse the order of attributes and do the same process, we get a different, somewhat larger tree (although both will give the same decision results on our set)

[Diagram: the two final trees side by side; the {C, B, A} tree uses more internal nodes than the {A, B, C} tree]

Choosing Attributes

[Figure: the training examples grouped by Patrons? (None / Some / Full) and, alternatively, by Type? (French / Italian / Thai / Burger)]

} Intuitively, a good choice of the attribute to use is one that gives us the most information about how output decisions are made
} Ideally, it would divide our outputs perfectly, telling us everything we needed to know to make our decision
} Often, a single attribute only tells us part of what we need to know, so we prefer those that tell us the most
} In the example, Patrons gives us more information than Type, since some values of the first attribute predict the decision perfectly, while no values of the second do the same

Entropy for Decision Trees

} For a binary (yes/no) decision problem, we can treat a training set with p positive examples and n negative examples as if it were a random variable with two values and probabilities:

    P(Pos) = \frac{p}{p+n}, \qquad P(Neg) = \frac{n}{p+n}

} We can then use the definition of entropy to measure the information gained by finding out whether an example is positive or negative:

    H(Examples) = -\left( P(Pos) \log_2 P(Pos) + P(Neg) \log_2 P(Neg) \right)
                = -\left( \frac{p}{p+n} \log_2 \frac{p}{p+n} + \frac{n}{p+n} \log_2 \frac{n}{p+n} \right)
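As a quick sanity check of this formula, here is a small Python helper (the function name is mine, not the course's):

import math

def binary_entropy(p, n):
    """Entropy, in bits, of a set with p positive and n negative examples."""
    h = 0.0
    for count in (p, n):
        if count > 0:                 # treat 0 * log2(0) as 0
            prob = count / (p + n)
            h -= prob * math.log2(prob)
    return h

print(binary_entropy(6, 6))  # 1.0: an even split is maximally uncertain
print(binary_entropy(4, 0))  # 0.0: a pure set tells us nothing new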

Information Gain

} When we choose an attribute A with d values, we divide our training set into sub-sets E_1, ..., E_d
} Each set E_k has its own number of positive and negative examples, p_k and n_k, and entropy H(E_k)
} The total remaining entropy after dividing on A is thus:

    Remainder(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p + n} H(E_k)

} And the total information gain (entropy reduction) if we do choose to use A as the dividing-branch variable is:

    Gain(A) = H(Examples) - Remainder(A)

Choosing Variables Using the Information Gain

} Now we can be precise about how Patrons gives us more information than Type. For the full training set of 6 positive and 6 negative examples:

    H(Examples) = -\left( \frac{6}{12} \log_2 \frac{6}{12} + \frac{6}{12} \log_2 \frac{6}{12} \right)
                = -\left( \frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2} \right) = 1.0

} For Patrons:

    Gain(Patrons) = H(Examples) - Remainder(Patrons)
                  = 1.0 - \left( \frac{2}{12} H(E_1) + \frac{4}{12} H(E_2) + \frac{6}{12} H(E_3) \right)

  Thus, since we have:

    H(E_1) = -\left( \frac{0}{2} \log_2 \frac{0}{2} + \frac{2}{2} \log_2 \frac{2}{2} \right) = 0
    H(E_2) = -\left( \frac{4}{4} \log_2 \frac{4}{4} + \frac{0}{4} \log_2 \frac{0}{4} \right) = 0
    H(E_3) = -\left( \frac{2}{6} \log_2 \frac{2}{6} + \frac{4}{6} \log_2 \frac{4}{6} \right) \approx 0.918

    Gain(Patrons) = 1.0 - \frac{1}{2}(0.918) \approx 0.541

} For Type:

    Gain(Type) = H(Examples) - Remainder(Type)
               = 1.0 - \left( \frac{2}{12} H(E_1) + \frac{2}{12} H(E_2) + \frac{4}{12} H(E_3) + \frac{4}{12} H(E_4) \right)

  Thus, since we have:

    H(E_1) = H(E_2) = H(E_3) = H(E_4) = 1.0

    Gain(Type) = 1.0 - 1.0 = 0

} And so we would choose to split on Patrons, since:

    Gain(Patrons) = 0.541 > Gain(Type) = 0
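The same arithmetic can be checked by reusing the binary_entropy helper above; the per-value positive/negative counts are the standard restaurant-example figures shown on the slides, while the function and variable names here are mine.

def remainder(splits, p, n):
    """splits: a list of (p_k, n_k) pairs, one per value of the chosen attribute."""
    return sum((p_k + n_k) / (p + n) * binary_entropy(p_k, n_k) for p_k, n_k in splits)

def gain(splits, p, n):
    return binary_entropy(p, n) - remainder(splits, p, n)

patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(gain(patrons, 6, 6), 3))  # 0.541
print(round(gain(type_, 6, 6), 3))    # 0.0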

Learning with Information Gain

} If we use this concept of information gain to rate the IMPORTANCE of an attribute, and always split based on the one that gives us the greatest gain, we can learn the following, more compact tree for the restaurant example:

[Tree diagram: Patrons? (None / Some / Full) at the root, with further tests on Hungry?, Type? (French / Italian / Thai / Burger), and Fri/Sat?]

Performance of Learning

[Plot: proportion correct on the test set (roughly 0.4 up to about 0.9) versus training-set size (0 to 100)]

} If we start with a set of 100 random examples of the restaurant problem, we can see that the accuracy of the learning increases relative to the size of the training set

Improving Decision Trees

} One well-known drawback of decision trees is that they tend to overfit to the training set
} That is, they give very good (often exact) performance on the training set, but don't generalize well to new cases
} To improve on this, various randomization steps can be added to generate Decision Forests:
    1. Build multiple different decision trees
    2. Given an input case, run it through all of the trees, and return the decision given by the majority of those trees

Bootstrapping with Multiple Trees

} When building our different trees, one way this is done is to build each one using different, randomly chosen subsets of the original training set:
    } Random subsets may or may not overlap
    } Each tree is built on its own subset, and learns a decision function only for that subset
    } Each may thus give different decision outputs for the same input, if that input is not in one or the other subset (or both)

[Diagram: Original Training Set S divided into Subset S_1, Subset S_2, ..., Subset S_N, each used to build Tree 1, Tree 2, ..., Tree N]
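A compact sketch of this build-many-and-vote idea, reusing decision_tree_learning from earlier; the subset size, the tree-walking classify helper, and the voting routine are my own assumptions rather than anything specified on the slides.

import random
from collections import Counter

def classify(tree, example):
    """Follow a tree from decision_tree_learning down to a leaf value."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

def build_forest(examples, attributes, importance, domains, n_trees=10, subset_frac=0.6):
    """Build each tree from its own randomly chosen subset of the training data."""
    forest = []
    size = max(1, int(subset_frac * len(examples)))
    for _ in range(n_trees):
        subset = random.sample(examples, size)
        forest.append(decision_tree_learning(subset, list(attributes), subset, importance, domains))
    return forest

def forest_vote(forest, example):
    """Return the decision given by the majority of the trees."""
    return Counter(classify(t, example) for t in forest).most_common(1)[0][0]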

Random Forests

} If some of our features give us most of the information about our data, however, the prior random process may not be as random as we would like
} The same features may be used, over and over, in all our trees, and they will tend to act the same way, eliminating the variation we are trying to achieve
} We can modify the procedure to generate a more random forest of trees, by again splitting into random subsets of examples, but also, when we build the trees, building each one using a random subset of the features, too

[Diagram: Original Training Set S divided into Subset S_1 (Features F_1), Subset S_2 (Features F_2), ..., Subset S_N (Features F_N), each used to build Tree 1, Tree 2, ..., Tree N]

This Week

} Information Theory & Decision Trees
} Readings:
    } Blog post on Information Theory (linked from class schedule)
    } Section 18.3 from Russell & Norvig
} Office Hours: Wing [room number illegible]
    } Monday/Wednesday/Friday, [?]:00 PM - [?]:00 PM
    } Tuesday/Thursday, [?]:30 PM - 3:00 PM
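Extending the earlier build_forest sketch so that each tree also sees only a random subset of the features; the feature fraction, like the other parameters, is an assumption of mine rather than a value from the slides.

import random

def build_random_forest(examples, attributes, importance, domains,
                        n_trees=10, subset_frac=0.6, feature_frac=0.5):
    """Like build_forest, but each tree is also limited to a random subset of features."""
    forest = []
    n_rows = max(1, int(subset_frac * len(examples)))
    n_feats = max(1, int(feature_frac * len(attributes)))
    for _ in range(n_trees):
        subset = random.sample(examples, n_rows)
        feats = random.sample(list(attributes), n_feats)
        forest.append(decision_tree_learning(subset, feats, subset, importance, domains))
    return forest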