
Artificial Intelligence Topic: Decision Tree Learning (ID3)

What is a decision tree? A tree in which each branching node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node (the bottom of the tree). A leaf node represents the decision derived from the tree for the given input.

ID3 Algorithm A top-down, greedy search with no backtracking: the algorithm proceeds from the top of the tree to the bottom.

What is a greedy search? At each step, make the decision that yields the greatest improvement in whatever you are trying to optimize, and do not backtrack (unless you hit a dead end). This type of search is not guaranteed to find a globally optimal solution, but it generally works well. What are we really doing here? At each node of the tree, decide which attribute best classifies the training data at that point, and never backtrack (in ID3). Do this for each branch of the tree. The end result is a tree structure representing the hypothesis that works best for the training data.

Constructing a decision tree using information gain A decision tree can be constructed top-down using information gain as follows: (1) begin at the root node; (2) determine the attribute with the highest information gain that is not already used by an ancestor node; (3) add a child node for each possible value of that attribute; (4) attach each example to the child node whose attribute value matches the example's value for that attribute; (5) if all examples attached to a child node can be classified uniquely, add that classification to the node and mark it as a leaf node; (6) go back to step 2 if there are unused attributes left, otherwise label the node with the classification of the majority of the examples attached to it.
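As an illustration, here is a minimal Python sketch of this top-down construction. It assumes the training data is a list of dicts whose "label" key holds the class; the helper names (entropy, information_gain, build_tree) are ours for illustration, not from the original slides.

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Entropy of the class labels of a set of examples."""
    total = len(examples)
    counts = Counter(ex["label"] for ex in examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attribute):
    """Reduction in entropy obtained by splitting the examples on one attribute."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - remainder

def build_tree(examples, attributes):
    """Top-down, greedy, ID3-style construction with no backtracking."""
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:                  # all examples classified uniquely: leaf node
        return labels.pop()
    if not attributes:                    # no unused attributes left: majority-class leaf
        return Counter(ex["label"] for ex in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = build_tree(subset, [a for a in attributes if a != best])
    return tree
```

Called with the 14 PlayTennis examples discussed below and the attribute list ["Outlook", "Temperature", "Humidity", "Wind"], a sketch like this reproduces the tree derived in the rest of this section.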

Training Examples Terminology: an object is also called a sample or example; an attribute is also called a variable or property; the target column, "Shall we play tennis today?" (Tennis 1), is the decision.

The tree itself forms the hypothesis: a disjunction (ORs) of conjunctions (ANDs). Each path from the root to a leaf forms a conjunction of constraints on attributes; separate branches are disjunctions. Example from the PlayTennis decision tree: (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
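To make that structure concrete, the same hypothesis can be written directly as a boolean function (a hypothetical encoding, not from the slides):

```python
def play_tennis(outlook, humidity, wind):
    """The PlayTennis hypothesis: an OR over the tree's root-to-leaf ANDs."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```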

The hypothesis space can include disjunctive expressions: it is the set of all possible decision trees. In fact, the hypothesis space is the complete space of finite discrete-valued functions.

Question: How do you determine which attribute best classifies the data?

What is entropy? Entropy is a measure of the uncertainty associated with a random variable.

E.g. a series of tosses of a fair coin has maximum entropy, since there is no way to predict what will come next. A series of tosses of a coin with two heads and no tails has zero entropy, since the coin will always come up heads. A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely predictable.

Entropy Entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is: H(S) = -p1*log2(p1) - p0*log2(p0), where p1 is the fraction of positive examples in S and p0 is the fraction of negative examples. If all examples are in one category, the entropy is zero (we define 0*log(0) = 0). If the examples are equally mixed (p1 = p0 = 0.5), the entropy is at its maximum of 1.
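A quick numeric check of this formula, as a small Python sketch:

```python
from math import log2

def binary_entropy(p1):
    """H(S) = -p1*log2(p1) - p0*log2(p0), treating 0*log2(0) as 0."""
    return sum(-p * log2(p) for p in (p1, 1 - p1) if p > 0)

print(binary_entropy(0.5))     # equally mixed set: maximum entropy of 1.0
print(binary_entropy(1.0))     # all examples in one category: 0.0
print(binary_entropy(9 / 14))  # the PlayTennis set used below: ~0.940
```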

Information gain Information gain measures how well an attribute classifies the training data; in other words, it is the reduction in entropy achieved by splitting on that attribute. Mathematical expression for information gain: Gain(S, Ai) = H(S) - Σ over v in Values(Ai) of P(Ai = v) * H(Sv), where Sv is the subset of S for which attribute Ai has value v.

ID3 algorithm (for a boolean-valued function) Calculate the entropy over all training examples from the counts of positive and negative cases: p+ = #pos/total, p- = #neg/total, H(S) = -p+*log2(p+) - p-*log2(p-). Use the attribute with the greatest information gain as the root.

Example: PlayTennis Four attributes used for classification: Outlook = {Sunny, Overcast, Rain}, Temperature = {Hot, Mild, Cool}, Humidity = {High, Normal}, Wind = {Weak, Strong}. One predicted (target) attribute (binary): PlayTennis = {Yes, No}. Given 14 training examples: 9 positive, 5 negative.

Training Examples (examples are also called minterms, cases, objects, or test cases).

Step 1: Calculate the entropy over all 14 cases (9 positive): N_Pos = 9, N_Neg = 5, N_Tot = 14, H(S) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940

Step 2: Loop over all attributes and calculate the gain: Attribute = Outlook. Loop over the values of Outlook: Outlook = Sunny: N_Pos = 2, N_Neg = 3, N_Tot = 5, H(Sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971. Outlook = Overcast: N_Pos = 4, N_Neg = 0, N_Tot = 4, H(Overcast) = -(4/4)*log2(4/4) - (0/4)*log2(0/4) = 0.00

Outlook = Rain: N_Pos = 3, N_Neg = 2, N_Tot = 5, H(Rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971. Calculate the information gain for attribute Outlook: Gain(S, Outlook) = H(S) - (N_Sunny/N_Tot)*H(Sunny) - (N_Overcast/N_Tot)*H(Overcast) - (N_Rain/N_Tot)*H(Rain) = 0.940 - (5/14)*0.971 - (4/14)*0 - (5/14)*0.971 = 0.246. Attribute = Temperature (repeat the process looping over {Hot, Mild, Cool}): Gain(S, Temperature) = 0.029

Attribute = Humidity (repeat the process looping over {High, Normal}): Gain(S, Humidity) = 0.151. Attribute = Wind (repeat the process looping over {Weak, Strong}): Gain(S, Wind) = 0.048. Find the attribute with the greatest information gain: Gain(S, Outlook) = 0.246, Gain(S, Temperature) = 0.029, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, so Outlook is the root node of the tree.
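These gains can be reproduced from the per-value (positive, negative) counts alone. A short sketch follows; the Outlook counts are those given above, while the Temperature, Humidity, and Wind counts are assumed to come from the standard 14-example PlayTennis table (they are not listed on the slides):

```python
from math import log2

def H(pos, neg):
    """Binary entropy from class counts, treating 0*log2(0) as 0."""
    total = pos + neg
    return sum(-(n / total) * log2(n / total) for n in (pos, neg) if n > 0)

# (positive, negative) counts per attribute value over the 14 examples.
# Temperature, Humidity and Wind counts are assumed from the standard PlayTennis table.
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],   # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],           # High, Normal
    "Wind":        [(6, 2), (3, 3)],           # Weak, Strong
}

H_S = H(9, 5)  # 0.940
for attribute, counts in splits.items():
    gain = H_S - sum((p + n) / 14 * H(p, n) for p, n in counts)
    print(attribute, round(gain, 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (the slide values 0.246 and 0.151 come from rounding the intermediate entropies first)
```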

Iterate the algorithm to find the attributes that best classify the training examples under each value of the root node. Example continued: take three subsets, Outlook = Sunny (N_Tot = 5), Outlook = Overcast (N_Tot = 4), Outlook = Rain (N_Tot = 5). For each subset, repeat the above calculation looping over all attributes other than Outlook.

For example: Outlook = Sunny (N_Pos = 2, N_Neg = 3, N_Tot = 5), H = 0.971. Temp = Hot (N_Pos = 0, N_Neg = 2, N_Tot = 2), H = 0.0; Temp = Mild (N_Pos = 1, N_Neg = 1, N_Tot = 2), H = 1.0; Temp = Cool (N_Pos = 1, N_Neg = 0, N_Tot = 1), H = 0.0. Gain(S_Sunny, Temperature) = 0.971 - (2/5)*0 - (2/5)*1 - (1/5)*0 = 0.571. Similarly: Gain(S_Sunny, Humidity) = 0.971, Gain(S_Sunny, Wind) = 0.020. Humidity classifies the Outlook = Sunny instances best and is placed as the node under the Sunny branch. Repeat this process for Outlook = Overcast and Outlook = Rain.
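The same check for the Outlook = Sunny subset, again as a sketch (the H helper is repeated so the snippet runs on its own; the Humidity and Wind counts for this subset are assumed from the standard PlayTennis table):

```python
from math import log2

def H(pos, neg):
    total = pos + neg
    return sum(-(n / total) * log2(n / total) for n in (pos, neg) if n > 0)

H_sunny = H(2, 3)  # 0.971
sunny_splits = {
    "Temperature": [(0, 2), (1, 1), (1, 0)],   # Hot, Mild, Cool (counts from the slide)
    "Humidity":    [(0, 3), (2, 0)],           # High, Normal (assumed)
    "Wind":        [(1, 2), (1, 1)],           # Weak, Strong (assumed)
}
for attribute, counts in sunny_splits.items():
    gain = H_sunny - sum((p + n) / 5 * H(p, n) for p, n in counts)
    print(attribute, round(gain, 3))
# Temperature 0.571, Humidity 0.971, Wind 0.02
```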

End up with the final tree: Outlook at the root; under Sunny, a Humidity node (Normal → Yes, High → No); under Overcast, the leaf Yes; under Rain, a Wind node (Weak → Yes, Strong → No).

Drawback A drawback of using decision trees is that the outcomes of decisions, subsequent decisions and payoffs may be based primarily on expectations. For example, if you expect that your parents will pay for half of your college when deciding to go to school, but later discover that you will have to pay for all of your tuition, your expected payoffs will be dramatically different than reality.

Controlling Overfitting Introduce a restriction on the hypothesis space to prevent overly complex hypotheses from being learned.

Pros and Cons of Decision Trees Advantages: works well with discrete data; produces human-comprehensible concepts that are easy to understand; able to process both numerical and categorical data. Disadvantages: trees created from numeric datasets can be complex; limited to one output (target) attribute.

THANKS