Administrative notes. Computational Thinking ct.cs.ubc.ca

Administrative notes Labs this week: project time. Remember, you need to pass the project in order to pass the course! (See course syllabus.) Clicker grades should be on-line now

Administrative notes
March 3: Data mining reading quiz
March 14: Midterm 2
March 17: In the News call #3
March 30: Project deliverables and individual report due

Data mining: finding patterns in data Part 1: Building decision tree classifiers from data

Learning goals [CT Building Block] Students will be able to build a simple decision tree [CT Building Block] Students will be able to describe what considerations are important in building a decision tree

Why data mining? The world is awash with digital data; trillions of gigabytes and growing How many bytes in a gigabyte? Clicker question A. 1 000 000 B. 1 000 000 000 C. 1 000 000 000 000

Why data mining? The world is awash with digital data; trillions of gigabytes and growing A trillion gigabytes is a zettabyte, or 1 000 000 000 000 000 000 000 bytes

Why data mining? More and more, businesses and institutions are using data mining to make decisions, classifications, diagnoses, and recommendations that affect our lives

Data mining for classification Recall our loan application example

Data mining for classification In the loan strategy example, we focused on fairness of different classifiers, but we didn't focus much on how to build a classifier. Today you'll learn how to build decision tree classifiers for simple data mining scenarios

A rooted tree in computer science Before we get to decision trees, we'll define what a tree is

A rooted tree in computer science
A collection of nodes such that:
- one node is the designated root
- a node can have zero or more children; a node with zero children is a leaf
- all non-root nodes have a single parent

A rooted tree in computer science
A collection of nodes such that:
- one node is the designated root
- a node can have zero or more children; a node with zero children is a leaf
- all non-root nodes have a single parent
- edges denote parent-child relationships
- nodes and/or edges may be labeled by data
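To make the definition concrete, here is a minimal Python sketch (not part of the original slides; the class and field names are illustrative) of a rooted tree in which each node stores a label and a list of children:

class TreeNode:
    """A node in a rooted tree: a label plus zero or more children."""
    def __init__(self, label):
        self.label = label
        self.children = []  # a node with zero children is a leaf

    def add_child(self, child):
        self.children.append(child)

# Build a tiny rooted tree: 'animals' is the designated root.
root = TreeNode("animals")
cats = TreeNode("cats")
dogs = TreeNode("dogs")
root.add_child(cats)   # edges record the parent-child relationships
root.add_child(dogs)
cats.add_child(TreeNode("tabby"))  # a leaf; every non-root node has one parent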

A rooted tree in computer science Often but not always drawn with root on top

Are these rooted trees? Clicker question 1 2 A. 1 but not 2 B. 2 but not 1 C. Both 1 and 2 D. Neither 1 nor 2 http://jerome.boulinguez.free.fr/english/file/hotpotatoes/familytree.htm

Is this a rooted tree? Clicker question A. Yes B. No C. I'm not sure. No: node 2 has two parents, and there's no unique root http://jerome.boulinguez.free.fr/english/file/hotpotatoes/familytree.htm

Decision trees: trees whose node labels are attributes, edge labels are conditions https://gbr.pepperdine.edu/2010/08/how-gerber-used-adecision-tree-in-strategic-decision-making/

Decision trees: trees whose node labels are attributes, edge labels are conditions. A decision tree for the max-profit loan strategy: split first on colour (blue / orange); on the blue branch, credit rating > 61 means approve and < 61 means deny; on the orange branch, credit rating > 50 means approve and < 50 means deny. (Note that some worthy applicants are denied loans, while other unworthy ones get loans)
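As a preview of the "coding up a decision tree classifier" slides later on, a tree like this can be written directly as nested if statements. This is only a sketch based on the thresholds in the figure above; the function and argument names are made up for illustration:

def loan_decision(colour, credit_rating):
    """Walk the max-profit tree: split on colour, then on credit rating."""
    if colour == "blue":
        return "approve" if credit_rating > 61 else "deny"
    else:  # orange
        return "approve" if credit_rating > 50 else "deny"

print(loan_decision("blue", 70))    # approve
print(loan_decision("orange", 45))  # deny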

Exercise: Construct the decision tree for the Group Unaware loan strategy

Building decision trees from training data. Should you get an ice cream? You might start out with the following data:

Weather  Wallet  Ice Cream?
Great    Empty
Nasty    Empty
Great    Full
Okay     Full
Nasty    Full

Building decision trees from training data. Should you get an ice cream? In this data, the column headings (Weather, Wallet, Ice Cream?) are the attributes, and the values they take (Great, Nasty, Okay, Empty, Full) are the conditions.

Building decision trees from training data. Should you get an ice cream? From this data you might build a decision tree that looks like this: split first on Wallet (Empty / Full); on the Full branch, split on Weather (Nasty / Okay / Great).

Shall we play a game? Suppose we want to help a soccer league decide whether or not to cancel games. We have some data. Our goal is a decision tree to help officials make decisions. Assume that decisions are the same given the same information.

Outlook   Temperature  Humidity  Windy   Play?
sunny     hot          high      false
sunny     hot          high      true
overcast  hot          high      false
rain      mild         high      false
rain      cool         normal    false
rain      cool         normal    true
overcast  cool         normal    true
sunny     mild         high      false
sunny     cool         normal    false
rain      mild         normal    false
sunny     mild         normal    true
overcast  mild         high      true
overcast  hot          normal    false
rain      mild         high      true

Example adapted from http://www.kdnuggets.com/data_mining_course/index.html#materials

Create a decision tree (group exercise)
Create a decision tree that decides whether the game should be played or not, using the data table above.
- The leaf nodes should be whether or not to play
- The non-leaf nodes should be attributes (e.g., Outlook, Windy)
- The edges should be conditions (e.g., sunny, hot, normal)

Some example potential starts to the decision tree: one tree splits first on Outlook? (overcast / rainy / sunny) and then considers Temperature?, Humidity?, or Windy? below; another splits first on Windy? (true / false) and then considers Humidity? below.

How did you split up your tree and why? Student responses:
- Looked at each condition of every attribute, case by case, until we got to the end, working left to right through the columns.
- There are ways that the tree can be made smaller, by finding patterns. E.g., if it's overcast, the answer is always yes.

Here's that example again. Create a decision tree that decides whether the game should be played or not, using the data table above.
- The leaf nodes should be whether or not to play
- The non-leaf nodes should be attributes (e.g., Outlook, Windy)
- The edges should be conditions (e.g., sunny, hot, normal)

Deciding which nodes go where: a decision tree construction algorithm. Top-down tree construction: at the start, all examples are at the root. Partition the examples recursively by choosing one attribute each time. In deciding which attribute to split on, one common method is to try to reduce entropy, i.e., each time you split, you should make the resulting groups more homogeneous. The more you reduce entropy, the higher the information gain.
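A rough sketch of this top-down procedure in Python, using the simple entropy measure from these slides (the number of examples that end up in still-mixed groups). The data format, function names, and the toy example at the end are my own illustrative choices, not anything given in the lecture:

def mixed_count(rows, class_col):
    """The slides' simple entropy for one group: its size if it still
    contains a mix of class labels, otherwise 0."""
    labels = {row[class_col] for row in rows}
    return len(rows) if len(labels) > 1 else 0

def split_entropy(rows, attribute, class_col):
    """Total mixed count after partitioning the rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return sum(mixed_count(g, class_col) for g in groups.values())

def build_tree(rows, attributes, class_col):
    """Top-down construction: if the group is pure (or nothing is left to
    split on) make a leaf, otherwise split on the attribute that most
    reduces the simple entropy and recurse on each resulting group."""
    labels = {row[class_col] for row in rows}
    if len(labels) == 1 or not attributes:
        return labels.pop()  # a real implementation would take the majority label
    best = min(attributes, key=lambda a: split_entropy(rows, a, class_col))
    groups = {}
    for row in rows:
        groups.setdefault(row[best], []).append(row)
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(g, remaining, class_col)
                   for value, g in groups.items()}}

# Hypothetical toy data (not the course's dataset):
data = [
    {"Weather": "Great", "Wallet": "Full",  "IceCream": "yes"},
    {"Weather": "Nasty", "Wallet": "Full",  "IceCream": "no"},
    {"Weather": "Great", "Wallet": "Empty", "IceCream": "no"},
]
print(build_tree(data, ["Weather", "Wallet"], "IceCream"))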

Let's go back to our example. Intuitively, our goal is to get to having as few mixed yes and no answers together in groups as possible. So in the initial case, we have 14 mixed yeses and nos (the whole data table above).

What happens if we split on Temperature? The three resulting groups (hot, mild, cool) are all still mixed, with 4, 4, and 6 examples. Overall entropy = 4 + 4 + 6 = 14.

What's the entropy if you split on Outlook? Group exercise (using the data table above): A. 0 B. 5 C. 10 D. 14 E. None of the above

What's the entropy if you split on Outlook? Group exercise results: the sunny and rainy groups each have 5 mixed examples, while the overcast group has 0 mixed. Overall entropy = 5 + 0 + 5 = 10.

What if you split on Windy? Both resulting groups (false and true) are still mixed, one with 8 examples and one with 6. Overall entropy = 8 + 6 = 14.

What if you split on Humidity? The high group has 7 mixed examples and the normal group has 7 mixed. Overall entropy = 7 + 7 = 14.

The best option to split on is Outlook: it does the best job of reducing entropy.

This example suggests why a more complex entropy definition might be better. Compare splitting on Windy (false / true) with splitting on Humidity (high / normal): Humidity is the better split because its groups are closer to being homogeneous, even though both splits have entropy 14 under our simple measure.
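For reference, here is a sketch of the more sophisticated measure that is typically used and that the later slides allude to: the Shannon entropy of a group of class labels, and the information gain of a split (entropy before minus size-weighted entropy after). This is standard textbook material rather than something defined in these notes; the function names are illustrative:

import math

def shannon_entropy(labels):
    """Entropy in bits of a list of class labels: -sum over classes of p * log2(p)."""
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result

def information_gain(rows, attribute, class_col):
    """Entropy before splitting minus the size-weighted entropy of the groups after."""
    before = shannon_entropy([row[class_col] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    after = sum(len(g) / len(rows) * shannon_entropy([row[class_col] for row in g])
                for g in groups.values())
    return before - after

Under this measure, the Humidity split scores higher than the Windy split because its groups are closer to being all yes or all no, which matches the slide's point even though the simple mixed-count measure rates the two splits the same.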

Great! Now we do the same thing again. Here's what we have so far: a root node Outlook with branches sunny, overcast, and rainy. For each branch, we have to decide which attribute to split on next: Temperature, Windy, or Humidity.

Great! Now we do the same thing again. Clicker question: what's the best attribute to split on for Outlook = sunny? A. Temperature (hot / mild / cool) B. Windy (false / true) C. Humidity (high / normal)

We don't need to split for Outlook = overcast. The answer was yes each time, so we're done there.

What's the best attribute to split on for Outlook = rain? Clicker question: A. Temperature (hot: N/A, mild, cool) B. Windy (false / true) C. Humidity (high / normal)

This was, of course, a simple example:
- the algorithm found the tree with the smallest number of nodes
- we were given the attributes and conditions
- a simplistic notion of entropy worked (a more sophisticated notion of entropy is typically used to determine which attribute to split on)

This was, of course, a simple example. In more complex examples, like the loan application example:
- we may not know which conditions or attributes are best to use
- the final decision may not be correct in every case (e.g., given two loan applicants with the same colour and credit rating, one may be creditworthy while the other is not)
- even if the final decision is always correct, the tree may not be of minimum size

Coding up a decision tree classifier. The finished tree: a root node Outlook; the sunny branch leads to a Humidity node (high / normal), the overcast branch leads directly to a leaf, and the rainy branch leads to a Windy node (false / true).

Coding up a decision tree classifier, starting with the sunny branch (Outlook = sunny, then Humidity with high / normal below). Can you see the relationship between the hierarchical tree structure and the hierarchical nesting of if statements?

Coding up a decision tree classifier, continued. Can you extend the code to handle the rainy case?
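The code shown on these slides was not captured in this transcription, so here is a minimal sketch of what the nested if statements might look like, extended to handle the rainy case. Aside from overcast always being yes (stated earlier), the yes/no outcomes at the leaves are assumptions for illustration, not values taken from these notes:

def play_game(outlook, humidity, windy):
    """The decision tree as nested ifs: Outlook at the root, then Humidity or Windy."""
    if outlook == "overcast":
        return "yes"            # the lecture noted overcast was always yes
    elif outlook == "sunny":
        if humidity == "high":
            return "no"         # assumed leaf label
        else:                   # normal
            return "yes"        # assumed leaf label
    else:                       # rainy
        if windy:
            return "no"         # assumed leaf label
        else:
            return "yes"        # assumed leaf label

print(play_game("sunny", "high", False))    # no
print(play_game("rainy", "normal", False))  # yes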

Learning goals [CT Building Block] Students will be able to build a simple decision tree [CT Building Block] Students will be able to describe what considerations are important in building a decision tree