Administrative notes February 27, 2018


Administrative notes February 27, 2018. Welcome back! Reminder: In the News Call #2 is due tomorrow. Reminder: Midterm #2 is on March 13. Project proposals are all marked. You can resubmit your proposal after you have addressed the concerns raised by your TA. We will take the higher of the two marks, the project proposal and the project proposal resubmission. Each group has a TA assigned to their project; they will be your project manager for the rest of the term. They are also the ones marking most of your project work. Make sure everyone is on the same page! Communicate often!

Frozen! Now that you have had some real-life experience with snow, I thought I'd show you the video I wanted to show last class.

In the News: Paul Manafort left an electronic paper trail incriminating himself because he couldn't convert a PDF document to a Word document. https://slate.com/technology/2018/02/paul-manafort-couldnt-convert-pdfs-to-word-documents.html

Data Mining

Learning Goals
- [CT Building Block] Students will be able to create English-language descriptions of algorithms to analyze data and show how their algorithms would work on an input data set.
- [CT Application] Students will be able to use computing to examine datasets and facilitate exploration in order to gain insight and knowledge (data and information).
- [CT Impact] Students will be able to give examples of privacy and security issues that arise as a result of data mining.
- [CT Building Block] Students will be able to describe what a greedy algorithm is.
- [CT Building Block] Students will be able to describe what the general decisions are in building a decision tree.
- [CT Building Block] Students will be able to build a simple decision tree.
- [CT Building Block] Students will be able to describe what considerations are important in building a decision tree.
- [CT Building Block] Students will be able to give examples showing why clustering is useful, describe how clustering tasks can be formulated in terms of high-dimensional numerical data, and infer what the output of the k-means clustering algorithm is on a small input.

Why data mining? The world is awash with digital data; trillions of gigabytes and growing. How many bytes in a gigabyte? Clicker question A. 1 000 000 B. 1 000 000 000 C. 1 000 000 000 000

Why data mining? The world is awash with digital data; trillions of gigabytes and growing. A trillion gigabytes is a zettabyte, or 1 000 000 000 000 000 000 000 bytes.

Why data mining? More and more, businesses and institutions are using data mining to make decisions, classifications, diagnoses, and recommendations that affect our lives

An example data mining quote from the NY Times article: "We have the capacity to send every customer an ad booklet, specifically designed for them, that says, Here's everything you bought last week and a coupon for it," one Target executive told me. "We do that for grocery products all the time." But for pregnant women, Target's goal was selling them baby items they didn't even know they needed yet.

Target can identify pregnant women and send them individual mailings. In a group of 3-4, discuss whether you think this is cool, creepy, or both. A. Cool B. Creepy C. Both D. Neither

Target: Cool, creepy, or both? Cool: You might be a victim of a food-borne disease, so you might benefit. You could pay off student loans by suing them for invasion of privacy. It saves time. Creepy: Ads that aren't applicable to me are more irritating. It doesn't feel creepy if you can figure out why they are giving you those coupons, but if you pay for something in cash and you still get a coupon, that's creepy.

Okay. What if I told you: People pay different insurance (life or medical) rates based on being pregnant. Insurance companies may now want to pay Target for this information. Does this change your opinion? A. It was already creepy B. It wasn't creepy before, but this is C. No, still not creepy

Clicker question: Loyalty cards and credit cards After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases: A. More likely B. Less likely C. The same

As we discussed, cookies tell information about you. But how do pages that you've visited predict the future?

Data Mining. Data mining is the process of looking for patterns in large data sets. There are many different kinds for many different purposes. We'll do an in-depth exploration of two of them.

Data mining for classification. Recall our loan application example. [Figure: loan applicants plotted by credit rating] Goal: given colours, credit ratings, and past rates of successfully paying back loans, decide whether or not to grant a loan.

Data mining for classification. In the loan strategy example, we focused on fairness of different classifiers, but we didn't focus much on how to build a classifier. Today you'll learn how to build decision tree classifiers for simple data mining scenarios.

A rooted tree in computer science Before we get to decision trees, we need to define a tree

A rooted tree in computer science. A tree is a collection of nodes such that: one node is the designated root; a node can have zero or more children; a node with zero children is a leaf; all non-root nodes have a single parent; edges denote parent-child relationships; nodes and/or edges may be labeled by data. [Figure: an example rooted tree with labeled nodes]
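To make the definition above concrete, here is a minimal sketch (not part of the original slides) of how a rooted tree could be represented in Python; the Node class and the labels are illustrative assumptions.

```python
# A rooted tree: each node stores a label and a list of children.
# A node with no children is a leaf; the root is simply the node we start from.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def is_leaf(self):
        return not self.children

# A small tree: root A has children B and C; B has leaf children D and E.
root = Node("A", [Node("B", [Node("D"), Node("E")]), Node("C")])
print(root.is_leaf())                          # False (the root has children)
print(root.children[0].children[0].is_leaf())  # True  (D is a leaf)
```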

A rooted tree in computer science. Often, but not always, drawn with the root on top. [Figure: a phylogenetic tree of ratite birds (Casuarius, Dromaius, Apteryx species, Struthio, Rhea, moa genera), an example of a rooted tree not drawn root-on-top]

Is this a rooted tree? Clicker question. [Figure: a family tree with nodes labeled 1-6] A. No, node 5 has two parents B. Yes C. I'm not sure http://jerome.boulinguez.free.fr/english/file/hotpotatoes/familytree.htm

Decision trees: trees whose node labels are attributes, edge labels are conditions

Decision trees: trees whose node labels are attributes, edge labels are conditions. [Figure: decision tree for Lyme Disease diagnosis, with nodes such as Enzyme Immunoassay and Symptom Length (30 days or less vs. more than 30 days) leading to IgM and IgG Western Blot, IgG Western Blot only, or Consider Alternative Diagnosis]

Decision trees: trees whose node labels are attributes, edge labels are conditions. https://gbr.pepperdine.edu/2010/08/how-gerber-used-a-decision-tree-in-strategic-decision-making/

Back to our example. We may want to make a tree saying when to approve or deny a loan. [Figure: loan applicants plotted by credit rating] Goal: given colours, credit ratings, and past rates of successfully paying back loans, decide whether or not to grant a loan.

Decision trees: trees whose node labels are attributes, edge labels are conditions. A decision tree for the max profit loan strategy: the root tests colour; the blue branch tests credit rating (61 or above: approve; below 61: deny) and the orange branch tests credit rating (50 or above: approve; below 50: deny). (Note that some worthy applicants are denied loans, while other unworthy ones get loans.)

Exercise: Construct the decision tree for the Group Unaware loan strategy. [Figure: loan applicants plotted by credit rating] Goal: given colours, credit ratings, and past rates of successfully paying back loans, decide whether or not to grant a loan.

Sample decision tree for the Group Unaware strategy: a single credit rating test (55 or above: approve; below 55: deny). (Note that some worthy applicants are denied loans, while other unworthy ones get loans.)

Building decision trees from training data. Should you get an ice cream? You might start out with the following data (attributes Weather and Wallet; the Ice Cream? answers appeared as yes/no icons in the original slide):

Weather  Wallet
Great    Empty
Nasty    Empty
Great    Full
Okay     Full
Nasty    Full

You might build a decision tree that looks like this: [Figure: a tree whose root tests Wallet with branches Empty / Full; the Full branch tests Weather with branches Nasty / Okay / Great]

Shall we play a game? Help a soccer league decide whether to cancel games. Build a decision tree to help officials decide. Assume that decisions are the same given the same information. The leaf nodes should be whether or not to play. The non-leaf nodes should be attributes (e.g., Outlook, Windy). The edges should be conditions (e.g., sunny, hot, normal).

Outlook    Temperature  Humidity  Windy  Play?
sunny      hot          high      false  no
sunny      hot          high      true   no
overcast   hot          high      false  yes
rain       mild         high      false  yes
rain       cool         normal    false  yes
rain       cool         normal    true   no
overcast   cool         normal    true   yes
sunny      mild         high      false  no
sunny      cool         normal    false  yes
rain       mild         normal    false  yes
sunny      mild         normal    true   yes
overcast   mild         high      true   yes
overcast   hot          normal    false  yes
rain       mild         high      true   no

(The Play? answers appeared as yes/no icons in the original slide; the values above follow the classic weather dataset this example is adapted from.)

Example adapted from http://www.kdnuggets.com/data_mining_course/index.html#materials

Some example potential starts to the decision tree. [Figure: partial trees, such as a root Outlook? with branches Overcast / Rainy / Sunny leading to Temperature?, Humidity?, and Windy? nodes, and a root Windy? with branches true / false leading to Humidity? nodes]

How did you split up your tree and why? Minimize the number of branches required in the decision tree

Deciding which nodes go where: a decision tree construction algorithm. Top-down tree construction: at the start, all examples are at the root. Partition the examples recursively by choosing one attribute each time. In deciding which attribute to split on, one common method is to try to reduce entropy, i.e., each time you split, you should make the resulting groups more homogeneous. The more you reduce entropy, the higher the information gain.
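As an illustration of the construction step just described, here is a small Python sketch (not from the course materials) that scores each attribute with the simple entropy measure used in this lecture, the number of examples left in mixed groups, and picks the attribute that minimizes it. The data layout and attribute names are assumptions for illustration.

```python
from collections import defaultdict

def mixed_count_entropy(rows, labels, attribute):
    """Split rows on `attribute`; count how many rows land in groups
    that still contain a mix of yes and no labels."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    return sum(len(g) for g in groups.values() if len(set(g)) > 1)

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split leaves the fewest examples in mixed groups."""
    return min(attributes, key=lambda a: mixed_count_entropy(rows, labels, a))

# Tiny made-up example in the spirit of the weather data:
rows = [
    {"Outlook": "sunny",    "Windy": "false"},
    {"Outlook": "sunny",    "Windy": "true"},
    {"Outlook": "overcast", "Windy": "false"},
    {"Outlook": "rain",     "Windy": "true"},
]
labels = ["no", "yes", "yes", "no"]
print(best_attribute(rows, labels, ["Outlook", "Windy"]))  # Outlook (2 mixed vs. 4)
```

Applied recursively to each resulting group, this is the greedy, top-down procedure the slides walk through next.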

Let's go back to our example. Intuitively, our goal is to get to having as few mixed yes and no answers together in groups as possible. So in the initial case, we have 14 mixed yeses and nos. [Weather data table repeated from the earlier slide]

What happens if we split on Temperature? Splitting on Temperature gives three mixed groups: hot (4 mixed), mild (6 mixed), and cool (4 mixed). Overall entropy = 4 + 4 + 6 = 14.

What's the entropy if you split on Outlook? Group exercise. A. 0 B. 5 C. 10 D. 14 E. None of the above [Weather data table repeated from the earlier slide]

What's the entropy if you split on Outlook? Group exercise results: sunny (5 mixed), overcast (0 mixed), rainy (5 mixed). Overall entropy = 5 + 0 + 5 = 10.

What if you split on Windy? A. 0 B. 6 C. 8 D. 14 E. None of the above. Splitting on Windy gives: false (8 mixed), true (6 mixed). Overall entropy = 8 + 6 = 14.

What if you split on Humidity? Splitting on Humidity gives: high (7 mixed), normal (7 mixed). Overall entropy = 7 + 7 = 14.

To summarize: Entropy if we split on Outlook = 10. Entropy if we split on Windy = 14. Entropy if we split on Humidity = 14. So the best option to split on is Outlook: it does the best job of reducing entropy.

This example suggests why a more complex entropy definition might be better. [Figure: the Windy (false / true) and Humidity (high / normal) splits side by side] Humidity is better, even though both have entropy 14.
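For reference, the "more complex entropy definition" hinted at here is usually Shannon entropy, weighted by group size. The sketch below (not from the slides) shows how it can separate two splits that the simple mixed-example count cannot; the group counts are illustrative assumptions chosen to resemble the Windy and Humidity splits.

```python
from math import log2

def shannon_entropy(counts):
    """Entropy (in bits) of a yes/no distribution given as counts, H = -sum(p * log2(p))."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def weighted_entropy(groups):
    """Average entropy of several groups, weighted by group size."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * shannon_entropy(g) for g in groups)

# Two splits that both leave every example in a "mixed" group (simple measure: 14),
# yet differ under Shannon entropy: lopsided groups are closer to pure.
print(weighted_entropy([(6, 2), (3, 3)]))  # about 0.89 bits
print(weighted_entropy([(3, 4), (6, 1)]))  # about 0.79 bits -- the better split
```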

Great! Now we do the same thing again. Here's what we have so far: [Figure: Outlook at the root with branches sunny, overcast, rainy] For each option, we have to decide which attribute to split on next: Temperature, Windy, or Humidity.

Great! Now we do the same thing again. Clicker question: What's the best attribute to split on for Outlook = sunny? A. Temperature (hot / mild / cool) B. Windy (false / true) C. Humidity (high / normal)

We don't need to split for Outlook = overcast. The answer was yes each time, so we're done there.

What's the best attribute to split on for Outlook = rain? Clicker question. A. Temperature (hot: N/A / mild / cool) B. Windy (false / true) C. Humidity (high / normal)

This was, of course, a simple example. In this example, the algorithm found the tree with the smallest number of nodes. We were given the attributes and conditions. A simplistic notion of entropy worked (a more sophisticated notion of entropy is typically used to determine which attribute to split on).

This was, of course, a simple example. In more complex examples, like the loan application example: We may not know which conditions or attributes are best to use. The final decision may not be correct in every case (e.g., given two loan applicants with the same colour and credit rating, one may be credit worthy while the other is not). Even if the final decision is always correct, the tree may not be of minimum size.

Coding up a decision tree classifier. [Figure: the finished tree: Outlook at the root with branches sunny, overcast, rainy; the sunny branch leads to Humidity (high / normal) and the rainy branch leads to Windy (false / true)]

Coding up a decision tree classifier. [Figure: the sunny and overcast branches of the tree: Outlook → Humidity (high / normal)] Can you see the relationship between the hierarchical tree structure and the hierarchical nesting of if statements?
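The slide's own code is not captured in this transcription, so here is a hedged sketch of what nested if statements for the sunny and overcast branches might look like; the yes/no leaf values are assumptions based on the training data above, not the slide's actual code.

```python
# Each level of nesting mirrors one level of the tree:
# the outer if tests Outlook, the inner if tests Humidity.

def play(outlook, humidity):
    if outlook == "sunny":
        if humidity == "high":
            return "no"       # assumed: sunny + high-humidity examples were "no"
        else:                 # humidity == "normal"
            return "yes"
    elif outlook == "overcast":
        return "yes"          # every overcast example in the training data was a yes

print(play("sunny", "normal"))   # yes
print(play("overcast", "high"))  # yes
```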

Coding up a decision tree classifier. [Figure: the rainy branch of the tree: Outlook → Windy (false / true)] Can you extend the code to handle the rainy case?
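One possible extension of the earlier sketch for the rainy branch (again an illustration, with leaf values assumed from the training data, not the slide's own code):

```python
# The rainy branch splits on Windy, adding one more nested test.

def play(outlook, humidity, windy):
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    elif outlook == "overcast":
        return "yes"
    else:                     # outlook == "rain" / "rainy"
        return "yes" if windy == "false" else "no"   # assumed leaf values

print(play("rain", "high", "false"))  # yes
print(play("rain", "high", "true"))   # no
```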