Machine Learning: Exercise Sheet 2


Manuel Blum
AG Maschinelles Lernen und Natürlichsprachliche Systeme
Albert-Ludwigs-Universität Freiburg
mblum@informatik.uni-freiburg.de

Exercise 1: Version Spaces

Task (a)
What are the elements of the version space?
Hypotheses (descriptions of concepts): the version space VS_{H,D} ⊆ H with respect to the hypothesis space H contains those hypotheses that are consistent with the training data D.
How are they ordered?
They are arranged in a general-to-specific ordering, i.e. by the partial order ≥_g (more general than or equal) and its strict variant >_g.
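
As a compact formal restatement (standard Mitchell-style definitions, added here for reference; not part of the original slide):

\[
VS_{H,D} = \{\, h \in H \mid \forall \langle x, c(x) \rangle \in D : h(x) = c(x) \,\}
\]
\[
h_j \geq_g h_k \;\Leftrightarrow\; \forall x \in X : h_k(x) = 1 \Rightarrow h_j(x) = 1
\]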

Task (a)
What can be said about the meaning and sizes of G and S?
They are the sets containing the most general and the most specific hypotheses consistent with the training data. Thus, they mark the general and the specific boundary of the version space.
For conjunctive hypotheses (which we consider here) it always holds that |S| = 1, assuming consistent training data.
G attains its maximal size if negative patterns with maximal Hamming distance have been presented. Thus, in the case of binary constraints, it holds that |G| ≤ n(n−1), where n denotes the number of constraints per hypothesis.

Task (b)
In the following, it is desired to describe whether a person is ill. We use a representation based on conjunctive constraints (three per subject) to describe an individual person. These constraints are running nose, coughing, and reddened skin, each of which can take the value true (+) or false (-). We say that somebody is ill if he has a running nose and is coughing; each single symptom individually does not mean that the person is ill.
Specify the space of hypotheses that is managed by the version space approach. To do so, arrange all hypotheses in a graph structure using the more-specific-than relation.
Hypotheses are vectors of constraints, denoted by ⟨N, C, R⟩ with N, C, R ∈ {∅, +, -, ?}.
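
As an illustration of this representation (not part of the exercise sheet), a hypothesis can be encoded as a tuple over four symbols; the names matches and more_general_or_equal and the use of '0' for the empty constraint ∅ are my own choices for this sketch:

# A hypothesis is a tuple over {'0', '+', '-', '?'}: '0' rejects every value
# (the empty constraint), '+'/'-' require that value, '?' accepts any value.
ATTRIBUTES = ("running nose", "coughing", "reddened skin")

def matches(h, x):
    """True if conjunctive hypothesis h classifies example x (tuple over {'+', '-'}) as positive."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 >=_g h2, i.e. every example matched by h2 is also matched by h1."""
    if '0' in h2:                     # h2 matches nothing, so any h1 is at least as general
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

print(matches(('+', '+', '?'), ('+', '+', '-')))                # True
print(more_general_or_equal(('?', '+', '?'), ('+', '+', '?')))  # True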

Task (c)
Apply the candidate elimination (CE) algorithm to the sequence of training examples specified in the table and name the contents of the sets S and G after each step.

Training   N               C           R                Classification
example    (running nose)  (coughing)  (reddened skin)
d1         +               +           +                positive (ill)
d2         +               +           -                positive (ill)
d3         +               -           +                negative (healthy)
d4         -               +           +                negative (healthy)
d5         -               +           -                negative (healthy)
d6         -               -           -                negative (healthy)

Task (c)
Start (init): G = { ⟨?, ?, ?⟩ }, S = { ⟨∅, ∅, ∅⟩ }
foreach d ∈ D do
  d1 = [⟨+, +, +⟩, pos]:
    G = { ⟨?, ?, ?⟩ }, S = { ⟨+, +, +⟩ }
  d2 = [⟨+, +, -⟩, pos]:
    G = { ⟨?, ?, ?⟩ }, S = { ⟨+, +, ?⟩ }
  d3 = [⟨+, -, +⟩, neg]:
    no change to S: S = { ⟨+, +, ?⟩ }
    specializations of G: G = { ⟨-, ?, ?⟩, ⟨?, +, ?⟩, ⟨?, ?, -⟩ }
    there is no element in S that is more specific than the first and third element of G, so remove them from G: G = { ⟨?, +, ?⟩ }

Task (c)
foreach d ∈ D do (loop continued) ...
  so far we have S = { ⟨+, +, ?⟩ } and G = { ⟨?, +, ?⟩ }
  d4 = [⟨-, +, +⟩, neg]:
    no change to S: S = { ⟨+, +, ?⟩ }
    specializations of G: G = { ⟨+, +, ?⟩, ⟨?, +, -⟩ }
    there is no element in S that is more specific than the second element of G, so remove it from G: G = { ⟨+, +, ?⟩ }
Note: At this point, the algorithm might be stopped, since S = G and no further changes to S and G are to be expected. However, by continuing we might detect inconsistencies in the training data.

Task (c)
foreach d ∈ D do (loop continued) ...
  d5 = [⟨-, +, -⟩, neg]:
    Both G = { ⟨+, +, ?⟩ } and S = { ⟨+, +, ?⟩ } are consistent with d5.
  d6 = [⟨-, -, -⟩, neg]:
    Both G = { ⟨+, +, ?⟩ } and S = { ⟨+, +, ?⟩ } are consistent with d6.
return S and G
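
A minimal runnable sketch of this CE run (assuming the tuple encoding from Task (b); function names and the '0'/'?' symbols are my own choices, not from the sheet). It reproduces the hand computation above for conjunctive hypotheses over binary attributes:

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    if '0' in h2:
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(data, n):
    S = [tuple('0' * n)]
    G = [tuple('?' * n)]
    for x, label in data:
        if label == 'pos':
            G = [g for g in G if matches(g, x)]            # drop inconsistent general hypotheses
            new_S = []
            for s in S:
                if not matches(s, x):                      # minimally generalize s toward x
                    s = tuple(v if c in ('0', v) else '?' for c, v in zip(s, x))
                if any(more_general_or_equal(g, s) for g in G):
                    new_S.append(s)
            S = new_S
        else:
            S = [s for s in S if not matches(s, x)]        # drop inconsistent specific hypotheses
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for i, c in enumerate(g):                  # minimal specializations of g that exclude x
                    if c == '?':
                        v = '+' if x[i] == '-' else '-'    # binary attributes: take the other value
                        cand = g[:i] + (v,) + g[i + 1:]
                        if any(more_general_or_equal(cand, s) for s in S):
                            new_G.append(cand)
            G = [g for g in new_G                          # keep only maximally general elements
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
        print(label, x, '-> S =', S, ', G =', G)
    return S, G

data = [(('+', '+', '+'), 'pos'), (('+', '+', '-'), 'pos'),
        (('+', '-', '+'), 'neg'), (('-', '+', '+'), 'neg'),
        (('-', '+', '-'), 'neg'), (('-', '-', '-'), 'neg')]
candidate_elimination(data, 3)    # ends with S = G = [('+', '+', '?')]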

Task (d)
Does the order of presentation of the training examples to the learner affect the finally learned hypothesis?
No, but it may influence the algorithm's running time.

Task (e)
Assume a domain with two attributes, i.e. any instance is described by two constraints. How many positive and negative training examples are minimally required by the candidate elimination algorithm in order to learn an arbitrary concept? By learning an arbitrary concept we of course mean that the algorithm arrives at S = G.
The algorithm is started with S = { ⟨∅, ∅⟩ } and G = { ⟨?, ?⟩ }.
We only consider the best cases, i.e. situations where the training instances given to the CE algorithm allow for adapting S or G.

Task (e)
Clearly, three appropriately chosen examples are sufficient.
Negative examples: they change G from ⟨?, ?⟩ to ⟨v, ?⟩ or ⟨?, w⟩, or they change G from ⟨v, ?⟩ or ⟨?, w⟩ to ⟨v, w⟩.
Positive examples: they change S from ⟨∅, ∅⟩ to ⟨v, w⟩, or they change S from ⟨v, w⟩ to ⟨v, ?⟩ or ⟨?, w⟩, or from ⟨v, ?⟩ or ⟨?, w⟩ to ⟨?, ?⟩.
At least one positive example is required (otherwise S remains { ⟨∅, ∅⟩ }).
Special case: two positive patterns ⟨d1, d2⟩ and ⟨e1, e2⟩ are sufficient if d1 ≠ e1 and d2 ≠ e2; then S develops as ⟨∅, ∅⟩ → ⟨d1, d2⟩ → ⟨?, ?⟩.

Task (f)
We now extend the number of constraints used for describing training instances by one additional constraint named fever. We say that somebody is ill if he has a running nose and is coughing (as we did before), or if he has fever.

Training   N               C           R                F        Classification
example    (running nose)  (coughing)  (reddened skin)  (fever)
d1         +               +           +                -        positive (ill)
d2         +               +           -                -        positive (ill)
d3         -               -           +                +        positive (ill)
d4         +               -           -                -        negative (healthy)
d5         -               -           -                -        negative (healthy)
d6         -               +           +                -        negative (healthy)

Task (f)
How does the version space approach using the CE algorithm perform now, given the training examples specified on the previous slide?
Initially: S = { ⟨∅, ∅, ∅, ∅⟩ }, G = { ⟨?, ?, ?, ?⟩ }
  d1 = [⟨+, +, +, -⟩, pos]: S = { ⟨+, +, +, -⟩ }, G = { ⟨?, ?, ?, ?⟩ }
  d2 = [⟨+, +, -, -⟩, pos]: S = { ⟨+, +, ?, -⟩ }, G = { ⟨?, ?, ?, ?⟩ }
  d3 = [⟨-, -, +, +⟩, pos]: S = { ⟨?, ?, ?, ?⟩ }, G = { ⟨?, ?, ?, ?⟩ }
    We already arrive at S = G.
  d4 = [⟨+, -, -, -⟩, neg]: S = { }, G = { }
    Now, S becomes empty, since ⟨?, ?, ?, ?⟩ is inconsistent with d4 and is removed from S.
    G would be specialized to { ⟨-, ?, ?, ?⟩, ⟨?, +, ?, ?⟩, ⟨?, ?, +, ?⟩, ⟨?, ?, ?, +⟩ }. But it is required that at least one element from S be more specific than each element of G. This requirement cannot be fulfilled since S = ∅, hence G = ∅.
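
Running the illustrative candidate_elimination sketch from Task (c) on this data reproduces the collapse; the encoding below follows the reconstructed table and is only a usage example of that sketch:

data_f = [(('+', '+', '+', '-'), 'pos'), (('+', '+', '-', '-'), 'pos'),
          (('-', '-', '+', '+'), 'pos'), (('+', '-', '-', '-'), 'neg'),
          (('-', '-', '-', '-'), 'neg'), (('-', '+', '+', '-'), 'neg')]
candidate_elimination(data_f, 4)    # both S and G are empty once d4 has been processed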

Task (f)
What happens if the order of presentation of the training examples is altered?
Even a change in the order of presentation does not yield a learning success (i.e. the algorithm does not end with S = G). When applying the CE algorithm, S and G become empty independent of the presentation order.
Reason: The informally specified target concept of an ill person is a disjunctive concept. The target concept is not an element of the hypothesis space H (which consists of conjunctive hypotheses only).

Exercise 2: Decision Tree Learning with ID3

Task (a)
Apply the ID3 algorithm to the training data in the table.

Training   fever     vomiting  diarrhea  shivering  Classification
d1         no        no        no        no         healthy (H)
d2         average   no        no        no         influenza (I)
d3         high      no        no        yes        influenza (I)
d4         high      yes       yes       no         salmonella poisoning (S)
d5         average   no        yes       no         salmonella poisoning (S)
d6         no        yes       yes       no         bowel inflammation (B)
d7         average   yes       yes       no         bowel inflammation (B)

Task (a)
Exemplary calculation for the first (root) node.
Entropy of the given data set S:
Entropy(S) = -(1/7) log2(1/7) - (2/7) log2(2/7) - (2/7) log2(2/7) - (2/7) log2(2/7) = 1.950

Consider attribute x = Fever:
Values        class distribution [H, I, S, B]   Entropy(S_i)
S1 (no)       [1/2, 0, 0, 1/2]                  1
S2 (average)  [0, 1/3, 1/3, 1/3]                1.585
S3 (high)     [0, 1/2, 1/2, 0]                  1

Entropy(S | Fever) = (2/7)·1 + (3/7)·1.585 + (2/7)·1 = 1.251

Task (a)
Consider attribute x = Vomiting:
Values    class distribution [H, I, S, B]   Entropy(S_i)
S1 (yes)  [0, 0, 1/3, 2/3]                  0.918
S2 (no)   [1/4, 2/4, 1/4, 0]                1.5
Entropy(S | Vomiting) = (3/7)·0.918 + (4/7)·1.5 = 1.251

Consider attribute x = Diarrhea:
Values    class distribution [H, I, S, B]   Entropy(S_i)
S1 (yes)  [0, 0, 2/4, 2/4]                  1
S2 (no)   [1/3, 2/3, 0, 0]                  0.918
Entropy(S | Diarrhea) = (4/7)·1 + (3/7)·0.918 = 0.965

Consider attribute x = Shivering:
Values    class distribution [H, I, S, B]   Entropy(S_i)
S1 (yes)  [0, 1, 0, 0]                      0
S2 (no)   [1/6, 1/6, 2/6, 2/6]              1.918
Entropy(S | Shivering) = (1/7)·0 + (6/7)·1.918 = 1.644

Task (a)
Choose the attribute that maximizes the information gain:
Fever:     Gain(S, Fever)     = Entropy(S) - Entropy(S | Fever)     = 1.950 - 1.251 = 0.699
Vomiting:  Gain(S, Vomiting)  = Entropy(S) - Entropy(S | Vomiting)  = 1.950 - 1.251 = 0.699
Diarrhea:  Gain(S, Diarrhea)  = Entropy(S) - Entropy(S | Diarrhea)  = 1.950 - 0.965 = 0.985
Shivering: Gain(S, Shivering) = Entropy(S) - Entropy(S | Shivering) = 1.950 - 1.644 = 0.306
Attribute Diarrhea is the most effective one, maximizing the information gain.
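
As an illustrative check of the root-node computation (not from the slides), the table above can be encoded and the entropies and gains recomputed; the data layout and helper names below are my own choices:

from collections import Counter
from math import log2

COLUMNS = ["fever", "vomiting", "diarrhea", "shivering"]
DATA = [  # attribute values followed by the class label
    ("no",      "no",  "no",  "no",  "H"),
    ("average", "no",  "no",  "no",  "I"),
    ("high",    "no",  "no",  "yes", "I"),
    ("high",    "yes", "yes", "no",  "S"),
    ("average", "no",  "yes", "no",  "S"),
    ("no",      "yes", "yes", "no",  "B"),
    ("average", "yes", "yes", "no",  "B"),
]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def conditional_entropy(rows, attr_index):
    total = len(rows)
    result = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        result += (len(subset) / total) * entropy(subset)
    return result

labels = [r[-1] for r in DATA]
print("Entropy(S) =", round(entropy(labels), 3))        # 1.95
for i, name in enumerate(COLUMNS):
    gain = entropy(labels) - conditional_entropy(DATA, i)
    print(f"Gain(S, {name}) = {gain:.3f}")              # diarrhea wins with 0.985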

Task (b)
Does the resulting decision tree provide a disjoint definition of the classes?
Yes, the resulting decision tree provides disjoint class definitions.

Task (c)
Consider the use of real-valued attributes when learning decision trees, as described in the lecture. The data in the table below shows the relationship between the body height and the gender of a group of persons (the records have been sorted with respect to the value of height in cm).

Height  161  164  169  175  176  179  180  184  185
Gender  F    F    M    M    F    F    M    M    F

Calculate the information gain for the potential splitting thresholds (recall that cut points must always lie at class boundaries) and determine the best one.
Potential cut points must lie in the intervals (164, 169), (175, 176), (179, 180), or (184, 185).

Task (c)
Calculate the information gain for the potential splitting thresholds (continued).

C1 ∈ (164, 169):
  resulting class distribution (F : M): if x < C1 then 2 : 0, else 3 : 4
  conditional entropy: if x < C1 then E = 0, else E = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.985
  entropy: E(C1 | S) = (2/9)·0 + (7/9)·0.985 = 0.766

C2 ∈ (175, 176):
  resulting class distribution (F : M): if x < C2 then 2 : 2, else 3 : 2
  entropy: E(C2 | S) = (4/9)·1 + (5/9)·0.971 = 0.984

C3 ∈ (179, 180):
  resulting class distribution (F : M): if x < C3 then 4 : 2, else 1 : 2
  entropy: E(C3 | S) = (6/9)·0.918 + (3/9)·0.918 = 0.918

Task (c)
Calculate the information gain for the potential splitting thresholds (continued).

C4 ∈ (184, 185):
  resulting class distribution (F : M): if x < C4 then 4 : 4, else 1 : 0
  entropy: E(C4 | S) = (8/9)·1 + (1/9)·0 = 0.889

Prior entropy of S: -(5/9) log2(5/9) - (4/9) log2(4/9) = 0.991
Information gains: Gain(S, C1) = 0.225, Gain(S, C2) = 0.007, Gain(S, C3) = 0.073, Gain(S, C4) = 0.102
The first splitting point (C1) is the best one.
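
As an illustrative check of the threshold computation (not from the slides), the gains can be recomputed directly; the helper names and the concrete interval midpoints used as thresholds are my own choices for this sketch:

from collections import Counter
from math import log2

heights = [161, 164, 169, 175, 176, 179, 180, 184, 185]
genders = ["F", "F", "M", "M", "F", "F", "M", "M", "F"]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def split_gain(threshold):
    left = [g for h, g in zip(heights, genders) if h < threshold]
    right = [g for h, g in zip(heights, genders) if h >= threshold]
    remainder = (len(left) / len(genders)) * entropy(left) \
              + (len(right) / len(genders)) * entropy(right)
    return entropy(genders) - remainder

# midpoints of the candidate intervals at the class boundaries
for cut in [166.5, 175.5, 179.5, 184.5]:
    print(f"Gain at {cut}: {split_gain(cut):.3f}")   # 0.225, 0.007, 0.073, 0.102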