Introduction to Machine Learning

Introduction to Machine Learning: Concept Learning
Varun Chandola
Computer Science & Engineering, State University of New York at Buffalo, Buffalo, NY, USA
chandola@buffalo.edu

Outline
Concept Learning
Example: Finding Malignant Tumors
Notation
Concept and Concept Space
Learning a Possible Concept - Hypothesis
Hypothesis Space
Learning Conjunctive Concepts
Find-S Algorithm
Version Spaces
LIST-THEN-ELIMINATE Algorithm
Compressing Version Space
Analyzing Candidate Elimination Algorithm
Inductive Bias

Concept Learning
Infer a boolean-valued function c : X → {true, false}
Input: attributes of an input x
Output: true if the input belongs to the concept, else false
Go from specific to general (inductive learning)

Finding Malignant Tumors from MRI Scans
Attributes:
1. Shape: circular, oval
2. Size: large, small
3. Color: light, dark
4. Surface: smooth, irregular
5. Thickness: thin, thick
Concept: malignant tumor

Malignant vs. Benign Tumor
[Figure: four example tumor images, labeled malignant, malignant, benign, malignant]

Notation
Data, X: the set of all possible instances. What is X here?
Example instance: {circular, small, dark, smooth, thin}
D: the training data set, D = {⟨x, c(x)⟩ : x ∈ X, c(x) ∈ {0, 1}}
Typically, |D| ≪ |X|

What is a Concept?
Semantics: malignant tumor, metastatic tumor, lymphoma
Mathematics: a function c()
What does c() do to data instances in X?

Learning a Concept - Hypothesis
A conjunction over a subset of attributes
A malignant tumor is: circular and dark and thick, i.e., {circular, ?, dark, ?, thick}
The target concept c is unknown; the value of c over the training examples is known

Approximating the Target Concept Through a Hypothesis
Hypothesis: a potential candidate for the concept. Example: {circular, ?, ?, ?, ?}
Hypothesis space H: the set of all hypotheses. What is H?
Special hypotheses:
Accept everything: {?, ?, ?, ?, ?}
Accept nothing: {∅, ∅, ∅, ∅, ∅}

A Simple Algorithm (Find-S [1, Ch. 2])
1. Start with h = {∅, ∅, ∅, ∅, ∅}
2. Take the next input ⟨x, c(x)⟩
3. If c(x) = 0, go to step 2
4. h ← h ∧ x (pairwise-and)
5. If more examples remain, go to step 2
6. Stop
Pairwise-and rule, applied to each attribute:
a_h ∧ a_x = a_x  if a_h = ∅
a_h ∧ a_x = a_h  if a_h = a_x
a_h ∧ a_x = ?    if a_h ≠ a_x
a_h ∧ a_x = ?    if a_h = ?
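For concreteness, here is a minimal Python sketch of Find-S for this conjunctive representation; the function and variable names are ours, and "0" stands in for the empty constraint ∅:

```python
# Minimal sketch of Find-S for conjunctive hypotheses (names are illustrative).
NONE, ANY = "0", "?"   # "0" = empty constraint, "?" = accept any value

def pairwise_and(h, x):
    """Minimally generalize hypothesis h so that it also covers instance x."""
    out = []
    for a_h, a_x in zip(h, x):
        if a_h == NONE:
            out.append(a_x)          # first positive example fixes the value
        elif a_h == a_x or a_h == ANY:
            out.append(a_h)          # already consistent with x
        else:
            out.append(ANY)          # conflicting values: generalize to "?"
    return tuple(out)

def find_s(examples, n_attrs):
    h = (NONE,) * n_attrs
    for x, label in examples:
        if label == 1:               # Find-S ignores negative examples
            h = pairwise_and(h, x)
    return h

# Attribute order: shape, size, color, surface, thickness (training data from the slides)
D = [
    (("circular", "large", "light", "smooth",    "thick"), 1),
    (("circular", "large", "light", "irregular", "thick"), 1),
    (("oval",     "large", "dark",  "smooth",    "thin"),  0),
    (("oval",     "large", "light", "irregular", "thick"), 1),
    (("circular", "small", "light", "smooth",    "thick"), 0),
]
print(find_s(D, 5))   # -> ('?', 'large', 'light', '?', 'thick')
```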

Simple Example
Target concept: {?, large, ?, ?, thick}
How many positive examples can there be?
What is the minimum number of examples that must be seen to learn the concept? Two, for example:
1. {circular, large, light, smooth, thick}, malignant
2. {oval, large, dark, irregular, thick}, malignant
What is the maximum?

Partial Training Data
Target concept: {?, large, ?, ?, thick}
1. {circular, large, light, smooth, thick}, malignant
2. {circular, large, light, irregular, thick}, malignant
3. {oval, large, dark, smooth, thin}, benign
4. {oval, large, light, irregular, thick}, malignant
5. {circular, small, light, smooth, thick}, benign
Concept learnt: {?, large, light, ?, thick}
What mistake can this concept make?

Recap of Find-S
Objective: find the maximally specific hypothesis
Admit all positive examples and nothing more
The hypothesis never becomes any more specific
Questions:
Does it converge to the target concept?
Is the most specific hypothesis the best one?
How robust is it to errors in the training data?
How do we choose among potentially many maximally specific hypotheses?

Version Spaces
1. {circular, large, light, smooth, thick}, malignant
2. {circular, large, light, irregular, thick}, malignant
3. {oval, large, dark, smooth, thin}, benign
4. {oval, large, light, irregular, thick}, malignant
5. {circular, small, light, smooth, thin}, benign
Hypothesis chosen by Find-S: {?, large, light, ?, thick}
Other possibilities that are consistent with the training data? What is consistency?
Version space: the set of all consistent hypotheses

List Then Eliminate
1. VS ← H
2. For each ⟨x, c(x)⟩ ∈ D: remove from VS every hypothesis h such that h(x) ≠ c(x)
3. Return VS
Issues? How many hypotheses are removed at every instance?
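A brute-force Python sketch of LIST-THEN-ELIMINATE for this domain; the explicit enumeration of H is our assumption (each attribute takes one of its two values or "?"; the accept-nothing hypothesis is left out since it can never cover a positive example):

```python
from itertools import product

VALUES = [("circular", "oval"), ("large", "small"), ("light", "dark"),
          ("smooth", "irregular"), ("thin", "thick")]

def covers(h, x):
    """h classifies x as positive iff every attribute constraint is satisfied."""
    return all(a_h in ("?", a_x) for a_h, a_x in zip(h, x))

def list_then_eliminate(D):
    VS = list(product(*[v + ("?",) for v in VALUES]))   # 3^5 = 243 hypotheses
    for x, label in D:
        VS = [h for h in VS if covers(h, x) == bool(label)]
    return VS

# With the five training examples D from the Find-S sketch above, this returns
# exactly the three hypotheses shown on the Example slide below:
# ('?','large','light','?','thick'), ('?','large','?','?','thick'),
# ('?','large','light','?','?')
```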

Compressing the Version Space
More-general-than relationship:
h_j ≥_g h_k iff, for every x, h_k(x) = 1 ⟹ h_j(x) = 1
h_j >_g h_k iff (h_j ≥_g h_k) and not (h_k ≥_g h_j)
In a version space, there are:
1. Maximally general hypotheses
2. Maximally specific hypotheses
These form the boundaries of the version space
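For conjunctive hypotheses the relation can be checked attribute by attribute; a small sketch (these helper functions are ours, with the caveat noted in the comments):

```python
# Sketch of the more-general-than-or-equal test for conjunctive hypotheses,
# checked attribute by attribute ("0" = empty constraint, "?" = any value).
# Caveat: a hypothesis containing "0" covers nothing, so the attribute-wise
# test is only exact for hypotheses without "0" (or the all-"0" hypothesis).

def more_general_or_equal(h_j, h_k):
    """True iff h_j >=_g h_k, i.e. h_j covers every instance that h_k covers."""
    return all(a_j == "?" or a_k == "0" or a_j == a_k
               for a_j, a_k in zip(h_j, h_k))

def strictly_more_general(h_j, h_k):
    return more_general_or_equal(h_j, h_k) and not more_general_or_equal(h_k, h_j)

# Example: ('?','large','?','?','thick') >=_g ('?','large','light','?','thick')
```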

Example
1. {circular, large, light, smooth, thick}, malignant
2. {circular, large, light, irregular, thick}, malignant
3. {oval, large, dark, smooth, thin}, benign
4. {oval, large, light, irregular, thick}, malignant
5. {circular, small, light, smooth, thick}, benign
Hypotheses consistent with this data: {?, large, light, ?, thick}, {?, large, ?, ?, thick}, {?, large, light, ?, ?}

Example (2)
Specific boundary: {?, large, light, ?, thick}
General boundary: {?, large, ?, ?, thick}, {?, large, light, ?, ?}

Boundaries are Enough to Capture the Version Space
Version Space Representation Theorem: every hypothesis h in the version space is contained within at least one pair of hypotheses g and s, with g ∈ G and s ∈ S, such that g ≥_g h ≥_g s

Candidate Elimination Algorithm
1. Initialize S_0 = {∅, ∅, ..., ∅} and G_0 = {?, ?, ..., ?}
2. For every training example d = ⟨x, c(x)⟩:
If c(x) = +ve:
  1. Remove from G any g for which g(x) ≠ +ve
  2. For every s ∈ S such that s(x) ≠ +ve:
     2.1 Remove s from S
     2.2 For every minimal generalization s' of s: if s'(x) = +ve and there exists g ∈ G such that g >_g s', add s' to S
  3. Remove from S all hypotheses that are more general than another hypothesis in S
If c(x) = -ve:
  1. Remove from S any s for which s(x) ≠ -ve
  2. For every g ∈ G such that g(x) ≠ -ve:
     2.1 Remove g from G
     2.2 For every minimal specialization g' of g: if g'(x) = -ve and there exists s ∈ S such that g' >_g s, add g' to G
  3. Remove from G all hypotheses that are more specific than another hypothesis in G
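A simplified Python sketch of the algorithm, reusing covers, pairwise_and, and more_general_or_equal from the sketches above; min_specializations and the other names are ours, specialized to conjunctive hypotheses over discrete attributes:

```python
def min_specializations(g, x, values):
    """Minimal specializations of g that exclude the negative instance x."""
    specs = []
    for i, (a_g, a_x) in enumerate(zip(g, x)):
        if a_g == "?":
            for v in values[i]:
                if v != a_x:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(D, values):
    n = len(values)
    S = [("0",) * n]                      # maximally specific boundary
    G = [("?",) * n]                      # maximally general boundary
    for x, label in D:
        if label:                         # positive example
            G = [g for g in G if covers(g, x)]
            S = [s if covers(s, x) else pairwise_and(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = [s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)]
        else:                             # negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)       # g already excludes x
                else:                     # replace g by its minimal specializations
                    new_G += [g2 for g2 in min_specializations(g, x, values)
                              if any(more_general_or_equal(g2, s) for s in S)]
            G = [g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
    return S, G

# With VALUES from the LIST-THEN-ELIMINATE sketch and the four examples traced
# on the next slide, candidate_elimination returns
#   S = [('?', 'large', 'light', '?', 'thick')]
#   G = [('?', 'large', '?', '?', '?'), ('?', '?', '?', '?', 'thick')]
```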

Example
Training examples processed in order (ci = circular, ov = oval, la = large, sm = small, li = light, dk = dark, sh = smooth, ir = irregular, th = thick, tn = thin):
1. {ci, la, li, sh, th}, +ve
2. {ci, la, li, ir, th}, +ve
3. {ov, sm, li, sh, tn}, -ve
4. {ov, la, li, ir, th}, +ve
Specific boundary: S_0 = {∅, ∅, ∅, ∅, ∅} → S_1 = {ci, la, li, sh, th} → S_2 = {ci, la, li, ?, th} → S_4 = {?, la, li, ?, th}
General boundary: G_0 = {?, ?, ?, ?, ?}; after example 3 the minimal specializations are {ci,?,?,?,?}, {?,la,?,?,?}, {?,?,dk,?,?}, {?,?,?,ir,?}, {?,?,?,?,th}, of which only those more general than a member of S are kept: G_3 = {ci,?,?,?,?}, {?,la,?,?,?}, {?,?,?,?,th}

Understanding Candidate Elimination
The S and G boundaries move towards each other.
Will it converge to the target concept? Yes, provided that:
1. There are no errors in the training examples
2. There is sufficient training data
3. The target concept is in H
Why is it better than Find-S?

Insufficient Training Examples
Use the boundary sets S and G to make predictions on a new instance x:
Case 1: x is consistent with every hypothesis in S → predict positive
Case 2: x is inconsistent with every hypothesis in G → predict negative

Partially Learnt Concepts - Example
Version space learnt from the training data:
Specific boundary: {?, la, li, ?, th}
Intermediate hypotheses: {?, la, li, ?, ?}, {?, la, ?, ?, th}, {?, ?, li, ?, th}
General boundary: {?, la, ?, ?, ?}, {?, ?, ?, ?, th}
New instances to classify:
{ci, la, li, sh, th}, ?
{ov, sm, li, ir, tn}, ?
{ov, la, dk, ir, th}, ?
{ci, la, li, ir, tn}, ?

Using Partial Version Spaces
Predict using the majority of concepts in the version space
Predict using a randomly selected member of the version space
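A small sketch of the majority-vote option, reusing covers from the LIST-THEN-ELIMINATE sketch; the vote helper and the hard-coded version space (the six hypotheses from the previous slide) are ours:

```python
VS = [("?", "large", "light", "?", "thick"),
      ("?", "large", "light", "?", "?"), ("?", "large", "?", "?", "thick"),
      ("?", "?", "light", "?", "thick"),
      ("?", "large", "?", "?", "?"), ("?", "?", "?", "?", "thick")]

def vote(vs, x):
    """Fraction of version-space members that classify x as positive."""
    return sum(covers(h, x) for h in vs) / len(vs)

# The four query instances from the previous slide:
print(vote(VS, ("circular", "large", "light", "smooth", "thick")))    # 1.0  -> positive
print(vote(VS, ("oval", "small", "light", "irregular", "thin")))      # 0.0  -> negative
print(vote(VS, ("oval", "large", "dark", "irregular", "thick")))      # 0.5  -> tie
print(vote(VS, ("circular", "large", "light", "irregular", "thin")))  # 0.33 -> leans negative
```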

How Many Target Concepts Can There Be?
A target concept labels every example in X, so there are 2^|X| possibilities (the concept space C), where |X| = n_1 · n_2 · ... · n_d and n_i is the number of values of attribute i
The conjunctive hypothesis space H has only (n_1 + 1) · (n_2 + 1) · ... · (n_d + 1) possibilities
Why this difference?
Hypothesis Assumption: the target concept is conjunctive
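A quick worked check of these counts for the five-attribute tumor example (two values per attribute); counting the accept-nothing hypothesis as one extra member of H is our addition:

```python
import math

n = [2, 2, 2, 2, 2]                     # values per attribute

num_instances = math.prod(n)            # |X| = 2^5 = 32
num_concepts = 2 ** num_instances       # |C| = 2^32 = 4,294,967,296
num_conjunctive = math.prod(v + 1 for v in n) + 1   # each attribute: a value or "?",
                                                    # plus one accept-nothing hypothesis
print(num_instances, num_concepts, num_conjunctive)  # 32 4294967296 244
```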

Inductive Bias
[Figure: the conjunctive hypothesis space H drawn as a small subset of the concept space C, with hypotheses such as {ci, ?, ?, ?, ?} and {?, ?, ?, ?, th} lying inside H]

Bias Free Learning
To be free of bias, the learner must use the full concept space C as its hypothesis space
Simple tumor example: 2 attributes, size (sm/lg) and shape (ov/ci); target label malignant (+ve) or benign (-ve)
|X| = 4, so |C| = 2^4 = 16

Bias Free Learning is Futile
A learner that makes no assumptions about the target concept cannot classify any unseen instance
Inductive Bias: the set of assumptions made by a learner in order to generalize from the training examples

Examples of Inductive Bias
Rote learner: no bias
Candidate Elimination: stronger bias
Find-S: strongest bias

References
[1] T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.