Introduction to Machine Learning: Concept Learning
Varun Chandola
Computer Science & Engineering, State University of New York at Buffalo, Buffalo, NY, USA
chandola@buffalo.edu
CSE 474/574
Outline
- Concept Learning
- Example: Finding Malignant Tumors
- Notation
- Concept and Concept Space
- Learning a Possible Concept: Hypothesis
- Hypothesis Space
- Learning Conjunctive Concepts
- Find-S Algorithm
- Version Spaces
- LIST-THEN-ELIMINATE Algorithm
- Compressing the Version Space
- Analyzing the Candidate Elimination Algorithm
- Inductive Bias
Concept Learning
Infer a boolean-valued function c : X → {true, false}
- Input: the attributes of an instance x
- Output: true if x belongs to the concept, false otherwise
- Go from specific to general (inductive learning)
Finding Malignant Tumors from MRI Scans
Attributes:
1. Shape: circular, oval
2. Size: large, small
3. Color: light, dark
4. Surface: smooth, irregular
5. Thickness: thin, thick
Concept: malignant tumor
Malignant vs. Benign Tumor
[Figure: example tumor scans labeled Malignant, Malignant, Benign, Malignant]
Notation
- Data, X: set of all possible instances. What is X?
  Example instance: {circular, small, dark, smooth, thin}
- D: training data set, D = {⟨x, c(x)⟩ : x ∈ X, c(x) ∈ {0, 1}}
- Typically, |D| ≪ |X|
What is a Concept?
- Semantics: malignant tumor, metastatic tumor, lymphoma
- Mathematics: a function c(·)
- What does c(·) do to data instances in X?
Learning a Concept - Hypothesis
- A conjunction over a subset of attributes
- Example: a malignant tumor is circular and dark and thick: {circular, ?, dark, ?, thick}
- The target concept c is unknown
- The value of c over the training examples is known
Approximating the Target Concept Through a Hypothesis
- Hypothesis: a potential candidate for the concept
  Example: {circular, ?, ?, ?, ?}
- Hypothesis space H: the set of all hypotheses. What is H?
- Special hypotheses:
  - Accept everything: {?, ?, ?, ?, ?}
  - Accept nothing: {∅, ∅, ∅, ∅, ∅}
A Simple Algorithm (Find-S [1, Ch. 2])
1. Start with h = {∅, ∅, ∅, ∅, ∅}
2. Take the next input ⟨x, c(x)⟩
3. If c(x) = 0, go to step 2
4. h ← h ∧ x (pairwise-and)
5. If more examples remain, go to step 2
6. Stop
Pairwise-and rules (per attribute):
- a_h ∧ a_x = a_x if a_h = ∅
- a_h ∧ a_x = a_h if a_h = a_x
- a_h ∧ a_x = ? if a_h ≠ a_x
- a_h ∧ a_x = ? if a_h = ?
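The loop and the pairwise-and rules can be sketched in Python. This is a minimal illustration: the names `find_s` and `pairwise_and`, the use of attribute tuples, `'?'` for the wildcard, and `None` for the empty value ∅ are choices made here, not notation from the slides.

```python
def pairwise_and(h, x):
    """Minimally generalize hypothesis h so that it covers instance x."""
    out = []
    for a_h, a_x in zip(h, x):
        if a_h is None:        # empty value: adopt the example's value
            out.append(a_x)
        elif a_h == a_x:       # already consistent: keep the constraint
            out.append(a_h)
        else:                  # conflict (or a_h is already '?'): wildcard
            out.append('?')
    return tuple(out)

def find_s(examples):
    """examples: iterable of (attribute_tuple, label); label True for +ve."""
    h = None                   # stands in for the all-empty hypothesis
    for x, label in examples:
        if not label:          # Find-S ignores negative examples
            continue
        h = x if h is None else pairwise_and(h, x)
    return h
```

Run on the five examples from the partial-training-data slide, this returns `('?', 'large', 'light', '?', 'thick')`, the concept learnt there.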
Simple Example
Target concept: {?, large, ?, ?, thick}
- How many positive examples can there be?
- What is the minimum number of examples that must be seen to learn the concept?
1. ⟨{circular, large, light, smooth, thick}, malignant⟩
2. ⟨{oval, large, dark, irregular, thick}, malignant⟩
- What is the maximum?
Partial Training Data
Target concept: {?, large, ?, ?, thick}
1. ⟨{circular, large, light, smooth, thick}, malignant⟩
2. ⟨{circular, large, light, irregular, thick}, malignant⟩
3. ⟨{oval, large, dark, smooth, thin}, benign⟩
4. ⟨{oval, large, light, irregular, thick}, malignant⟩
5. ⟨{circular, small, light, smooth, thick}, benign⟩
Concept learnt: {?, large, light, ?, thick}
What mistake can this concept make?
Recap of Find-S
- Objective: find the maximally specific hypothesis
- Admit all positive examples and nothing more
- The hypothesis only generalizes; it never becomes more specific
Questions:
- Does it converge to the target concept?
- Is the most specific hypothesis the best?
- How robust is it to errors?
- How to choose among potentially many maximally specific hypotheses?
Version Spaces
1. ⟨{circular, large, light, smooth, thick}, malignant⟩
2. ⟨{circular, large, light, irregular, thick}, malignant⟩
3. ⟨{oval, large, dark, smooth, thin}, benign⟩
4. ⟨{oval, large, light, irregular, thick}, malignant⟩
5. ⟨{circular, small, light, smooth, thin}, benign⟩
- Hypothesis chosen by Find-S: {?, large, light, ?, thick}
- Are there other hypotheses consistent with the training data? What is consistency?
- Version space: the set of all consistent hypotheses
LIST-THEN-ELIMINATE
1. VS ← H
2. For each ⟨x, c(x)⟩ ∈ D: remove from VS every hypothesis h such that h(x) ≠ c(x)
3. Return VS
Issues? How many hypotheses are removed for each instance?
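LIST-THEN-ELIMINATE can be sketched by brute force for the five tumor attributes. The names `DOMAINS`, `matches`, and `list_then_eliminate` are illustrative; the enumeration covers every conjunctive hypothesis (each attribute set to one of its values or `'?'`), omitting the accept-nothing hypothesis, which cannot survive a positive example.

```python
from itertools import product

# Attribute domains for the tumor example, in slide order.
DOMAINS = [('circular', 'oval'), ('large', 'small'), ('light', 'dark'),
           ('smooth', 'irregular'), ('thin', 'thick')]

def matches(h, x):
    """A conjunctive hypothesis accepts x iff every attribute agrees or is '?'."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def list_then_eliminate(examples):
    # Start with every conjunctive hypothesis (3^5 = 243 of them here) ...
    vs = list(product(*[d + ('?',) for d in DOMAINS]))
    # ... and drop any hypothesis that disagrees with a training label.
    for x, label in examples:
        vs = [h for h in vs if matches(h, x) == label]
    return vs
```

On the four training examples used in the candidate-elimination trace later in the slides, this leaves exactly the six hypotheses shown in the partially-learnt-concepts lattice. The obvious issue the slide raises: the initial enumeration is exponential in the number of attributes.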
Compressing the Version Space
More-general-than relationship:
- h_j ≥_g h_k if h_k(x) = 1 ⟹ h_j(x) = 1
- h_j >_g h_k if (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j)
In a version space, there are:
1. Maximally general hypotheses
2. Maximally specific hypotheses
These form the boundaries of the version space.
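For conjunctive hypotheses the ≥_g test reduces to an attribute-wise check, sketched below. The names are illustrative; `'?'` is the wildcard and `None` the empty value ∅.

```python
def more_general_or_equal(hj, hk):
    """h_j >=_g h_k: every instance accepted by h_k is accepted by h_j."""
    if any(a is None for a in hk):   # h_k accepts nothing, so h_j >=_g h_k holds
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def strictly_more_general(hj, hk):
    """h_j >_g h_k: more general in one direction only."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)
```

For example, {?, large, ?, ?, ?} is strictly more general than {?, large, light, ?, thick}, since every instance the latter accepts is large.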
Example
1. ⟨{circular, large, light, smooth, thick}, malignant⟩
2. ⟨{circular, large, light, irregular, thick}, malignant⟩
3. ⟨{oval, large, dark, smooth, thin}, benign⟩
4. ⟨{oval, large, light, irregular, thick}, malignant⟩
5. ⟨{circular, small, light, smooth, thick}, benign⟩
Consistent hypotheses include: {?, large, light, ?, thick}, {?, large, ?, ?, thick}, {?, large, light, ?, ?}
Example (2)
Specific boundary: {?, large, light, ?, thick}
General boundary: {?, large, ?, ?, thick}, {?, large, light, ?, ?}
Boundaries are Enough to Capture the Version Space
Version Space Representation Theorem: every hypothesis h in the version space is contained within at least one pair of hypotheses g ∈ G and s ∈ S, i.e., g ≥_g h ≥_g s.
Candidate Elimination Algorithm
1. Initialize S_0 = {{∅, ∅, ..., ∅}}, G_0 = {{?, ?, ..., ?}}
2. For every training example d = ⟨x, c(x)⟩:
If c(x) = +ve:
  1. Remove from G any g for which g(x) ≠ +ve
  2. For every s ∈ S such that s(x) ≠ +ve:
    2.1 Remove s from S
    2.2 For every minimal generalization s′ of s: if s′(x) = +ve and there exists g ∈ G such that g ≥_g s′, add s′ to S
  3. Remove from S all hypotheses that are more general than another hypothesis in S
If c(x) = -ve:
  1. Remove from S any s for which s(x) ≠ -ve
  2. For every g ∈ G such that g(x) ≠ -ve:
    2.1 Remove g from G
    2.2 For every minimal specialization g′ of g: if g′(x) = -ve and there exists s ∈ S such that g′ ≥_g s, add g′ to G
  3. Remove from G all hypotheses that are more specific than another hypothesis in G
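The two symmetric branches can be sketched compactly for conjunctive hypotheses. All names are illustrative choices, not from the slides; for a conjunction, the minimal generalization covering a positive example is unique (the pairwise-and), while minimal specializations fill one '?' with a value that excludes the negative example.

```python
def matches(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if any(a is None for a in hk):          # hk accepts nothing
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """The unique minimal generalization of conjunction s covering x."""
    return tuple(v if a is None else (a if a == v else '?')
                 for a, v in zip(s, x))

def min_specializations(g, domains, x):
    """Minimal specializations of g that exclude negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, a in enumerate(g) if a == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = [tuple([None] * len(domains))]      # maximally specific boundary
    G = [tuple(['?'] * len(domains))]       # maximally general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                s2 = min_generalization(s, x)
                if any(more_general_or_equal(g, s2) for g in G):
                    S.append(s2)
            S = [s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for g2 in min_specializations(g, domains, x):
                    if any(more_general_or_equal(g2, s) for s in S):
                        G.append(g2)
            G = [g for g in G
                 if not any(g != h and more_general_or_equal(h, g) for h in G)]
    return S, G
```

On the four-example trace that follows, this ends with S = {{?, large, light, ?, thick}} and G = {{?, large, ?, ?, ?}, {?, ?, ?, ?, thick}}.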
Example (trace)
- S_0 = {∅, ∅, ∅, ∅, ∅}; G_0 = {?, ?, ?, ?, ?}
- ⟨{ci, la, li, sh, th}, +ve⟩: S_1 = {ci, la, li, sh, th}
- ⟨{ci, la, li, ir, th}, +ve⟩: S_2 = {ci, la, li, ?, th}
- ⟨{ov, sm, li, sh, tn}, -ve⟩: the minimal specializations of G_0 are {ci, ?, ?, ?, ?}, {?, la, ?, ?, ?}, {?, ?, dk, ?, ?}, {?, ?, ?, ir, ?}, {?, ?, ?, ?, th}; keeping only those more general than S_2 gives G_3 = {ci, ?, ?, ?, ?}, {?, la, ?, ?, ?}, {?, ?, ?, ?, th}
- ⟨{ov, la, li, ir, th}, +ve⟩: S_4 = {?, la, li, ?, th}; G_4 = {?, la, ?, ?, ?}, {?, ?, ?, ?, th}
(ci = circular, ov = oval, la = large, sm = small, li = light, dk = dark, sh = smooth, ir = irregular, th = thick, tn = thin)
Understanding Candidate Elimination
- The S and G boundaries move towards each other
- Will it converge? Yes, provided that:
  1. There are no errors in the training examples
  2. There is sufficient training data
  3. The target concept is in H
- Why is it better than Find-S?
Not Sufficient Training Examples
Use the boundary sets S and G to make predictions on a new instance x:
- Case 1: x is consistent with every hypothesis in S, so predict +ve
- Case 2: x is inconsistent with every hypothesis in G, so predict -ve
Partially Learnt Concepts - Example
Specific boundary: {?, la, li, ?, th}
Intermediate: {?, la, li, ?, ?}, {?, la, ?, ?, th}, {?, ?, li, ?, th}
General boundary: {?, la, ?, ?, ?}, {?, ?, ?, ?, th}
Partially Learnt Concepts - Example (continued)
Instances to classify with this partial version space:
- ⟨{ci, la, li, sh, th}, ?⟩
- ⟨{ov, sm, li, ir, tn}, ?⟩
- ⟨{ov, la, dk, ir, th}, ?⟩
- ⟨{ci, la, li, ir, tn}, ?⟩
Using Partial Version Spaces
- Predict using the majority vote of the hypotheses in the version space
- Predict using a randomly selected member of the version space
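Majority-vote prediction over a version space can be sketched as follows, using the six hypotheses from the partially-learnt-concepts example. The names are illustrative, and ties here default to -ve, which is one of several reasonable choices.

```python
def matches(h, x):
    """A conjunctive hypothesis accepts x iff every attribute agrees or is '?'."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def majority_predict(version_space, x):
    """Predict +ve iff a strict majority of hypotheses accept x."""
    votes = sum(matches(h, x) for h in version_space)
    return votes > len(version_space) / 2

# The six consistent hypotheses from the example lattice.
VS = [('?', 'large', 'light', '?', 'thick'), ('?', 'large', 'light', '?', '?'),
      ('?', 'large', '?', '?', 'thick'), ('?', '?', 'light', '?', 'thick'),
      ('?', 'large', '?', '?', '?'), ('?', '?', '?', '?', 'thick')]
```

An instance accepted by every hypothesis in S gets a unanimous +ve vote; one rejected by every member of G gets zero votes; the interesting cases are the splits in between.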
How Many Target Concepts Can There Be?
- A target concept labels every example in X
- 2^|X| possibilities (the concept space C), where |X| = ∏_{i=1}^{d} n_i
- The conjunctive hypothesis space H has ∏_{i=1}^{d} (n_i + 1) + 1 possibilities
- Why this difference?
Hypothesis Space Assumption: the target concept is conjunctive.
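For the tumor example (d = 5 attributes, each with n_i = 2 values) the counts work out as below. The +1 collapses all hypotheses containing an empty value ∅ into the single accept-nothing hypothesis, an interpretation consistent with the formula on the slide.

```python
from math import prod

n = [2, 2, 2, 2, 2]                          # values per attribute
num_instances = prod(n)                      # |X| = prod(n_i)
num_concepts = 2 ** num_instances            # |C| = 2^|X|
num_hypotheses = prod(v + 1 for v in n) + 1  # |H| = prod(n_i + 1) + 1
print(num_instances, num_concepts, num_hypotheses)  # -> 32 4294967296 244
```

So only 244 of the roughly 4.3 billion possible concepts are representable, which is exactly the gap the conjunctive assumption buys.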
Inductive Bias
C ⊃ H: the concept space is strictly larger than the hypothesis space.
Example: a disjunctive concept such as {ci, ?, ?, ?, ?} ∨ {?, ?, ?, ?, th} is in C but not in H.
Bias Free Learning
Bias-free learning requires H = C.
Simple tumor example: 2 attributes, size (sm/lg) and shape (ov/ci); target label: malignant (+ve) or benign (-ve).
|X| = 4, |C| = 2^|X| = 16
Bias Free Learning is Futile
A learner making no assumptions about the target concept cannot classify any unseen instance.
Inductive bias: the set of assumptions made by a learner in order to generalize from training examples.
Examples of Inductive Bias
- Rote learner: no bias
- Candidate Elimination: stronger bias
- Find-S: strongest bias
References
[1] T. M. Mitchell. Machine Learning. McGraw-Hill, New York, NY, USA, 1st edition, 1997.