Introduction to Machine Learning


1 Introduction to Machine Learning: Concept Learning
Varun Chandola
Computer Science & Engineering, State University of New York at Buffalo, Buffalo, NY, USA
CSE 474/574

2 Outline
Concept Learning
Example: Finding Malignant Tumors
Notation
Concept and Concept Space
Learning a Possible Concept - Hypothesis
Hypothesis Space
Learning Conjunctive Concepts
Find-S Algorithm
Version Spaces
LIST-THEN-ELIMINATE Algorithm
Compressing Version Space
Analyzing Candidate Elimination Algorithm
Inductive Bias

3 Concept Learning
Infer a boolean-valued function c : X → {true, false}
Input: the attributes of an instance x
Output: true if x belongs to the concept, else false
Go from specific examples to a general rule (inductive learning).

4 Finding Malignant Tumors from MRI Scans
Attributes:
1. Shape: circular, oval
2. Size: large, small
3. Color: light, dark
4. Surface: smooth, irregular
5. Thickness: thin, thick
Concept: malignant tumor.

5-7 Malignant vs. Benign Tumor
[Figure: example tumor scans, labeled Malignant, Malignant, Benign, Malignant]

8 Notation
X: the set of all possible instances. How large is X?
Example instance: {circular, small, dark, smooth, thin}
D: the training data set, D = {⟨x, c(x)⟩ : x ∈ X, c(x) ∈ {0, 1}}
Typically, D covers only a small subset of X.

9 What is a Concept?
Semantics: malignant tumor, metastatic tumor, lymphoma, ...
Mathematics: a function c()
What does c() do to the data instances in X?

10 Learning a Concept - Hypothesis
A conjunction over a subset of attributes
A malignant tumor is: circular and dark and thick
{circular,?,dark,?,thick}
The target concept c is unknown; its value over the training examples is known.

11-12 Approximating Target Concept Through Hypothesis
Hypothesis: a potential candidate for the concept
Example: {circular,?,?,?,?}
Hypothesis Space (H): the set of all hypotheses. How large is H?
Special hypotheses:
  Accept everything: {?,?,?,?,?}
  Accept nothing: {∅,∅,∅,∅,∅}
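To make this representation concrete, here is a minimal Python sketch (mine, not from the slides): a hypothesis is a tuple with one slot per attribute, where '?' accepts any value, a literal accepts only itself, and None stands in for ∅.

```python
def matches(h, x):
    """Return True if hypothesis h labels instance x as positive."""
    # '?' accepts anything; None (the ∅ symbol) accepts nothing;
    # any other entry must equal the instance's attribute value.
    return all(a_h == '?' or a_h == a_x for a_h, a_x in zip(h, x))

x = ('circular', 'large', 'light', 'smooth', 'thick')
print(matches(('circular', '?', 'dark', '?', 'thick'), x))  # False: color differs
print(matches(('?', 'large', '?', '?', 'thick'), x))        # True
print(matches((None,) * 5, x))                              # False: accepts nothing
```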

13 A Simple Algorithm (Find-S [1, Ch. 2])
1. Start with h = {∅,∅,∅,∅,∅}
2. Take the next input ⟨x, c(x)⟩
3. If c(x) = 0, go to step 2
4. h ← h ∧ x (pairwise-and)
5. If there are more examples, go to step 2
6. Stop
Pairwise-and rules (per attribute, hypothesis value a_h, instance value a_x):
  a_h ∧ a_x = a_x  if a_h = ∅
  a_h ∧ a_x = a_h  if a_h = a_x
  a_h ∧ a_x = ?    if a_h ≠ a_x
  a_h ∧ a_x = ?    if a_h = ?
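A direct Python sketch of Find-S using the representation above (the function names are my own, not from the slides):

```python
def pairwise_and(h, x):
    """Minimally generalize hypothesis h so that it covers instance x."""
    new_h = []
    for a_h, a_x in zip(h, x):
        if a_h is None:                    # a_h = ∅: adopt the instance's value
            new_h.append(a_x)
        elif a_h == a_x or a_h == '?':     # already covers this value
            new_h.append(a_h)
        else:                              # conflicting values: generalize to ?
            new_h.append('?')
    return tuple(new_h)

def find_s(D, n_attrs=5):
    h = (None,) * n_attrs                  # most specific hypothesis {∅,...,∅}
    for x, label in D:
        if label:                          # negative examples are ignored
            h = pairwise_and(h, x)
    return h
```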

14-16 Simple Example
Target concept: {?,large,?,?,thick}
How many positive examples can there be?
What is the minimum number of examples that must be seen to learn the concept? Two suffice:
1. {circular,large,light,smooth,thick}, malignant
2. {oval,large,dark,irregular,thick}, malignant
What is the maximum?

17-19 Partial Training Data
Target concept: {?,large,?,?,thick}
1. {circular,large,light,smooth,thick}, malignant
2. {circular,large,light,irregular,thick}, malignant
3. {oval,large,dark,smooth,thin}, benign
4. {oval,large,light,irregular,thick}, malignant
5. {circular,small,light,smooth,thick}, benign
Concept learnt: {?,large,light,?,thick}
What mistake can this concept make? (It is over-specific: a dark, large, thick malignant tumor would be classified as benign.)
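Running the find_s() sketch from above on these five examples reproduces the learnt concept (the negatives are simply skipped):

```python
D = [(('circular', 'large', 'light', 'smooth', 'thick'), True),
     (('circular', 'large', 'light', 'irregular', 'thick'), True),
     (('oval', 'large', 'dark', 'smooth', 'thin'), False),
     (('oval', 'large', 'light', 'irregular', 'thick'), True),
     (('circular', 'small', 'light', 'smooth', 'thick'), False)]
print(find_s(D))  # ('?', 'large', 'light', '?', 'thick')
```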

20-21 Recap of Find-S
Objective: find the maximally specific hypothesis
Admit all positive examples and nothing more
The hypothesis never becomes more specific
Questions:
  Does it converge to the target concept?
  Is the most specific hypothesis the best?
  How robust is it to errors in the training data?
  How to choose among potentially many maximally specific hypotheses?

22-23 Version Spaces
1. {circular,large,light,smooth,thick}, malignant
2. {circular,large,light,irregular,thick}, malignant
3. {oval,large,dark,smooth,thin}, benign
4. {oval,large,light,irregular,thick}, malignant
5. {circular,small,light,smooth,thin}, benign
Hypothesis chosen by Find-S: {?,large,light,?,thick}
Are there other hypotheses consistent with the training data?
What is consistency? A hypothesis h is consistent with D if h(x) = c(x) for every ⟨x, c(x)⟩ ∈ D.
Version space: the set of all hypotheses consistent with the training data.

24-25 List Then Eliminate
1. VS ← H
2. For each ⟨x, c(x)⟩ ∈ D: remove from VS every hypothesis h such that h(x) ≠ c(x)
3. Return VS
Issues? How many hypotheses are removed at every instance?
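A sketch of List-Then-Eliminate for the tumor attributes, building on the matches() helper above (names are mine). The explicit enumeration already shows why the algorithm does not scale: |H| grows as a product over the attributes.

```python
from itertools import product

# Per-attribute values for the tumor example.
VALUES = [('circular', 'oval'), ('large', 'small'), ('light', 'dark'),
          ('smooth', 'irregular'), ('thin', 'thick')]

def hypothesis_space():
    """All conjunctive hypotheses: each slot is a value or '?' (3^5 = 243 here).
    The accept-nothing hypothesis is omitted; it never survives a positive example."""
    return list(product(*[v + ('?',) for v in VALUES]))

def list_then_eliminate(D):
    vs = hypothesis_space()
    for x, label in D:
        vs = [h for h in vs if matches(h, x) == label]
    return vs
```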

26 Compressing Version Space
More-general-than relationship:
  h_j ≥_g h_k if, for every instance x, h_k(x) = 1 ⇒ h_j(x) = 1
  h_j >_g h_k if (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j)
In a version space, there are:
1. Maximally general hypotheses
2. Maximally specific hypotheses
These form the boundaries of the version space.
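For conjunctive hypotheses the ≥_g relation can be tested attribute-by-attribute; a small sketch (again using None for ∅, my own helper):

```python
def more_general_or_equal(h1, h2):
    """True if h1 >=_g h2: every instance h2 accepts, h1 accepts as well."""
    if any(a is None for a in h2):   # h2 accepts nothing: anything is >=_g h2
        return True
    return all(a1 == '?' or a1 == a2 for a1, a2 in zip(h1, h2))

print(more_general_or_equal(('?', 'large', '?', '?', 'thick'),
                            ('?', 'large', 'light', '?', 'thick')))  # True
```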

27 Example
1. {circular,large,light,smooth,thick}, malignant
2. {circular,large,light,irregular,thick}, malignant
3. {oval,large,dark,smooth,thin}, benign
4. {oval,large,light,irregular,thick}, malignant
5. {circular,small,light,smooth,thick}, benign
Consistent hypotheses: {?,large,light,?,thick}, {?,large,?,?,thick}, {?,large,light,?,?}

28-29 Example (2)
Specific boundary S: {?,large,light,?,thick}
General boundary G: {?,large,?,?,thick}, {?,large,light,?,?}

30 Boundaries are Enough to Capture Version Space
Version Space Representation Theorem: every hypothesis h in the version space lies between at least one pair of hypotheses g ∈ G and s ∈ S, i.e., g ≥_g h ≥_g s.
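The theorem can be checked empirically for a concrete version space with the earlier helpers (a sketch, assuming more_general_or_equal() from above):

```python
def within_boundaries(h, S, G):
    """Check that g >=_g h >=_g s holds for some s in S and some g in G."""
    return (any(more_general_or_equal(h, s) for s in S) and
            any(more_general_or_equal(g, h) for g in G))
```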

31-33 Candidate Elimination Algorithm
1. Initialize S_0 = {∅,∅,...,∅}, G_0 = {?,?,...,?}
2. For every training example d = ⟨x, c(x)⟩:
If c(x) = +ve:
  1. Remove from G any g for which g(x) ≠ +ve
  2. For every s ∈ S such that s(x) ≠ +ve:
     2.1 Remove s from S
     2.2 For every minimal generalization s′ of s: if s′(x) = +ve and there exists g ∈ G such that g ≥_g s′, add s′ to S
  3. Remove from S all hypotheses that are more general than another hypothesis in S
If c(x) = -ve:
  1. Remove from S any s for which s(x) ≠ -ve
  2. For every g ∈ G such that g(x) ≠ -ve:
     2.1 Remove g from G
     2.2 For every minimal specialization g′ of g: if g′(x) = -ve and there exists s ∈ S such that g′ ≥_g s, add g′ to G
  3. Remove from G all hypotheses that are more specific than another hypothesis in G
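A compact Python sketch of the algorithm for the conjunctive tumor space, reusing matches(), pairwise_and(), more_general_or_equal(), and VALUES from the sketches above (all assumed helpers). In this space the minimal generalization of s covering a positive x is unique (the pairwise-and), and the minimal specializations of g against a negative x replace one '?' with any other value of that attribute.

```python
def min_specializations(g, x):
    """All minimal specializations of g that exclude the negative instance x."""
    specs = []
    for i, (a_g, a_x) in enumerate(zip(g, x)):
        if a_g == '?':
            for v in VALUES[i]:
                if v != a_x:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(D, n_attrs=5):
    S = {(None,) * n_attrs}              # most specific boundary
    G = {('?',) * n_attrs}               # most general boundary
    for x, label in D:
        if label:                        # positive example
            G = {g for g in G if matches(g, x)}
            S = {s if matches(s, x) else pairwise_and(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            # (S stays a singleton in a conjunctive space, so no S-pruning is needed)
        else:                            # negative example
            S = {s for s in S if not matches(s, x)}
            G_new = set()
            for g in G:
                if not matches(g, x):
                    G_new.add(g)
                else:
                    for g2 in min_specializations(g, x):
                        if any(more_general_or_equal(g2, s) for s in S):
                            G_new.add(g2)
            # keep only the maximally general members
            G = {g for g in G_new
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in G_new)}
    return S, G
```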

34-38 Example (Candidate Elimination Trace)
Initialize: S_0 = {∅,∅,∅,∅,∅}, G_0 = {?,?,?,?,?}
⟨{ci,la,li,sh,th}, +ve⟩: S_1 = {ci,la,li,sh,th}
⟨{ci,la,li,ir,th}, +ve⟩: S_2 = {ci,la,li,?,th}
⟨{ov,sm,li,sh,tn}, -ve⟩: the minimal specializations of G_0 are {ci,?,?,?,?}, {?,la,?,?,?}, {?,?,dk,?,?}, {?,?,?,ir,?}, {?,?,?,?,th}; keeping only those more general than S_2 gives G_3 = {ci,?,?,?,?}, {?,la,?,?,?}, {?,?,?,?,th}
⟨{ov,la,li,ir,th}, +ve⟩: S_4 = {?,la,li,?,th}; {ci,?,?,?,?} is dropped from G, leaving G_4 = {?,la,?,?,?}, {?,?,?,?,th}
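Feeding the four examples from this trace to the candidate_elimination() sketch above reproduces the boundaries:

```python
D = [(('circular', 'large', 'light', 'smooth', 'thick'), True),
     (('circular', 'large', 'light', 'irregular', 'thick'), True),
     (('oval', 'small', 'light', 'smooth', 'thin'), False),
     (('oval', 'large', 'light', 'irregular', 'thick'), True)]
S, G = candidate_elimination(D)
print(S)  # {('?', 'large', 'light', '?', 'thick')}
print(G)  # {('?', 'large', '?', '?', '?'), ('?', '?', '?', '?', 'thick')}
```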

39 Understanding Candidate Elimination
The S and G boundaries move towards each other.
Will it converge? Yes, provided that:
1. there are no errors in the training examples,
2. there is sufficient training data, and
3. the target concept is in H.
Why is it better than Find-S?

40 Not Sufficient Training Examples
Use the boundary sets S and G to make predictions on a new instance x.
Case 1: x is consistent with every hypothesis in S; then every hypothesis in the version space labels x positive.
Case 2: x is inconsistent with every hypothesis in G; then every hypothesis in the version space labels x negative.
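A sketch of prediction from the two boundary sets alone, with no need to enumerate the full version space (assumed helper names from the earlier sketches):

```python
def predict(S, G, x):
    """Classify x from the boundary sets; None means the version space is split."""
    if all(matches(s, x) for s in S):
        return True      # every hypothesis in the version space accepts x
    if not any(matches(g, x) for g in G):
        return False     # no hypothesis in the version space accepts x
    return None          # ambiguous: more training data is needed
```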

41-46 Partially Learnt Concepts - Example
Version space, from specific to general:
S: {?,la,li,?,th}
   {?,la,li,?,?}   {?,la,?,?,th}   {?,?,li,?,th}
G: {?,la,?,?,?}   {?,?,?,?,th}
How should new instances be classified?
  {ci,la,li,sh,th}, ?  (satisfies all six hypotheses: positive)
  {ov,sm,li,ir,tn}, ?  (satisfies none: negative)
  {ov,la,dk,ir,th}, ?  (satisfies three of six: ambiguous)
  {ci,la,li,ir,tn}, ?  (satisfies two of six: ambiguous)

47 Using Partial Version Spaces
Predict using the majority vote of the concepts in the version space
Predict using a randomly selected member of the version space
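Both strategies in a short sketch over an explicitly enumerated version space (e.g., the output of the list_then_eliminate() sketch above):

```python
import random

def predict_majority(vs, x):
    """Vote over every hypothesis in the version space."""
    votes = sum(1 for h in vs if matches(h, x))
    return votes > len(vs) / 2

def predict_random(vs, x):
    """Follow a single randomly chosen consistent hypothesis."""
    return matches(random.choice(vs), x)
```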

48-49 How Many Target Concepts Can There Be?
A target concept assigns a label to every example in X, so |C| = 2^|X| possibilities
|X| = ∏_{i=1}^{d} n_i, where n_i is the number of values of attribute i
The conjunctive hypothesis space has |H| = ∏_{i=1}^{d} (n_i + 1) + 1 (each attribute is one of its n_i values or ?, plus the accept-nothing hypothesis)
Why the difference?
Hypothesis assumption: the target concept is conjunctive.
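The counts for the five binary tumor attributes, as a quick sanity check of the formulas (standard counting; the exact constant for |H| depends on how the accept-nothing hypothesis is counted):

```python
from math import prod

n = [2, 2, 2, 2, 2]                             # values per attribute
X_size = prod(n)                                # |X| = 32
num_concepts = 2 ** X_size                      # |C| = 2^32, about 4.3 billion labelings
num_hypotheses = prod(ni + 1 for ni in n) + 1   # 3^5 + 1 = 244 conjunctions
print(X_size, num_concepts, num_hypotheses)
```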

50-52 Inductive Bias
[Figure: the space of all concepts C strictly contains the conjunctive hypothesis space H. A conjunction such as {ci,?,?,?,?} lies inside H; a disjunction such as {ci,?,?,?,?} ∨ {?,?,?,?,th} lies in C but outside H.]

53 Bias Free Learning
Let the hypothesis space be all of C.
Simple tumor example: 2 attributes, size (sm/lg) and shape (ov/ci); target label: malignant (+ve) or benign (-ve)
|X| = 4, so |C| = 2^4 = 16

54 Bias Free Learning is Futile
A learner making no assumptions about the target concept cannot classify any unseen instance.
Inductive bias: the set of assumptions made by a learner in order to generalize from training examples.

55 Examples of Inductive Bias
Rote learner: no bias
Candidate elimination: stronger bias
Find-S: strongest bias

56 References
[1] T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.
