Linear Classifiers: Expressiveness

Size: px

Start display at page:

Download "Linear Classifiers: Expressiveness"

Raymond Magnus Wells
5 years ago
Views:

1 Linear Classifiers: Expressiveness Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1

2 Lecture outline Linear classifiers: Introduction What functions do linear classifiers express? Least Squares Method for Regression 2

3 Where are we? Linear classifiers: Introduction What functions do linear classifiers express? Conjunctions and disjunctions mofn functions Not all functions are linearly separable Feature space transformations Exercises Least Squares Method for Regression 3

4 Which Boolean functions can linear classifiers represent? Linear classifiers are an expressive hypothesis class Many Boolean functions are linearly separable Not all though Recall: Decision trees can represent any Boolean function 4

5 Conjunctions and disjunctions y = x 1 x 2 x 3 x 1 x 2 x 3 y

6 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y

7 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y x 1 x 2 x 3 3 sign

8 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y x 1 x 2 x 3 3 sign How about negations? y = x 1 x 2 x 3 8

9 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y x 1 x 2 x 3 3 sign Negations are okay too. In general, use 1x in the linear threshold unit if x is negated y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 2 9

10 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y x 1 x 2 x 3 3 sign Negations are okay too. In general, use 1x in the linear threshold unit if x is negated y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 2 Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions? 10

11 Conjunctions and disjunctions y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 3 x 1 x 2 x 3 y x 1 x 2 x 3 3 sign Questions? Negations are okay too. In general, use 1x in the linear threshold unit if x is negated y = x 1 x 2 x 3 is equivalent to y = 1 if x 1 x 2 x 3 2 Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions? 11

12 mofn functions mofn rules There is a fixed set of n variables y = true if, and only if, at least m of them are true All other variables are ignored Suppose there are five Boolean variables: x 1, x 2, x 3, x 4, x 5 What is a linear threshold unit that is equivalent to the classification rule at least 2 of {x 1, x 2, x 3 }? 12

13 mofn functions mofn rules There is a fixed set of n variables y = true if, and only if, at least m of them are true All other variables are ignored Suppose there are five Boolean variables: x 1, x 2, x 3, x 4, x 5 What is a linear threshold unit that is equivalent to the classification rule at least 2 of {x 1, x 2, x 3 }? x 1 x 2 x

14 mofn functions mofn rules There is a fixed set of n variables y = true if, and only if, at least m of them are true All other variables are ignored Suppose there are five Boolean variables: x 1, x 2, x 3, x 4, x 5 What is a linear threshold unit that is equivalent to the classification rule at least 2 of {x 1, x 2, x 3 }? x 1 x 2 x 3 2 Questions? 14

15 Not all functions are linearly separable Parity is not linearly separable Can t draw a line to separate the two classes x 1 x 2 Questions? 15

16 Not all functions are linearly separable XOR is not linear y = (x 1 x 2 ) ( x 1 x 2 ) Parity cannot be represented as a linear classifier f(x) = 1 if the number of 1 s is even Many nontrivial Boolean functions y = (x 1 x 2 ) (x 3 x 4 ) The function is not linear in the four variables 16

17 Even these functions can be made linear These points are not separable in 1dimension by a line What is a onedimensional line, by the way? The trick: Change the representation 17

18 The blown up feature space The trick: Use feature conjunctions Transform points: Represent each point x in 2 dimensions by (x, x 2 ) 18

19 The blown up feature space The trick: Use feature conjunctions Transform points: Represent each point x in 2 dimensions by (x, x 2 ) 19

20 The blown up feature space The trick: Use feature conjunctions Transform points: Represent each point x in 2 dimensions by (x, x 2 ) Now the data is linearly separable in this space! 20

21 The blown up feature space The trick: Use feature conjunctions Transform points: Represent each point x in 2 dimensions by (x, x 2 ) (2, 4) 2 Now the data is linearly separable in this space! 21

22 Exercise How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space? To answer this question, you need to think about a function that maps examples from two dimensional space to a higher dimensional space. 22

23 Almost linearly separable data sgn(b w 1 x 1 w 2 x 2 ) Training data is almost separable, except for some noise How much noise do we allow for? x 1 x 2 23

24 Almost linearly separable data b w 1 x 1 w 2 x 2 =0 sgn(b w 1 x 1 w 2 x 2 ) Training data is almost separable, except for some noise How much noise do we allow for? x 1 x 2 24

25 Linear classifiers: An expressive hypothesis class Many functions are linear Often a good guess for a hypothesis space Some functions are not linear The XOR function Nontrivial Boolean functions But there are ways of making them linear in a higher dimensional feature space 25

26 Why is the bias term needed? b w 1 x 1 w 2 x 2 =0 x 1 x 2 26

27 Why is the bias term needed? If b is zero, then we are restricting the learner only to hyperplanes that go through the origin May not be expressive enough x 1 x 2 27

28 Why is the bias term needed? w 1 x 1 w 2 x 2 =0 If b is zero, then we are restricting the learner only to hyperplanes that go through the origin May not be expressive enough x 1 x 2 28

The Perceptron Algorithm

The Perceptron Algorithm Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Outline The Perceptron Algorithm Perceptron Mistake Bound Variants of Perceptron 2 Where are we? The Perceptron