SVM. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE

Size: px

Start display at page:

Download "SVM. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE"

Deborah Collins
5 years ago
Views:

1 SVM Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 1 of 11

2 Find the maximum margin hyperplane Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 2 of 11

3 Find the maximum margin hyperplane 3 2 Which are the support vectors? Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 2 of 11

4 Walkthrough example: building an SVM over the data shown Working geometrically: Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 3 of 11

5 Walkthrough example: building an SVM over the data shown Working geometrically: If you got 0 =.5x + y 2.75, close! Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 3 of 11

6 Walkthrough example: building an SVM over the data shown Working geometrically: If you got 0 =.5x + y 2.75, close! Set up system of equations 3 w 1 + w 2 + b = 1 (1) 2 w 1 + 2w 2 + b = 0 (2) 2w 1 + 3w 2 + b = +1 (3) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 3 of 11

7 Walkthrough example: building an SVM over the data shown Working geometrically: If you got 0 =.5x + y 2.75, close! Set up system of equations 3 w 1 + w 2 + b = 1 (1) 2 w 1 + 2w 2 + b = 0 (2) 2w 1 + 3w 2 + b = +1 (3) The SVM decision boundary is: = 2 5 x y 11 5 Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 3 of 11

8 Cannonical Form w 1 x 1 + w 2 x 2 + b Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 4 of 11

9 Cannonical Form x 1 +.8x Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 4 of 11

10 Cannonical Form x 1 +.8x = = = Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 4 of 11

11 What s the margin? Distance to closest point Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 5 of 11

12 What s the margin? Distance to closest point (2 1) 2 = 5 2 (4) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 5 of 11

13 What s the margin? Distance to closest point (2 1) 2 = 5 2 (4) Weight vector Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 5 of 11

14 What s the margin? Distance to closest point (2 1) 2 = 5 2 (4) Weight vector 1 w = = 1 = 5 5 = (5) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 5 of 11

15 Slack Example Decision function: 1 4 w = 1 ;b = D C B A H E G I F Decision Boundary Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 6 of 11

16 Slack Example Decision function: 1 4 w = 1 ;b = D What are the support vectors? C B A H E G I F Decision Boundary Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 6 of 11

17 Slack Example Decision function: 1 4 w = 1 ;b = D What are the support vectors? Which have non-zero slack? C B A Decision Boundary H E G I F Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 6 of 11

18 Slack Example Decision function: 1 4 w = 1 ;b = D What are the support vectors? Which have non-zero slack? C B A Decision Boundary H E G I F Compute ξ B,ξ E Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 6 of 11

19 Computing slack y i ( w i x i + b) 1 ξ i (6) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 7 of 11

20 Computing slack y i ( w i x i + b) 1 ξ i (6) Point B y B ( w B x B + b) = (7) 1( ) = 1.25 (8) Thus, ξ B = 2.25 Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 7 of 11

21 Computing slack y i ( w i x i + b) 1 ξ i (6) Point B y B ( w B x B + b) = (7) 1( ) = 1.25 (8) Thus, ξ B = 2.25 Point E y E ( w E x E + b) = (9) 1( ) = 1 (10) Thus, ξ E = 2 Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 7 of 11

22 Slack Example Decision function: 0 w = ;b = 5 2 D E C H F B A Decision Boundary G I Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 8 of 11

23 Slack Example Decision function: 0 w = ;b = 5 2 D What are the support vectors? C H E F B G A Decision Boundary I Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 8 of 11

24 Slack Example Decision function: 0 w = ;b = 5 2 D What are the support vectors? C H E F Which have non-zero slack? B A Decision Boundary G I Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 8 of 11

25 Slack Example Decision function: 0 w = ;b = 5 2 D What are the support vectors? C H E F Which have non-zero slack? B A Decision Boundary G I Compute ξ A,ξ C Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 8 of 11

26 Computing slack y i ( w i x i + b) 1 ξ i (11) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 9 of 11

27 Computing slack y i ( w i x i + b) 1 ξ i (11) Point A y A ( w A x A + b) = (12) 1( ) = 5 (13) Thus, ξ A = 6 Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 9 of 11

28 Computing slack y i ( w i x i + b) 1 ξ i (11) Point A y A ( w A x A + b) = (12) 1( ) = 5 (13) Thus, ξ A = 6 Point C y C ( w C x C + b) = (14) 1( ) = 1 (15) Thus, ξ C = 2 Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 9 of 11

29 Which one is better? D D C B A H E F G I C E H F B G A Decision Boundary Decision Boundary I Which decision boundary (wide / narrow) has the better objective? Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 10 of 11

30 Which one is better? D D C B A H E F G I C E H F B G Decision Boundary A Decision Boundary Which decision boundary (wide / narrow) has the better objective? 1 min w 2 w 2 + C ξ i (16) i I Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 10 of 11

31 Which one is better? D C B A H E F G I 1 2 w 2 = = (16) ξ i = 4.25 (17) i C B A Decision Boundary Which decision boundary (wide / narrow) has the better objective? 1 min w 2 w 2 + C ξ i (18) i H E G I F Decision Boundary D Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 10 of 11

32 Which one is better? D D C B A H E F G I C E H F B G Decision Boundary A Decision Boundary I 1 2 w 2 = (16) ξ i = 4.25 (17) i 1 2 w 2 = = 2 2 (18) ξ i = 8 (19) Which decision boundary (wide / narrow) has the better objective? i i 1 min w 2 w 2 + C ξ i (20) Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 10 of 11

33 Which one is better? D D C B A H E F G I C E H F B G Decision Boundary A Decision Boundary I 1 2 w 2 = (16) ξ i = 4.25 (17) i 1 2 w 2 = 2 (18) ξ i = 8 (19) Which decision boundary (wide / narrow) has the better objective? 1 min w 2 w 2 + C ξ i (20) i i In this case it doesn t matter. Common C values: 1.0, 1 m Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 10 of 11

34 Importance of C Need to do cross-validation to select C Don t trust default values Look at values with high ξ; are they bad data? Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 11 of 11

35 Importance of C Need to do cross-validation to select C Don t trust default values Look at values with high ξ; are they bad data? Other courses: how to find w Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 11 of 11

36 QFR I have always found that mercy bears richer fruits than strict justice. Tenure Research Service Grad Teaching/Advising Undergrad Teaching QFR are like my grades Numbers are insanely important Introduction to Data Science Algorithms Boyd-Graber and Paul SVM 12 of 11

Slack SVMs. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 8. Jordan Boyd-Graber Boulder Slack SVMs 1 of 9

Slack SVMs. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 8. Jordan Boyd-Graber Boulder Slack SVMs 1 of 9 Slack SVMs Jordan oyd-raber University of olorado oulder LTUR 8 Jordan oyd-raber oulder Slack SVMs 1 of 9 ontent Question Jordan oyd-raber oulder Slack SVMs 2 of 9 ontent Question Jordan oyd-raber oulder