Announcements - Homework

1 Announcements - Homework
Homework 1 is graded; please collect it at the end of lecture.
Homework 2 is due today.
Homework 3 will be out soon (watch ); Question 1 is a midterm review.

2 HW1 score distribution
[Figure: histogram of HW1 total score in 10-point bins]

3 Announcements - Midterm
When: Wednesday, 10/20
Where: In class
What: You, your pencil, your textbook, your notes, course slides, your calculator, your good mood :)
What NOT: No computers, iPhones, or anything else that has an internet connection.
Material: Everything from the beginning of the semester up to and including SVMs and the kernel trick

4 Recitation Tomorrow!
Boosting, SVM (convex optimization), midterm review! Strongly recommended!!
Place: NSH 3305 (note: change from last time)
Time: 5-6 pm
Rob

5 Support Vector Machines Aarti Singh Machine Learning / Oct 13, 2010

6 At Pittsburgh G-20 summit

7 Linear classifiers: which line is better?

8 Pick the one with the largest margin!

9 Parameterizing the decision boundary
$w \cdot x = \sum_j w^{(j)} x^{(j)}$
$w \cdot x + b > 0$ on one side of the boundary, $w \cdot x + b < 0$ on the other
Data: examples $i = 1, 2, \ldots, n$
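The parameterization above can be sketched in a few lines of Python. This is only an illustration: the weight vector, offset, and test points are made-up values, and sending points exactly on the boundary to -1 is an arbitrary choice.

```python
def decision_value(w, b, x):
    """Compute w . x + b, where w . x = sum_j w[j] * x[j]."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 where w . x + b > 0, otherwise -1 (boundary points go to -1 by assumption)."""
    return 1 if decision_value(w, b, x) > 0 else -1

# Hypothetical parameters and points, chosen only for illustration.
w, b = (1.0, -1.0), 0.5
labels = [classify(w, b, x) for x in [(2.0, 0.0), (0.0, 2.0)]]
```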

10 Parameterizing the decision boundary
$w \cdot x + b > 0$ on one side of the boundary, $w \cdot x + b < 0$ on the other

11 Maximizing the margin
$w \cdot x + b > 0$ on one side of the boundary, $w \cdot x + b < 0$ on the other
Distance of the closest examples from the line/hyperplane:
margin $= \gamma = 2a / \|w\|$

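12 Maximizing the margin
$w \cdot x + b > 0$ on one side of the boundary, $w \cdot x + b < 0$ on the other
Distance of the closest examples from the line/hyperplane:
margin $= \gamma = 2a / \|w\|$
$\max_{w,b} \; \gamma = 2a / \|w\|$ s.t. $(w \cdot x_j + b)\, y_j \ge a \;\; \forall j$
Note: $a$ is arbitrary (can normalize the equations by $a$)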
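A small sketch of the margin formula, assuming a toy 2-D data set and a hand-picked separating hyperplane (both hypothetical). It takes $a$ to be the smallest value of $(w \cdot x_j + b)\, y_j$ and shows that rescaling $(w, b)$ leaves $\gamma = 2a / \|w\|$ unchanged, which is why $a$ can be normalized away.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def geometric_margin(w, b, X, y):
    """gamma = 2a / ||w||, with a = min_j (w . x_j + b) * y_j."""
    a = min(yj * (dot(w, xj) + b) for xj, yj in zip(X, y))
    return 2.0 * a / math.sqrt(dot(w, w))

# Hypothetical toy data: two positive and two negative examples.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]
w, b = (1.0, 1.0), -2.0

gamma = geometric_margin(w, b, X, y)

# Scaling (w, b) by 10 scales a and ||w|| by 10 as well, so gamma is the same.
w_scaled, b_scaled = tuple(10.0 * wi for wi in w), 10.0 * b
gamma_scaled = geometric_margin(w_scaled, b_scaled, X, y)
```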

13 Support Vector Machines
$\min_{w,b} \; w \cdot w$ s.t. $(w \cdot x_j + b)\, y_j \ge 1 \;\; \forall j$
Solve efficiently by quadratic programming (QP); well-studied solution algorithms
Linear hyperplane defined by support vectors

14 Support Vectors
Linear hyperplane defined by support vectors
Moving other points a little doesn't affect the decision boundary
Only need to store the support vectors to predict labels of new points
How many support vectors in the linearly separable case? m+1
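As a rough illustration, support vectors can be read off a solved problem as the examples whose constraint $(w \cdot x_j + b)\, y_j \ge 1$ holds with equality. The sketch below assumes the normalized solution $(w, b)$ is already known (for instance from a QP solver); the data, the solution values, and the tolerance are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def support_vectors(w, b, X, y, tol=1e-9):
    """Return the examples with (w . x_j + b) * y_j equal to 1 up to tol."""
    return [xj for xj, yj in zip(X, y)
            if abs(yj * (dot(w, xj) + b) - 1.0) <= tol]

# Hypothetical toy data and its max-margin solution, normalized so the
# closest examples satisfy (w . x + b) * y = 1.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]
w, b = (0.5, 0.5), -1.0

svs = support_vectors(w, b, X, y)
```

Here (3, 3) and (-1, 0) lie strictly outside the margin, so nudging them slightly leaves the list, and the boundary, unchanged.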

15 What if data is not linearly separable?
Use features of features of features of features: $x_1^2, x_2^2, x_1 x_2, \ldots, \exp(x_1)$
But run risk of overfitting!
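A minimal sketch of the feature idea, using an XOR-style data set (hypothetical) that no line separates in $(x_1, x_2)$ but that becomes linearly separable once the product feature $x_1 x_2$ is added. The feature map and weights are illustrative choices.

```python
def phi(x):
    """Map (x1, x2) to the quadratic features (x1^2, x2^2, x1 * x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, x1 * x2)

def classify(w, b, z):
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

# XOR-like labels: positive when both coordinates share a sign.
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
y = [+1, +1, -1, -1]

# In feature space a linear rule on the x1 * x2 coordinate is enough.
w, b = (0.0, 0.0, 1.0), 0.0
preds = [classify(w, b, phi(x)) for x in X]
```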

16 What if data is still not linearly separable?
Allow error in classification:
$\min_{w,b} \; w \cdot w + C \cdot \#\text{mistakes}$ s.t. $(w \cdot x_j + b)\, y_j \ge 1 \;\; \forall j$
Maximize margin and minimize # mistakes on training data
$C$ - tradeoff parameter
Not QP
0/1 loss (doesn't distinguish between a near miss and a bad mistake)

17 What if data is still not linearly separable?
Allow error in classification (soft margin approach):
$\min_{w,b} \; w \cdot w + C \sum_j \xi_j$ s.t. $(w \cdot x_j + b)\, y_j \ge 1 - \xi_j \;\; \forall j$, $\xi_j \ge 0 \;\; \forall j$
$\xi_j$ - slack variables ($> 1$ if $x_j$ is misclassified); pay a linear penalty for a mistake
$C$ - tradeoff parameter (chosen by cross-validation)
Still QP
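To make the slack variables concrete, the sketch below computes, for a fixed $(w, b)$, the smallest $\xi_j \ge 0$ satisfying $(w \cdot x_j + b)\, y_j \ge 1 - \xi_j$, and then the soft-margin objective. The data, the hyperplane, and $C = 1$ are hypothetical values chosen for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def slacks(w, b, X, y):
    """Smallest feasible slack per example: max(0, 1 - (w . x_j + b) * y_j)."""
    return [max(0.0, 1.0 - yj * (dot(w, xj) + b)) for xj, yj in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    """w . w + C * sum_j xi_j."""
    return dot(w, w) + C * sum(slacks(w, b, X, y))

# Hypothetical data: the third point is inside the margin but correctly
# classified, the fourth is on the wrong side of the boundary.
X = [(2.0, 2.0), (0.0, 0.0), (1.5, 1.5), (3.0, 3.0)]
y = [+1, -1, +1, -1]
w, b = (0.5, 0.5), -1.0

xi = slacks(w, b, X, y)
objective = soft_margin_objective(w, b, X, y, C=1.0)
```

The misclassified point gets a slack above 1, while the point inside the margin gets a slack between 0 and 1.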

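18 Slack variables - Hinge loss
$\min_{w,b} \; w \cdot w + C \sum_j \xi_j$ s.t. $(w \cdot x_j + b)\, y_j \ge 1 - \xi_j \;\; \forall j$, $\xi_j \ge 0 \;\; \forall j$
Complexity penalization: $w \cdot w$; hinge loss: $C \sum_j \xi_j$
[Figure: hinge loss vs. 0-1 loss]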

19 SVM vs. Logistic Regression
SVM: hinge loss
Logistic regression: log loss (negative log conditional likelihood)
[Figure: log loss, hinge loss, and 0-1 loss]
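The three losses can be compared as functions of the signed score $m = (w \cdot x + b)\, y$. The sketch below uses the natural logarithm for the log loss and counts $m = 0$ as a mistake for the 0-1 loss; both are conventions assumed here, not taken from the slide.

```python
import math

def zero_one_loss(m):
    """1 for a mistake (m <= 0 by the convention assumed here), else 0."""
    return 1.0 if m <= 0 else 0.0

def hinge_loss(m):
    """max(0, 1 - m): zero only once the example clears the margin."""
    return max(0.0, 1.0 - m)

def log_loss(m):
    """log(1 + exp(-m)): the negative log conditional likelihood of logistic regression."""
    return math.log(1.0 + math.exp(-m))

# Evaluate the losses on a small grid of scores from -2 to 2.
scores = [i / 2.0 for i in range(-4, 5)]
table = [(m, zero_one_loss(m), hinge_loss(m), log_loss(m)) for m in scores]
```

Unlike the 0-1 loss, the hinge and log losses grow with the size of the violation, so a bad mistake costs more than a near miss.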

20 What about multiple classes?

21 One against all
Learn 3 classifiers separately: class $k$ vs. rest, $(w_k, b_k)$, $k = 1, 2, 3$
$y = \arg\max_k \; w_k \cdot x + b_k$
But the $w_k$'s may not be based on the same scale.
Note: $(a w) \cdot x + (a b)$ is also a solution
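A short sketch of the one-against-all rule and of the scale issue. The three classifiers and the test point are hypothetical, and classes are indexed 0, 1, 2 in the code rather than 1, 2, 3. Multiplying one classifier by a positive constant keeps its own decision boundary but can change the arg max.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_one_vs_all(classifiers, x):
    """Return the index k that maximizes w_k . x + b_k."""
    scores = [score(w, b, x) for w, b in classifiers]
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical (w_k, b_k) pairs, one per class.
classifiers = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((-1.0, -1.0), 0.0)]
x = (2.0, 1.0)
before = predict_one_vs_all(classifiers, x)

# Rescale classifier 1 by a = 5: (a w) . x + (a b) describes the same boundary.
a = 5.0
w1, b1 = classifiers[1]
rescaled = list(classifiers)
rescaled[1] = (tuple(a * wi for wi in w1), a * b1)
after = predict_one_vs_all(rescaled, x)
```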

22 Learn 1 classifier: Multi-class SVM
Simultaneously learn 3 sets of weights
Margin - gap between correct class and nearest other class
$y = \arg\max_k \; w^{(k)} \cdot x + b^{(k)}$

23 Learn 1 classifier: Multi-class SVM
Simultaneously learn 3 sets of weights
$y = \arg\max_k \; w^{(k)} \cdot x + b^{(k)}$
Joint optimization: the $w^{(k)}$'s have the same scale.
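The slide's own joint optimization problem did not survive extraction. One common way to write such a problem, given here as an assumption rather than as the lecture's exact formulation, requires the correct class to beat every other class by a margin of 1, with slack:

```latex
\[
\begin{aligned}
\min_{w,b} \quad & \sum_{k} w^{(k)} \cdot w^{(k)} + C \sum_j \xi_j \\
\text{s.t.} \quad & w^{(y_j)} \cdot x_j + b^{(y_j)} \;\ge\; w^{(y')} \cdot x_j + b^{(y')} + 1 - \xi_j
  \quad \forall y' \ne y_j,\ \forall j \\
& \xi_j \ge 0 \quad \forall j
\end{aligned}
\]
```

Because all weight vectors share one objective and one set of constraints, their scores are comparable, which is what the arg max rule needs.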

24 What you need to know
Maximizing margin
Derivation of SVM formulation
Slack variables and hinge loss
Relationship between SVMs and logistic regression: 0/1 loss, hinge loss, log loss
Tackling multiple classes: one against all, multiclass SVMs

25 SVMs reminder
$\min_{w,b} \; w \cdot w + C \sum_j \xi_j$ s.t. $(w \cdot x_j + b)\, y_j \ge 1 - \xi_j$, $\xi_j \ge 0 \;\; \forall j$
Regularization: $w \cdot w$; hinge loss: $C \sum_j \xi_j$
Soft margin approach
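Substituting the smallest feasible slack turns the problem into the unconstrained objective $w \cdot w + C \sum_j \max(0, 1 - (w \cdot x_j + b)\, y_j)$. The sketch below minimizes it with plain subgradient descent, which is a swapped-in technique for illustration and not the QP approach named in the lecture. The data, step size, and iteration count are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on w . w + C * sum_j max(0, 1 - y_j (w . x_j + b))."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * wi for wi in w]  # gradient of the regularizer w . w
        grad_b = 0.0
        for xj, yj in zip(X, y):
            if yj * (dot(w, xj) + b) < 1.0:  # hinge term is active
                for i in range(d):
                    grad_w[i] -= C * yj * xj[i]
                grad_b -= C * yj
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1

# Hypothetical linearly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]
w_fit, b_fit = train_soft_margin(X, y)
preds = [classify(w_fit, b_fit, x) for x in X]
```

With a constant step size the iterates only hover near the minimizer, so this is a way to see the regularization and hinge terms interact, not a replacement for a QP solver.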

26 Today's Lecture
Learn one of the most interesting and exciting recent advancements in machine learning: the kernel trick
High-dimensional feature spaces at no extra cost!
But first, a detour: constrained optimization!

27 Constrained Optimization

28 Lagrange Multiplier - Dual Variables
Moving the constraint to the objective function
Lagrangian:
Solve:
Constraint is tight when $a > 0$

29 Duality
Primal problem:
Dual problem:
Weak duality: for all feasible points
Strong duality (holds under KKT conditions)
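Weak and strong duality can be checked numerically on a toy problem. The sketch below assumes the hypothetical problem $\min_x x^2$ s.t. $x \ge b$ with Lagrangian $x^2 - a(x - b)$, $a \ge 0$, whose dual function is $ab - a^2/4$ (the minimum over $x$ is attained at $x = a/2$). The grids and the value $b = 1$ are illustrative.

```python
def primal_objective(x):
    return x * x

def dual_function(a, b):
    """min over x of x^2 - a (x - b), attained at x = a / 2."""
    return a * b - a * a / 4.0

b = 1.0
feasible_xs = [b + i / 10.0 for i in range(50)]  # points with x >= b
multipliers = [i / 10.0 for i in range(50)]      # dual variables a >= 0

# Weak duality: every dual value lower-bounds every feasible primal value.
weak_duality_holds = all(
    dual_function(a, b) <= primal_objective(x) + 1e-12
    for a in multipliers for x in feasible_xs
)

# Strong duality on this convex problem: the best values coincide.
p_star = min(primal_objective(x) for x in feasible_xs)
d_star = max(dual_function(a, b) for a in multipliers)
```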

30 Lagrange Multiplier - Dual Variables
Cases: $b$ negative, $b$ positive
Solving:
When $a > 0$, the constraint is tight
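The equations on this slide were lost in extraction. A standard one-dimensional example that matches the "b negative / b positive" cases is worked out below; the choice of problem is an assumption, with $a$ denoting the Lagrange multiplier as on the slide.

```latex
\[
\begin{aligned}
&\text{Problem:} && \min_x \; x^2 \quad \text{s.t.} \quad x \ge b \\
&\text{Lagrangian:} && L(x, a) = x^2 - a\,(x - b), \qquad a \ge 0 \\
&\text{Solve:} && \min_x \max_{a \ge 0} L(x, a) \\
&\text{Stationarity in } x\text{:} && \frac{\partial L}{\partial x} = 2x - a = 0 \;\Rightarrow\; x = a/2 \\
&\text{Maximize over } a\text{:} && \max_{a \ge 0} \Big( a b - \tfrac{a^2}{4} \Big) \;\Rightarrow\; a^* = \max(2b, 0) \\
&b < 0\text{:} && a^* = 0, \quad x^* = 0 \quad \text{(constraint inactive)} \\
&b > 0\text{:} && a^* = 2b > 0, \quad x^* = b \quad \text{(constraint tight)}
\end{aligned}
\]
```

Solving the inner minimization first is valid here because the problem is convex, so the order of min and max can be exchanged.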
