10701 Recitation 5 Duality and SVM. Ahmed Hefny


1 10701 Recitation 5 Duality and SVM Ahmed Hefny

2 Outline
Lagrangian and Duality
- The Lagrangian
- Duality
- Examples
Support Vector Machines
- Primal Formulation
- Dual Formulation
- Soft Margin and Hinge Loss

3 Lagrangian
Consider the problem
$\min_x f(x)$ s.t. $g_i(x) = 0$
Add a Lagrange multiplier for each constraint:
$L(x, u) = f(x) + \sum_i u_i g_i(x)$

4 Lagrangian
Lagrangian: $L(x, u) = f(x) + \sum_i u_i g_i(x)$
Setting the gradient to 0 gives
- $g_i(x) = 0$ [Feasible point]
- $\nabla f(x) + \sum_i u_i \nabla g_i(x) = 0$ [Cannot decrease $f$ except by violating constraints]

5 Lagrangian
Consider the problem
$\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
Add a Lagrange multiplier for each constraint:
$L(x, u, \lambda) = f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$

6 Duality

7 Duality
Primal problem:
$\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
Equivalent to
$\min_x \max_{\lambda \ge 0, u} f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$

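8 Duality
Primal problem:
$\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
Equivalent to
$\min_x \begin{cases} f(x) & \text{if } x \text{ is feasible} \\ \infty & \text{otherwise} \end{cases}$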

9 Duality
Dual problem:
$\max_{\lambda \ge 0, u} \min_x f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$
Lagrangian dual function: $L(\lambda, u) = \min_x L(x, u, \lambda)$
- Concave, regardless of the convexity of the primal
- Lower bound on the primal

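10 Duality
Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$
[Figure: table of $L(x, \lambda)$ values, rows indexed by $x$, columns indexed by $\lambda$]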

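11 Duality
Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$
For each row (choice of $x$), pick the largest element, then select the minimum.
[Figure: table of $L(x, \lambda)$ values, rows indexed by $x$, columns indexed by $\lambda$]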

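12 Duality
Dual problem: $\max_{\lambda \ge 0} \min_x L(x, \lambda)$
For each column (choice of $\lambda$), pick the smallest element, then select the maximum.
[Figure: table of $L(x, \lambda)$ values, rows indexed by $x$, columns indexed by $\lambda$]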

13 Duality
Claim: $\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$

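14 Duality
Claim: $\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$
Let $(x^*, \lambda^*)$ attain the primal optimum $\min_x \max_{\lambda \ge 0} L(x, \lambda)$. For any $\lambda \ge 0$:
$\min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$
The difference between the primal minimum and the dual maximum is called the duality gap.
Duality gap = 0: strong duality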
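
The row/column picture of the primal and dual can be checked on a small table. The sketch below uses made-up values for $L(x, \lambda)$ (the numbers are illustrative assumptions, not taken from the slides); rows stand for choices of $x$ and columns for choices of $\lambda$.

```python
# Toy table of L(x, lambda) values (illustrative numbers only).
# Rows index choices of x, columns index choices of lambda.
L = [
    [3, 1, 4],
    [2, 5, 0],
    [6, 2, 3],
]

# Primal: for each row take the largest entry, then take the smallest of those.
primal = min(max(row) for row in L)

# Dual: for each column take the smallest entry, then take the largest of those.
dual = max(min(col) for col in zip(*L))

# Weak duality: the primal value is never below the dual value.
duality_gap = primal - dual
print(primal, dual, duality_gap)
```

On this table the gap is strictly positive, so strong duality does not hold for it; a table with a saddle point would give a gap of zero.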

15 Duality
When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?

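16 Duality
When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
When $(x^*, \lambda^*)$ is a saddle point:
$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$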

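17 Duality
When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
When $(x^*, \lambda^*)$ is a saddle point:
$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$
Necessity: by definition of the dual.
Sufficiency: $L(\lambda^*) = \min_x L(x, \lambda^*) \ge L(x^*, \lambda^*) \Rightarrow L(\lambda^*) = L(x^*, \lambda^*)$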

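18 Duality
When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
When $(x^*, \lambda^*)$ is a saddle point:
$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$
Necessity: by definition of the dual.
Sufficiency: $L(\lambda^*) = \min_x L(x, \lambda^*) \ge L(x^*, \lambda^*) \Rightarrow L(\lambda^*) = L(x^*, \lambda^*)$
The dual at $\lambda^*$ attains its upper bound $L(x^*, \lambda^*)$.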

19 Duality
If strong duality holds, the KKT conditions apply to the optimal point:
- Stationary point: $\nabla_x L(x, u, \lambda) = 0$
- Primal feasibility
- Dual feasibility ($\lambda \ge 0$)
- Complementary slackness ($\lambda_i h_i(x) = 0$)
KKT conditions are
- Sufficient (for convex problems)
- Necessary under strong duality
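
As a minimal sketch of what checking these conditions looks like, consider the toy problem $\min_x x^2$ s.t. $1 - x \le 0$ (an assumed example, not from the slides). Its Lagrangian is $x^2 + \lambda (1 - x)$, and the helper name `kkt_satisfied` is hypothetical.

```python
# Toy problem (assumed for illustration): minimize x^2 subject to h(x) = 1 - x <= 0.

def f_grad(x):
    return 2.0 * x          # gradient of the objective x^2

def h(x):
    return 1.0 - x          # inequality constraint, feasible when h(x) <= 0

def h_grad(x):
    return -1.0             # gradient of the constraint

def kkt_satisfied(x, lam, tol=1e-9):
    """Return True if (x, lam) meets all four KKT conditions up to tol."""
    stationarity = abs(f_grad(x) + lam * h_grad(x)) <= tol
    primal_feasible = h(x) <= tol
    dual_feasible = lam >= -tol
    complementary_slackness = abs(lam * h(x)) <= tol
    return (stationarity and primal_feasible
            and dual_feasible and complementary_slackness)

print(kkt_satisfied(1.0, 2.0))  # constrained optimum with its multiplier
print(kkt_satisfied(0.0, 0.0))  # unconstrained minimizer, but infeasible
```

The objective is convex here, so the point that passes the check is the constrained minimizer.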

20 Example: LP
Primal: $\min_x c^T x$ s.t. $Ax \ge b$

21 Example: LP
Primal: $\min_x c^T x$ s.t. $Ax \ge b$
Lagrangian: $L(x, \lambda) = c^T x - \lambda^T (Ax - b)$

22 Example: LP
Dual function: $L(\lambda) = \min_x c^T x - \lambda^T (Ax - b)$

23 Example: LP
Dual function: $L(\lambda) = \min_x c^T x - \lambda^T (Ax - b)$
Set the gradient w.r.t. $x$ to 0: $c - A^T \lambda = 0$

24 Example: LP
Dual function: $L(\lambda) = \min_x c^T x - \lambda^T (Ax - b)$
Set the gradient w.r.t. $x$ to 0: $c - A^T \lambda = 0$
Dual problem: $\max_{\lambda \ge 0} \lambda^T b$ s.t. $c - A^T \lambda = 0$
Why keep this as a constraint?
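
The primal/dual LP pair can be sanity-checked on a tiny instance without a solver. The sketch below assumes the primal is $\min_x c^T x$ s.t. $Ax \ge b$ and the dual is $\max_{\lambda \ge 0} \lambda^T b$ s.t. $A^T \lambda = c$; the data `A`, `b`, `c` and the candidate points are made up for illustration.

```python
# Tiny LP instance (illustrative numbers only).
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 3]
c = [1, 2]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x, tol=1e-9):
    # Every row must satisfy a_i^T x >= b_i.
    return all(dot(row, x) >= bi - tol for row, bi in zip(A, b))

def dual_feasible(lam, tol=1e-9):
    # lambda >= 0 and A^T lambda = c (columns of A dotted with lambda).
    At_lam = [dot(col, lam) for col in zip(*A)]
    nonneg = all(l >= -tol for l in lam)
    return nonneg and all(abs(v - ci) <= tol for v, ci in zip(At_lam, c))

# Any feasible pair obeys weak duality: c^T x >= b^T lambda.
x_any, lam_any = [1, 2], [1, 2, 0]
# This pair has matching objectives, so both points are optimal.
x_opt, lam_opt = [2, 1], [0, 1, 1]

for x, lam in [(x_any, lam_any), (x_opt, lam_opt)]:
    assert primal_feasible(x) and dual_feasible(lam)
    print(dot(c, x), dot(b, lam))
```

If $c - A^T \lambda \ne 0$, the inner minimization over $x$ is unbounded below, which is one way to see why the condition stays in the dual as a constraint.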

25 Example: LASSO
We will use duality to transform LASSO into a QP.

26 Example: LASSO
Primal: $\min_w \frac{1}{2} \|y - Xw\|^2 + \gamma \|w\|_1$
What is the dual function in this case?

27 Example: LASSO
Reformulated primal: $\min_{z,w} \frac{1}{2} \|y - z\|^2 + \gamma \|w\|_1$ s.t. $z = Xw$
Dual: $L(\lambda) = \min_{z,w} \frac{1}{2} \|y - z\|^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$

28 Example: LASSO
Dual: $L(\lambda) = \min_{z,w} \frac{1}{2} \|y - z\|^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$
Setting the gradient to zero gives
$z = y - \lambda$, $\|X^T \lambda\|_\infty \le \gamma$

29 Example: LASSO
Dual problem: $\max_\lambda -\frac{1}{2} \|\lambda\|^2 + \lambda^T y$ s.t. $\|X^T \lambda\|_\infty \le \gamma$
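
The step from the dual function to the dual problem can be spelled out as follows. This is a sketch of the standard argument under the reformulation $z = Xw$; the minimization separates into a part over $z$ and a part over $w$.

```latex
% Part over z: a smooth quadratic, so set its gradient to zero.
\min_z \tfrac{1}{2}\|y - z\|^2 + \lambda^T z
  \;\Rightarrow\; z - y + \lambda = 0
  \;\Rightarrow\; z = y - \lambda,
\qquad
\tfrac{1}{2}\|\lambda\|^2 + \lambda^T (y - \lambda)
  = \lambda^T y - \tfrac{1}{2}\|\lambda\|^2 .

% Part over w: bounded below only when every |(X^T \lambda)_k| \le \gamma.
\min_w \gamma \|w\|_1 - (X^T \lambda)^T w =
\begin{cases}
  0       & \text{if } \|X^T \lambda\|_\infty \le \gamma \\
  -\infty & \text{otherwise}
\end{cases}

% Combining both parts gives the dual function.
L(\lambda) =
\begin{cases}
  -\tfrac{1}{2}\|\lambda\|^2 + \lambda^T y & \text{if } \|X^T \lambda\|_\infty \le \gamma \\
  -\infty & \text{otherwise}
\end{cases}
```

Maximizing $L(\lambda)$ therefore means maximizing a concave quadratic subject to the box-type constraint $\|X^T \lambda\|_\infty \le \gamma$, which is a QP.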

30 Support Vector Machines
[Figure; image source: docs.opencv.org]

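31 Support Vector Machines
Find the maximum margin hyperplane.
The distance from a point $x_i$ to the hyperplane $\langle w, x \rangle + b = 0$ is given by
$d_i = (\langle w, x_i \rangle + b) / \|w\|$
Margin $= \min_i y_i d_i = \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$
Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$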

32 Support Vector Machines
Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$
- Unpleasant (max min?)
- No unique solution

33 Support Vector Machines
Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$ s.t. ???

34 Support Vector Machines
Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$ s.t. $\min_i (\langle w, x_i \rangle + b) y_i = 1$

35 Support Vector Machines
Max margin: $\min_{w,b} \frac{1}{2} \|w\|^2$ s.t. $\min_i (\langle w, x_i \rangle + b) y_i = 1$

36 Support Vector Machines
Max margin (canonical representation):
$\min_{w,b} \frac{1}{2} \|w\|^2$ s.t. $(\langle w, x_i \rangle + b) y_i \ge 1, \ \forall i$
A QP, much better than $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$
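
The margin formula and the rescaling behind the canonical representation can be sketched numerically. The data set, the separating `(w, b)`, and the helper names `margin` and `canonicalize` below are assumptions for illustration; the sketch requires that `(w, b)` already separates the data.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def functional_margin(w, b, X, y):
    # min_i (<w, x_i> + b) y_i
    return min(yi * (dot(w, xi) + b) for xi, yi in zip(X, y))

def margin(w, b, X, y):
    # Geometric margin: functional margin divided by ||w||.
    return functional_margin(w, b, X, y) / math.sqrt(dot(w, w))

def canonicalize(w, b, X, y):
    # Rescale (w, b) so that min_i (<w, x_i> + b) y_i = 1.
    s = functional_margin(w, b, X, y)  # assumed positive (separable data)
    return [wi / s for wi in w], b / s

# Toy separable data (illustrative values only).
X = [[1.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]
w, b = [4.0, 0.0], 0.0

w_c, b_c = canonicalize(w, b, X, y)
print(margin(w, b, X, y), margin(w_c, b_c, X, y), w_c, b_c)
```

Scaling `(w, b)` by a positive constant leaves the margin unchanged, which is why the unconstrained max-margin problem has no unique solution; after canonicalizing, the margin equals $1 / \|w\|$.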

37 SVM Dual Problem
Recall that the Lagrangian is formed by adding a Lagrange multiplier for each constraint.
$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i [(\langle w, x_i \rangle + b) y_i - 1]$

38 SVM Dual Problem
$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i [(\langle w, x_i \rangle + b) y_i - 1]$
Fix $\alpha$ and minimize w.r.t. $w, b$:
- $w - \sum_i \alpha_i y_i x_i = 0$
- $\sum_i \alpha_i y_i = 0$

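39 SVM Dual Problem
$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i [(\langle w, x_i \rangle + b) y_i - 1]$
Fix $\alpha$ and minimize w.r.t. $w, b$:
- $w - \sum_i \alpha_i y_i x_i = 0$ [Plug in]
- $\sum_i \alpha_i y_i = 0$ [Constraint (why?)]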

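40 SVM Dual Problem
Dual problem:
$\max_\alpha -\frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$
s.t. $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0 \ \forall i$
Another QP. So what?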

41 SVM Dual Problem
- Only inner products: kernel trick
- Complementary slackness: support vectors
- KKT conditions lead to efficient optimization algorithms (compared to a general QP solver)

42 SVM Dual Problem
Classification of a test point $x$:
$f(x) = \langle w, x \rangle + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$
To get $b$, use the fact that $y_i f(x_i) = 1$ for any support vector.
For numerical stability, average over all support vectors.
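
A minimal sketch of this classification rule, using a two-point toy set for which the dual solution $\alpha = (0.5, 0.5)$ can be worked out by hand (the data, the linear `kernel`, and the function names are illustrative assumptions, not part of the recitation):

```python
def kernel(u, v):
    # Plain inner product; any valid kernel could be swapped in here.
    return sum(ui * vi for ui, vi in zip(u, v))

# Toy training set and its hand-derived dual solution (assumed for illustration).
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1.0, -1.0]
alpha = [0.5, 0.5]

def decision_without_bias(x):
    # sum_i alpha_i y_i <x_i, x>
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))

# Support vectors are the points with alpha_i > 0.
support = [i for i, a in enumerate(alpha) if a > 1e-12]

# y_i f(x_i) = 1 and y_i in {-1, +1} give b = y_i - sum_j alpha_j y_j <x_j, x_i>.
# Average over all support vectors for numerical stability.
b = sum(y[i] - decision_without_bias(X[i]) for i in support) / len(support)

def f(x):
    return decision_without_bias(x) + b

print(b, f([2.0, 0.0]), f([-3.0, 5.0]))
```

Only inner products between the test point and the support vectors are needed, which is what makes the kernel trick applicable.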

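43 Soft Margin SVM
Hard margin SVM:
$\min_{w,b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$
where $E_\infty(z) = \begin{cases} 0 & z \le 0 \\ \infty & z > 0 \end{cases}$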

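44 Soft Margin SVM
Hard margin SVM:
$\min_{w,b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$
where $E_\infty(z) = \begin{cases} 0 & z \le 0 \\ \infty & z > 0 \end{cases}$
The sum is the loss term; $\frac{1}{2} \|w\|^2$ is the regularization term.
[Plot: loss as a function of $y_i f(x_i)$]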

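45 Soft Margin SVM
Relax it a little bit:
$\min_{w,b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$
where $E_C(z) = \begin{cases} 0 & z \le 0 \\ C & z > 0 \end{cases}$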

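46 Soft Margin SVM
Relax it a little bit:
$\min_{w,b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$
where $E_C(z) = \begin{cases} 0 & z \le 0 \\ C & z > 0 \end{cases}$
[Plot: loss as a function of $y_i f(x_i)$]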

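47 Soft Margin SVM
Relax it a little bit:
$\min_{w,b} C \sum_i \big[1 - (\langle w, x_i \rangle + b) y_i\big]_+ + \frac{1}{2} \|w\|^2$, where $[z]_+ = \max(0, z)$
[Plot: loss as a function of $y_i f(x_i)$]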

48 Soft Margin SVM
Equivalent formulation:
$\min_{w,b,\zeta} C \sum_i \zeta_i + \frac{1}{2} \|w\|^2$
s.t. $\zeta_i \ge 0$, $(\langle w, x_i \rangle + b) y_i \ge 1 - \zeta_i$
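
For a fixed $(w, b)$, the smallest slack that satisfies both constraints is $\zeta_i = \max(0, 1 - (\langle w, x_i \rangle + b) y_i)$, which is the hinge loss. The sketch below evaluates the soft margin objective that way; the data, `C`, and the function names are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge(z):
    # Hinge loss as a function of the margin value z = y_i f(x_i).
    return max(0.0, 1.0 - z)

def soft_margin_objective(w, b, X, y, C):
    # Smallest feasible slack per point, then C * sum(slack) + 0.5 * ||w||^2.
    slacks = [hinge(yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return C * sum(slacks) + 0.5 * dot(w, w), slacks

# Toy data: the second point is inside the margin, the last one is misclassified.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 1.0]]
y = [1.0, 1.0, -1.0, -1.0]
w, b, C = [1.0, 0.0], 0.0, 1.0

objective, slacks = soft_margin_objective(w, b, X, y, C)
print(objective, slacks)
```

Points with $y_i f(x_i) \ge 1$ contribute zero slack, points inside the margin contribute a slack between 0 and 1, and misclassified points contribute more than 1.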

49 Conclusions
- Duality allows for establishing a lower bound on a minimization problem.
- Key idea: min max upper bounds max min
- Strong duality: necessity of KKT conditions
- Duality on SVMs: kernel trick, support vectors
- Soft margin SVM = hinge loss

50 Resources
- Bishop, Pattern Recognition and Machine Learning, Chapter 7
- Gordon & Tibshirani, Optimization (Fall 2012), lecture slides: F12/schedule.html
- Fiterau, Kernels and SVM: 701/slides/6_Recitation_Kernels.pdf
