Convex optimization COMS 4771
1. Recap: learning via optimization
Soft-margin SVMs

The soft-margin SVM optimization problem is defined by training data:
$$\min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \big[1 - y_i x_i^\top w\big]_+ .$$

Compare with empirical risk minimization (ERM) for the zero-one loss:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{y_i x_i^\top w \le 0\} .$$

In both cases, the $i$-th term in the summation is a function of $y_i x_i^\top w$.
Zero-one loss vs. hinge loss

[Figure: the zero-one loss $\mathbf{1}\{y x^\top w \le 0\}$ and the hinge loss $[1 - y x^\top w]_+$ plotted as functions of $y x^\top w$.]

Zero-one loss: counts 1 if $f_w(x) \ne y$. Hinge loss: an upper bound on the zero-one loss,
$$\mathbf{1}\{y x^\top w \le 0\} \le \big[1 - y x^\top w\big]_+ .$$

The soft-margin SVM minimizes an upper bound on the training error rate, plus a term that encourages large margins.
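A quick numeric sketch of the upper-bound claim, with a few illustrative margin values $z = y x^\top w$:

```python
# Spot-check that the hinge loss [1 - z]_+ upper-bounds the
# zero-one loss 1{z <= 0} at every margin value z = y * x^T w.
def zero_one(z):
    return 1.0 if z <= 0 else 0.0

def hinge(z):
    return max(0.0, 1.0 - z)

margins = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
for z in margins:
    assert hinge(z) >= zero_one(z)
print([(z, zero_one(z), hinge(z)) for z in margins])
```

Note that the bound is loose for very negative margins (hinge grows linearly while zero-one stays at 1) and tight only near the decision boundary.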
Learning via optimization

- Zero-one loss ERM: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{y_i x_i^\top w \le 0\}$
- Squared loss ERM: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Logistic regression MLE: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \ln\big(1 + \exp(-y_i x_i^\top w)\big)$
- Soft-margin SVM: $\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n [1 - y_i x_i^\top w]_+$

Generic learning objective:
$$\min_{w \in \mathbb{R}^d} \; \underbrace{r(w)}_{\text{regularization}} + \underbrace{\frac{1}{n}\sum_{i=1}^n \varphi_i(w)}_{\text{data fitting}} .$$

Regularization: encodes a learning bias (e.g., prefer large margins).
Data fitting: measures how poor the fit to the training data is.
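The regularization/data-fitting decomposition can be made concrete by evaluating the soft-margin SVM objective on a tiny dataset. The points, labels, weight vector, and $\lambda$ below are all made-up illustrative values:

```python
# Sketch: the soft-margin SVM objective split into its two pieces,
# r(w) = (lambda/2)||w||^2 plus the average hinge loss.
def svm_objective(w, xs, ys, lam):
    reg = 0.5 * lam * sum(wj * wj for wj in w)           # regularization term
    fits = [max(0.0, 1.0 - y * sum(wj * xj for wj, xj in zip(w, x)))
            for x, y in zip(xs, ys)]                     # per-example hinge losses
    return reg + sum(fits) / len(fits)                   # data-fitting term is their average

xs = [[1.0, 2.0], [-1.0, 1.0], [0.5, -1.5]]  # hypothetical inputs
ys = [1, -1, -1]                             # hypothetical labels
w = [0.5, 0.5]
print(svm_objective(w, xs, ys, lam=0.1))
```

Swapping the hinge losses for squared losses or logistic losses (and `reg` for another $r(w)$) recovers the other objectives in the table above.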
Regularization

Other kinds of regularization encode other inductive biases, e.g.:

- $r(w) = \lambda \|w\|_1$: encourages $w$ to be small and sparse.
- $r(w) = \lambda \sum_{i=1}^{d-1} |w_{i+1} - w_i|$: encourages piecewise-constant weights.
- $r(w) = \lambda \|w - w_{\text{old}}\|_2^2$: encourages closeness to an old solution $w_{\text{old}}$.

Regularization can be combined with other data-fitting objectives, e.g.:

- Ridge regression: $\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{2n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Lasso: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{2n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Sparse SVM: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n}\sum_{i=1}^n [1 - y_i x_i^\top w]_+$
- Sparse logistic regression: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n}\sum_{i=1}^n \ln\big(1 + \exp(-y_i x_i^\top w)\big)$
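The three regularizers above are simple to compute directly; here is a minimal sketch, evaluated on an illustrative weight vector:

```python
# The three regularizers listed above, written out explicitly.
def l1(w, lam):
    # lambda * ||w||_1: penalizes large entries; promotes sparsity
    return lam * sum(abs(wj) for wj in w)

def total_variation(w, lam):
    # lambda * sum |w_{i+1} - w_i|: penalizes jumps between neighbors
    return lam * sum(abs(w[i + 1] - w[i]) for i in range(len(w) - 1))

def proximity(w, w_old, lam):
    # lambda * ||w - w_old||_2^2: penalizes distance from an old solution
    return lam * sum((a - b) ** 2 for a, b in zip(w, w_old))

w = [1.0, 1.0, -2.0, 0.0]
print(l1(w, 1.0), total_variation(w, 1.0), proximity(w, [1.0, 1.0, 1.0, 1.0], 1.0))
```

Note the vector `[1.0, 1.0, -2.0, 0.0]` has a small total-variation penalty on its flat first segment but pays for the jump to $-2$, matching the "piecewise constant" intuition.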
2. Introduction to convexity
Convex sets

A set $A$ is convex if, for every pair of points $x, x'$ in $A$, the line segment between $x$ and $x'$ is also contained in $A$.

[Figure: four example sets, each labeled convex or not convex.]

Examples:
- All of $\mathbb{R}^d$.
- The empty set.
- Half-spaces: $\{x \in \mathbb{R}^d : b^\top x \le c\}$.
- Intersections of convex sets.
- Convex hulls: $\mathrm{conv}(S) := \big\{\sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N},\; x_i \in S,\; \alpha_i \ge 0,\; \sum_{i=1}^k \alpha_i = 1\big\}$.
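The half-space example admits a quick numeric sanity check: sample pairs of feasible points and verify that every convex combination stays feasible. The vector $b$ and threshold $c$ below are arbitrary illustrative choices:

```python
# Numeric sketch: the half-space {x : b^T x <= c} is convex.
import random

b, c = [1.0, -2.0], 3.0

def in_halfspace(x):
    return sum(bi * xi for bi, xi in zip(b, x)) <= c + 1e-12

random.seed(0)
for _ in range(100):
    # rejection-sample two points inside the half-space
    pts = []
    while len(pts) < 2:
        p = [random.uniform(-5, 5), random.uniform(-5, 5)]
        if in_halfspace(p):
            pts.append(p)
    a = random.random()
    mid = [(1 - a) * u + a * v for u, v in zip(*pts)]  # convex combination
    assert in_halfspace(mid)
print("all sampled convex combinations stayed in the half-space")
```

Of course the real argument is one line: $b^\top((1-\alpha)x + \alpha x') = (1-\alpha) b^\top x + \alpha\, b^\top x' \le (1-\alpha)c + \alpha c = c$.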
Convex functions

A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for any $x, x' \in \mathbb{R}^d$ and $\alpha \in [0, 1]$,
$$f\big((1-\alpha)x + \alpha x'\big) \le (1-\alpha) f(x) + \alpha f(x') .$$

[Figure: a non-convex function and a convex function, each with a chord drawn between $x$ and $x'$.]

Examples:
- $f(x) = c|x|$ for any $c > 0$ (on $\mathbb{R}$).
- $f(x) = |x|^c$ for any $c \ge 1$ (on $\mathbb{R}$).
- $f(x) = b^\top x$ for any $b \in \mathbb{R}^d$.
- $f(x) = \|x\|$ for any norm $\|\cdot\|$.
- $f(x) = x^\top A x$ for symmetric positive semidefinite $A$.
- $af + g$ for convex functions $f$ and $g$, and $a \ge 0$.
- $\max\{f, g\}$ for convex functions $f$ and $g$.
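The defining inequality can be spot-checked numerically on random points; here is a one-dimensional sketch applied to a few of the examples above (sample ranges and trial count are arbitrary):

```python
# Spot-check the definition f((1-a)x + a x') <= (1-a) f(x) + a f(x').
import random

def check_convex_1d(f, trials=200):
    random.seed(1)
    for _ in range(trials):
        x, xp = random.uniform(-3, 3), random.uniform(-3, 3)
        a = random.random()
        lhs = f((1 - a) * x + a * xp)
        rhs = (1 - a) * f(x) + a * f(xp)
        if lhs > rhs + 1e-9:      # chord dips below the function: not convex
            return False
    return True

assert check_convex_1d(abs)                       # f(x) = |x|
assert check_convex_1d(lambda x: x ** 4)          # f(x) = |x|^c with c = 4
assert check_convex_1d(lambda x: max(x, x * x))   # max of two convex functions
print("definition held at all sampled points")
```

Passing such a check is evidence, not proof; a proof needs an argument like the one for norms on the next slide.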
Verifying convexity

Is $f(x) = \|x\|$ convex? Pick any $\alpha \in [0, 1]$ and any $x, x' \in \mathbb{R}^d$. Then
$$f\big((1-\alpha)x + \alpha x'\big) = \|(1-\alpha)x + \alpha x'\|$$
$$\le \|(1-\alpha)x\| + \|\alpha x'\| \quad \text{(triangle inequality)}$$
$$= (1-\alpha)\|x\| + \alpha\|x'\| \quad \text{(homogeneity)}$$
$$= (1-\alpha)f(x) + \alpha f(x') .$$

Yes, $f$ is convex.
Convexity of differentiable functions

Differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is differentiable, then $f$ is convex if and only if
$$f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0) \quad \text{for all } x, x_0 \in \mathbb{R}^d ,$$
i.e., $f$ lies above each of its affine approximations $a(x) = f(x_0) + \nabla f(x_0)^\top (x - x_0)$.

[Figure: a convex function $f(x)$ lying above its tangent line $a(x)$ at $x_0$.]

Twice-differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is twice-differentiable, then $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathbb{R}^d$ (i.e., the Hessian, the matrix of second derivatives, is positive semidefinite for all $x$).
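The first-order condition is easy to visualize numerically: for a convex function, no tangent line ever rises above the graph. A minimal sketch for $f(x) = x^4$ over a few illustrative points:

```python
# Check the first-order condition for f(x) = x^4:
# f(x) >= f(x0) + f'(x0) (x - x0) for all sampled x, x0.
def f(x):
    return x ** 4

def fprime(x):
    return 4 * x ** 3

for x0 in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
        tangent = f(x0) + fprime(x0) * (x - x0)
        assert f(x) >= tangent - 1e-9
print("f(x) = x^4 lies above every sampled tangent line")
```

The second-order condition says the same thing locally: $f''(x) = 12x^2 \ge 0$ means the tangent lines never cut through the graph.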
Verifying convexity of differentiable functions

Is $f(x) = x^4$ convex? Use the second-order condition for convexity:
$$\frac{\partial f(x)}{\partial x} = 4x^3, \qquad \frac{\partial^2 f(x)}{\partial x^2} = 12x^2 \ge 0 .$$
Yes, $f$ is convex.

Is $f(x) = e^{\langle a, x \rangle}$ convex? Use the first-order condition for convexity:
$$\nabla f(x) = e^{\langle a, x \rangle} \, \nabla \langle a, x \rangle = e^{\langle a, x \rangle} a \quad \text{(chain rule)} .$$
Difference between $f$ and its affine approximation:
$$f(x) - \big(f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle\big) = e^{\langle a, x \rangle} - \big(e^{\langle a, x_0 \rangle} + e^{\langle a, x_0 \rangle}\langle a, x - x_0 \rangle\big)$$
$$= e^{\langle a, x_0 \rangle}\big(e^{\langle a, x - x_0 \rangle} - (1 + \langle a, x - x_0 \rangle)\big)$$
$$\ge 0 \quad \text{(because } 1 + z \le e^z \text{ for all } z \in \mathbb{R}\text{)} .$$
Yes, $f$ is convex.
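The whole argument rests on the scalar inequality $1 + z \le e^z$; here is a quick numeric sketch of it, together with the resulting tangent bound for the one-dimensional case $f(x) = e^{ax}$ (the values of $a$ and $x_0$ are arbitrary):

```python
# Spot-check 1 + z <= e^z, and the affine lower bound it implies
# for f(x) = e^{a x} with gradient f'(x) = a e^{a x}.
import math

for z in [-5.0, -1.0, -1e-3, 0.0, 1e-3, 1.0, 5.0]:
    assert 1.0 + z <= math.exp(z) + 1e-12

a, x0 = 2.0, 0.3

def f(x):
    return math.exp(a * x)

for x in [-2.0, 0.0, 0.3, 1.0]:
    affine = f(x0) + f(x0) * a * (x - x0)   # f(x0) + f'(x0)(x - x0)
    assert f(x) >= affine - 1e-9
print("1 + z <= e^z and the affine lower bound both hold")
```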
3. Convex optimization problems
Optimization problems

A typical optimization problem (in standard form) is written as
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } f_i(x) \le 0 \text{ for all } i = 1, \dots, n .$$

- $f_0 : \mathbb{R}^d \to \mathbb{R}$ is the objective function;
- $f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R}$ are the constraint functions;
- the inequalities $f_i(x) \le 0$ are the constraints;
- $A := \{x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, 2, \dots, n\}$ is the feasible region.

Goal: find $x \in A$ so that $f_0(x)$ is as small as possible.

The (optimal) value of the optimization problem is the smallest value $f_0(x)$ achieved by a feasible point $x \in A$. A point $x \in A$ achieving the optimal value is a (global) minimizer.
Convex optimization problems

Standard form of a convex optimization problem:
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } f_i(x) \le 0 \text{ for all } i = 1, \dots, n ,$$
where $f_0, f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R}$ are convex functions.

Fact: the feasible set $A := \{x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, 2, \dots, n\}$ is a convex set.
Example: Soft-margin SVM

$$\min_{w \in \mathbb{R}^d,\; \xi_1, \dots, \xi_n \in \mathbb{R}} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \xi_i \quad \text{s.t. } y_i x_i^\top w \ge 1 - \xi_i \text{ and } \xi_i \ge 0 \text{ for all } i = 1, \dots, n .$$

Bringing it to standard form:
$$\min_{w \in \mathbb{R}^d,\; \xi_1, \dots, \xi_n \in \mathbb{R}} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \xi_i \quad \text{(sum of two convex functions, also convex)}$$
$$\text{s.t. } 1 - \xi_i - y_i x_i^\top w \le 0 \text{ for all } i = 1, \dots, n \quad \text{(linear constraints)}$$
$$\text{and } -\xi_i \le 0 \text{ for all } i = 1, \dots, n \quad \text{(linear constraints)} .$$

Which of zero-one loss ERM, squared loss ERM, logistic regression MLE, ridge regression, Lasso, and sparse SVM are convex?
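Because the problem is convex, even a very simple method works: eliminating the slacks $\xi_i$ gives the unconstrained hinge-loss objective, which can be minimized by subgradient descent. A minimal sketch on a made-up, linearly separable toy dataset (step sizes and iteration count are illustrative choices, not tuned):

```python
# Subgradient descent on the unconstrained soft-margin SVM objective
# (lambda/2)||w||^2 + (1/n) sum [1 - y_i x_i^T w]_+ on toy data.
xs = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]  # toy inputs
ys = [1, 1, -1, -1]                                        # toy labels
lam = 0.01

def objective(w):
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    hinge = [max(0.0, 1.0 - y * sum(wj * xj for wj, xj in zip(w, x)))
             for x, y in zip(xs, ys)]
    return reg + sum(hinge) / len(hinge)

w = [0.0, 0.0]
for t in range(1, 2001):
    # subgradient: lam * w, minus y*x/n for each margin-violating example
    g = [lam * wj for wj in w]
    for x, y in zip(xs, ys):
        if y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0:
            for j in range(len(w)):
                g[j] -= y * x[j] / len(xs)
    w = [wj - (0.5 / t) * gj for wj, gj in zip(w, g)]  # decreasing step size

print(w, objective(w))
# every training point should end up on the correct side
assert all(y * sum(wj * xj for wj, xj in zip(w, x)) > 0 for x, y in zip(xs, ys))
```

(Answer to the question above: all of them are convex except zero-one loss ERM, whose indicator loss is neither convex nor even continuous.)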
Local minimizers

Consider an optimization problem (not necessarily convex):
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } x \in A .$$

We say $x \in A$ is a local minimizer if there is an open ball $U := \{x' \in \mathbb{R}^d : \|x' - x\|_2 < r\}$ of positive radius $r > 0$ such that $x$ is a global minimizer for
$$\min_{x' \in \mathbb{R}^d} \; f_0(x') \quad \text{s.t. } x' \in A \cap U .$$

Nothing looks better than $x$ in the immediate vicinity of $x$.

[Figure: a non-convex function with a locally optimal point marked.]
Local minimizers of convex problems

If the optimization problem is convex and $x \in A$ is a local minimizer, then it is also a global minimizer.

[Figure: for a convex function, the local minimizer coincides with the global one.]
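This is why simple local methods suffice for convex problems. A minimal sketch: gradient descent on the convex function $f(w) = (w - 3)^2 + 1$ reaches the same global minimizer $w^* = 3$ from every starting point (step size and iteration count are illustrative):

```python
# Gradient descent on f(w) = (w - 3)^2 + 1, whose gradient is 2(w - 3),
# started from several different points: all runs find the same minimizer.
def grad(w):
    return 2.0 * (w - 3.0)

results = []
for w0 in [-10.0, 0.0, 50.0]:
    w = w0
    for _ in range(500):
        w -= 0.1 * grad(w)
    results.append(w)

print(results)
assert all(abs(w - 3.0) < 1e-6 for w in results)
```

On a non-convex objective, the same procedure started from different points can land in different local minimizers with different values.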
Key takeaways

1. Formulation of learning methods as optimization problems.
2. Convex sets, convex functions, and ways to check whether a function is convex.
3. Standard form of convex optimization problems; the concepts of local and global minimizers.
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationLasso, Ridge, and Elastic Net
Lasso, Ridge, and Elastic Net David Rosenberg New York University February 7, 2017 David Rosenberg (New York University) DS-GA 1003 February 7, 2017 1 / 29 Linearly Dependent Features Linearly Dependent
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationReferences. Lecture 7: Support Vector Machines. Optimum Margin Perceptron. Perceptron Learning Rule
References Lecture 7: Support Vector Machines Isabelle Guyon guyoni@inf.ethz.ch An training algorithm for optimal margin classifiers Boser-Guyon-Vapnik, COLT, 992 http://www.clopinet.com/isabelle/p apers/colt92.ps.z
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationComplexity Bounds of Radial Basis Functions and Multi-Objective Learning
Complexity Bounds of Radial Basis Functions and Multi-Objective Learning Illya Kokshenev and Antônio P. Braga Universidade Federal de Minas Gerais - Depto. Engenharia Eletrônica Av. Antônio Carlos, 6.67
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationChapter 9. Support Vector Machine. Yongdai Kim Seoul National University
Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationGeneralization, Overfitting, and Model Selection
Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More informationSCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.
SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression
More informationLinear, Binary SVM Classifiers
Linear, Binary SVM Classifiers COMPSCI 37D Machine Learning COMPSCI 37D Machine Learning Linear, Binary SVM Classifiers / 6 Outline What Linear, Binary SVM Classifiers Do 2 Margin I 3 Loss and Regularized
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More information