Optimization for Machine Learning

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Optimization for Machine Learning"

Transcription

1 Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017)

2 Course materials Some references: Introductory lectures on convex optimization Nesterov Convex optimization Boyd & Vandenberghe Nonlinear programming Bertsekas Convex Analysis Rockafellar Fundamentals of convex analysis Urruty, Lemaréchal Lectures on modern convex optimization Nemirovski Optimization for Machine Learning Sra, Nowozin, Wright Theory of Convex Optimization for Machine Learning Bubeck NIPS 2016 Optimization Tutorial Bach, Sra Some related courses: EE227A, Spring 2013, (Sra, UC Berkeley) , Spring 2014 (Sra, CMU) EE364a,b (Boyd, Stanford) EE236b,c (Vandenberghe, UCLA) Venues: NIPS, ICML, UAI, AISTATS, SIOPT, Math. Prog. Suvrit Sra Optimization for Machine Learning 2 / 36

3 Lecture Plan Introduction (3 lectures) Problems and algorithms (5 lectures) Non-convex optimization, perspectives (2 lectures) Suvrit Sra Optimization for Machine Learning 3 / 36

4 Constrained problems Suvrit Sra Optimization for Machine Learning 4 / 36

5 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Suvrit Sra Optimization for Machine Learning 5 / 36

6 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Thus, x is optimal if and only if f (x ), y x 0, for all y X. Suvrit Sra Optimization for Machine Learning 5 / 36

7 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Thus, x is optimal if and only if f (x ), y x 0, for all y X. If X = R n, this reduces to f (x ) = 0 f(x) X x f(x ) x If f (x ) 0, it defines supporting hyperplane to X at x Suvrit Sra Optimization for Machine Learning 5 / 36

8 Optimality conditions constrained Proof: Suppose y X such that f (x ), y x < 0 Using mean-value theorem of calculus, ξ [0, 1] s.t. f (x + t(y x )) = f (x ) + f (x + ξt(y x )), t(y x ) (we applied MVT to g(t) := f (x + t(y x ))) For sufficiently small t, since f continuous, by assump on y, f (x + ξt(y x )), y x < 0 This in turn implies that f (x + t(y x )) < f (x ) Since X is convex, x + t(y x ) X is also feasible Contradiction to local optimality of x Suvrit Sra Optimization for Machine Learning 6 / 36

9 Example: projection operator P X (z) := argmin x z 2 x X (Assume X is closed and convex, then projection is unique) Let X be nonempty, closed and convex. Optimality condition: x = P X (y) iff x z, y x 0 for all y X Exercise: Prove that projection is nonexpansive, i.e., P X (x) P X (y) 2 x y 2 for all x, y R n. Suvrit Sra Optimization for Machine Learning 7 / 36

10 Feasible descent min f (x) s.t. x X f (x ), x x 0, x X. f(x) X x f(x ) x Suvrit Sra Optimization for Machine Learning 8 / 36

11 Feasible descent x k+1 = x k + α k d k Suvrit Sra Optimization for Machine Learning 9 / 36

12 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X Suvrit Sra Optimization for Machine Learning 9 / 36

13 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Suvrit Sra Optimization for Machine Learning 9 / 36

14 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Since X is convex, all feasible directions are of the form d k = γ(z x k ), γ > 0, where z X is any feasible vector. Suvrit Sra Optimization for Machine Learning 9 / 36

15 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Since X is convex, all feasible directions are of the form d k = γ(z x k ), γ > 0, where z X is any feasible vector. x k+1 = x k + α k (z k x k ), α k (0, 1] Suvrit Sra Optimization for Machine Learning 9 / 36

16 Cone of feasible directions x Feasible directions at x X d Suvrit Sra Optimization for Machine Learning 10 / 36

17 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Suvrit Sra Optimization for Machine Learning 11 / 36

18 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Suvrit Sra Optimization for Machine Learning 11 / 36

19 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Frank-Wolfe (Conditional gradient) method Let z k argmin x X f (x k ), x x k Use different methods to select α k x k+1 = x k + α k (z k x k ) Suvrit Sra Optimization for Machine Learning 11 / 36

20 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Frank-Wolfe (Conditional gradient) method Let z k argmin x X f (x k ), x x k Use different methods to select α k x k+1 = x k + α k (z k x k ) Due to M. Frank and P. Wolfe (1956) Practical when solving linear problem over X easy Very popular in machine learning over recent years Refinements, several variants (including nonconvex) Suvrit Sra Optimization for Machine Learning 11 / 36

21 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Suvrit Sra Optimization for Machine Learning 12 / 36

22 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, Suvrit Sra Optimization for Machine Learning 12 / 36

23 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) Suvrit Sra Optimization for Machine Learning 12 / 36

24 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C Suvrit Sra Optimization for Machine Learning 12 / 36

25 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C Suvrit Sra Optimization for Machine Learning 12 / 36

26 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C Suvrit Sra Optimization for Machine Learning 12 / 36

27 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C = (1 α k )(f (x k ) f (x )) α2 k C. Suvrit Sra Optimization for Machine Learning 12 / 36

28 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C = (1 α k )(f (x k ) f (x )) α2 k C. A simple induction (Verify!) then shows that f (x k ) f (x ) 2C k + 2, k 0. Suvrit Sra Optimization for Machine Learning 12 / 36

29 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Suvrit Sra Optimization for Machine Learning 13 / 36

30 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Hint: Recall, g, z z p g q. Pick z to obtain equality. Suvrit Sra Optimization for Machine Learning 13 / 36

31 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Hint: Recall, g, z z p g q. Pick z to obtain equality. Suvrit Sra Optimization for Machine Learning 13 / 36

32 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i Suvrit Sra Optimization for Machine Learning 14 / 36

33 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. Suvrit Sra Optimization for Machine Learning 14 / 36

34 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. can be shown that G 2 is the dual norm here. Suvrit Sra Optimization for Machine Learning 14 / 36

35 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. can be shown that G 2 is the dual norm here. Optimal Z satisfies G, Z = G 2 Z = G 2 ; use Lanczos (or using power method) to compute top singular vectors. (for more examples: Jaggi, Revisiting Frank-Wolfe:...) Suvrit Sra Optimization for Machine Learning 14 / 36

36 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Suvrit Sra Optimization for Machine Learning 15 / 36

37 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Nonconvex-FW possible. It works (i.e., satisfies first-order optimality conditions to ɛ-accuracy in O(1/ɛ) iterations (Lacoste-Julien 2016; Reddi et al. 2016). Suvrit Sra Optimization for Machine Learning 15 / 36

38 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Nonconvex-FW possible. It works (i.e., satisfies first-order optimality conditions to ɛ-accuracy in O(1/ɛ) iterations (Lacoste-Julien 2016; Reddi et al. 2016). Linear convergence under quite strong assumptions on both f and X ; alternatively, use a more complicated method: FW with Away Steps (Guelat-Marcotte 1986); more recently (Jaggi, Lacoste-Julien 2016) Suvrit Sra Optimization for Machine Learning 15 / 36

39 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) Suvrit Sra Optimization for Machine Learning 16 / 36

40 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Suvrit Sra Optimization for Machine Learning 16 / 36

41 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Projected Gradient x k+1 = P X ( x k α k f (x k ) ), k = 0, 1,... where P X denotes above orthogonal projection. Suvrit Sra Optimization for Machine Learning 16 / 36

42 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Projected Gradient x k+1 = P X ( x k α k f (x k ) ), k = 0, 1,... where P X denotes above orthogonal projection. PG can be much faster than O(1/k) of FW (e.g., O(e k ) for strongly convex); but LO can be sometimes much faster than projections. Suvrit Sra Optimization for Machine Learning 16 / 36

43 Projected Gradient convergence Depends on the following crucial properties of P: Nonexpansivity: Px Py 2 x y 2 Firm nonxpansivity: Px Py 2 2 Px Py, x y Suvrit Sra Optimization for Machine Learning 17 / 36

44 Projected Gradient convergence Depends on the following crucial properties of P: Nonexpansivity: Px Py 2 x y 2 Firm nonxpansivity: Px Py 2 2 Px Py, x y Using projections, essentially convergence analysis with α k = 1/L for the unconstrained case works. Exercise: Let f (x) = 1 2 Ax b 2 2. Write a Matlab/Python script to minimize this function over the convex set X := { 1 x i 1} using projected gradient as well as Frank-Wolfe. Compare the two. Suvrit Sra Optimization for Machine Learning 17 / 36

45 Duality Suvrit Sra Optimization for Machine Learning 18 / 36

46 Primal problem Let f i : R n R (0 i m). Generic nonlinear program min f 0 (x) s.t. f i (x) 0, 1 i m, x {dom f 0 dom f 1 dom f m }. (P) Def. Domain: The set D := {dom f 0 dom f 1 dom f m } We call (P) the primal problem The variable x is the primal variable We will attach to (P) a dual problem In our initial derivation: no restriction to convexity. Suvrit Sra Optimization for Machine Learning 19 / 36

47 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers Suvrit Sra Optimization for Machine Learning 20 / 36

48 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers If x is feasible, λ 0, then we get the lower-bound f 0 (x) L(x, λ) x X, λ R m +. Suvrit Sra Optimization for Machine Learning 20 / 36

49 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers If x is feasible, λ 0, then we get the lower-bound f 0 (x) L(x, λ) x X, λ R m +. Lagrangian helps write problem in unconstrained form Suvrit Sra Optimization for Machine Learning 20 / 36

50 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Suvrit Sra Optimization for Machine Learning 21 / 36

51 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 Suvrit Sra Optimization for Machine Learning 21 / 36

52 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 In this case, inner sup is +, so claim true by definition Suvrit Sra Optimization for Machine Learning 21 / 36

53 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 In this case, inner sup is +, so claim true by definition If x is feasible, each f i (x) 0, so sup λ i λ if i (x) = 0 Suvrit Sra Optimization for Machine Learning 21 / 36

54 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Suvrit Sra Optimization for Machine Learning 22 / 36

55 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Observations: g is pointwise inf of affine functions of λ Thus, g is concave; it may take value Suvrit Sra Optimization for Machine Learning 22 / 36

56 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Observations: g is pointwise inf of affine functions of λ Thus, g is concave; it may take value Recall: f 0 (x) L(x, λ) x X, λ 0; thus x X, f 0 (x) inf x L(x, λ) =: g(λ) Now minimize over x on lhs, to obtain λ R m + p g(λ). Suvrit Sra Optimization for Machine Learning 22 / 36

57 Lagrange dual problem sup g(λ) s.t. λ 0. λ Suvrit Sra Optimization for Machine Learning 23 / 36

58 Lagrange dual problem sup g(λ) s.t. λ 0. λ dual feasible: if λ 0 and g(λ) > dual optimal: λ if sup is achieved Lagrange dual is always concave, regardless of original Suvrit Sra Optimization for Machine Learning 23 / 36

59 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Suvrit Sra Optimization for Machine Learning 24 / 36

60 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Theorem. (Weak-duality): For problem (P), we have p d. Suvrit Sra Optimization for Machine Learning 24 / 36

61 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Theorem. (Weak-duality): For problem (P), we have p d. Proof: We showed that for all λ R m +, p g(λ). Thus, it follows that p sup g(λ) = d. Suvrit Sra Optimization for Machine Learning 24 / 36

62 Duality gap p d 0 Suvrit Sra Optimization for Machine Learning 25 / 36

63 Duality gap p d 0 Strong duality if duality gap is zero: p = d Notice: both p and d may be + Suvrit Sra Optimization for Machine Learning 25 / 36

64 Duality gap p d 0 Strong duality if duality gap is zero: p = d Notice: both p and d may be + Several sufficient conditions known, especially for convex optimization. Easy necessary and sufficient conditions: unknown Suvrit Sra Optimization for Machine Learning 25 / 36

65 Example: Slater s sufficient conditions min f 0 (x) s.t. f i (x) 0, 1 i m, Ax = b. Suvrit Sra Optimization for Machine Learning 26 / 36

66 Example: Slater s sufficient conditions min f 0 (x) s.t. f i (x) 0, 1 i m, Ax = b. Constraint qualification: There exists x ri D s.t. f i (x) < 0, Ax = b. That is, there is a strictly feasible point. Theorem. Let the primal problem be convex. If there is a feasible point such that is strictly feasible for the non-affine constraints (and merely feasible for affine, linear ones), then strong duality holds. Moreover, the dual optimal is attained (i.e., d > ). Reading: Read BV for a proof. Suvrit Sra Optimization for Machine Learning 26 / 36

67 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Suvrit Sra Optimization for Machine Learning 27 / 36

68 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 Suvrit Sra Optimization for Machine Learning 27 / 36

69 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 L(x, y, λ) = e x + λx 2 /y, so dual function is { g(λ) = inf x,y>0 e x + λx 2 0 λ 0 y = λ < 0. Suvrit Sra Optimization for Machine Learning 27 / 36

70 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 L(x, y, λ) = e x + λx 2 /y, so dual function is { g(λ) = inf x,y>0 e x + λx 2 0 λ 0 y = λ < 0. Dual problem d = max 0 s.t. λ 0. λ Thus, d = 0, and gap is p d = 1. Here, we had no strictly feasible solution. Suvrit Sra Optimization for Machine Learning 27 / 36

71 Zero duality gap: nonconvex example Trust region subproblem (TRS) min x T Ax + 2b T x x T x 1. A is symmetric but not necessarily semidefinite! Theorem. TRS always has zero duality gap. Remark: Above theorem extremely important; part of family of related results for certain quadratic nonconvex problems. Suvrit Sra Optimization for Machine Learning 28 / 36

72 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Suvrit Sra Optimization for Machine Learning 29 / 36

73 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. Suvrit Sra Optimization for Machine Learning 29 / 36

74 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 Suvrit Sra Optimization for Machine Learning 29 / 36

75 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 If there is x > 0 with Ax b and 1 T x = 1, strong duality holds. Suvrit Sra Optimization for Machine Learning 29 / 36

76 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 If there is x > 0 with Ax b and 1 T x = 1, strong duality holds. Exercise: Simplify above dual by optimizing out ν Suvrit Sra Optimization for Machine Learning 29 / 36

77 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. Suvrit Sra Optimization for Machine Learning 30 / 36

78 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. L(x, ξ, λ, ν) = 1 2 x C1T ξ λ T (Ax 1 + ξ) ν T ξ Suvrit Sra Optimization for Machine Learning 30 / 36

79 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. L(x, ξ, λ, ν) = 1 2 x C1T ξ λ T (Ax 1 + ξ) ν T ξ g(λ, ν) := L(x, ξ, λ, ν) inf { = λ T AT λ 2 2 λ + ν = C1 + otherwise d = max λ 0,ν 0 g(λ, ν) Exercise: Using ν 0, eliminate ν from above problem. Suvrit Sra Optimization for Machine Learning 30 / 36

80 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) Suvrit Sra Optimization for Machine Learning 31 / 36

81 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x Suvrit Sra Optimization for Machine Learning 31 / 36

82 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) Suvrit Sra Optimization for Machine Learning 31 / 36

83 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) g(λ, ν) = ν T b sup x x, A T ν F(x) Suvrit Sra Optimization for Machine Learning 31 / 36

84 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) g(λ, ν) = ν T b sup x g(λ, ν) = ν T b F ( A T ν). x, A T ν F(x) Not so useful! F hard to compute. Suvrit Sra Optimization for Machine Learning 31 / 36

85 Example: norm regularized problems min f (x) + Ax Suvrit Sra Optimization for Machine Learning 32 / 36

86 Example: norm regularized problems min f (x) + Ax Dual problem min y f ( A T y) s.t. y 1. Suvrit Sra Optimization for Machine Learning 32 / 36

87 Example: norm regularized problems min f (x) + Ax Dual problem min y f ( A T y) s.t. y 1. Say ȳ < 1, such that A T ȳ ri(dom f ), then we have strong duality (e.g., for instance 0 ri(dom f )) Suvrit Sra Optimization for Machine Learning 32 / 36

88 Example: Lasso-like problem p := min x Ax b 2 + λ x 1. Suvrit Sra Optimization for Machine Learning 33 / 36

89 Example: Lasso-like problem p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. Suvrit Sra Optimization for Machine Learning 33 / 36

90 Example: Lasso-like problem p = min x p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v Saddle-point formulation { u T (b Ax) + v T x u 2 1, v λ } Suvrit Sra Optimization for Machine Learning 33 / 36

91 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ Suvrit Sra Optimization for Machine Learning 33 / 36

92 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ = max u,v ut b A T u = v, u 2 1, v λ Suvrit Sra Optimization for Machine Learning 33 / 36

93 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ = max u,v ut b A T u = v, u 2 1, v λ = max u T b u 2 1, A T v λ. u Suvrit Sra Optimization for Machine Learning 33 / 36

94 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Suvrit Sra Optimization for Machine Learning 34 / 36

95 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Suvrit Sra Optimization for Machine Learning 34 / 36

96 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Suvrit Sra Optimization for Machine Learning 34 / 36

97 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Suvrit Sra Optimization for Machine Learning 34 / 36

98 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) Suvrit Sra Optimization for Machine Learning 34 / 36

99 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) Suvrit Sra Optimization for Machine Learning 34 / 36

100 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min x L(x, λ ) Suvrit Sra Optimization for Machine Learning 34 / 36

101 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min x L(x, λ ) L(x, λ ) Suvrit Sra Optimization for Machine Learning 34 / 36

102 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Suvrit Sra Optimization for Machine Learning 34 / 36

103 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Thus, equalities hold in above chain. Suvrit Sra Optimization for Machine Learning 34 / 36

104 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Thus, equalities hold in above chain. x argmin x L(x, λ ). Suvrit Sra Optimization for Machine Learning 34 / 36

105 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies Suvrit Sra Optimization for Machine Learning 35 / 36

106 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Suvrit Sra Optimization for Machine Learning 35 / 36

107 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have Suvrit Sra Optimization for Machine Learning 35 / 36

108 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. Suvrit Sra Optimization for Machine Learning 35 / 36

109 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. But λ i 0 and f i (x ) 0, Suvrit Sra Optimization for Machine Learning 35 / 36

110 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. But λ i 0 and f i (x ) 0, so complementary slackness λ i f i(x ) = 0, i = 1,..., m. Suvrit Sra Optimization for Machine Learning 35 / 36

111 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) Suvrit Sra Optimization for Machine Learning 36 / 36

112 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal Suvrit Sra Optimization for Machine Learning 36 / 36

113 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal If problem is convex, then KKT also sufficient Suvrit Sra Optimization for Machine Learning 36 / 36

114 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal If problem is convex, then KKT also sufficient Exercise: Prove the above sufficiency of KKT. Hint: Use that L(x, λ ) is convex, and conclude from KKT conditions that g(λ ) = f 0 (x ), so that (x, λ ) optimal primal-dual pair. Suvrit Sra Optimization for Machine Learning 36 / 36

Convex Optimization & Lagrange Duality

Convex Optimization & Lagrange Duality Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Lecture 3-A - Convex) SUVRIT SRA Massachusetts Institute of Technology Special thanks: Francis Bach (INRIA, ENS) (for sharing this material, and permitting its use) MPI-IS

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

Lecture: Duality.

Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

More information

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

EE/AA 578, Univ of Washington, Fall Duality

EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Lagrangian Duality and Convex Optimization

Lagrangian Duality and Convex Optimization Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form

More information

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 4. Suvrit Sra. (Conjugates, subdifferentials) 31 Jan, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 4. Suvrit Sra. (Conjugates, subdifferentials) 31 Jan, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 4 (Conjugates, subdifferentials) 31 Jan, 2013 Suvrit Sra Organizational HW1 due: 14th Feb 2013 in class. Please L A TEX your solutions (contact TA if this

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem: HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Duality. Geoff Gordon & Ryan Tibshirani Optimization /

Duality. Geoff Gordon & Ryan Tibshirani Optimization / Duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Duality in linear programs Suppose we want to find lower bound on the optimal value in our convex problem, B min x C f(x) E.g., consider

More information

CSCI : Optimization and Control of Networks. Review on Convex Optimization

CSCI : Optimization and Control of Networks. Review on Convex Optimization CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one

More information

subject to (x 2)(x 4) u,

subject to (x 2)(x 4) u, Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

On the Method of Lagrange Multipliers

On the Method of Lagrange Multipliers On the Method of Lagrange Multipliers Reza Nasiri Mahalati November 6, 2016 Most of what is in this note is taken from the Convex Optimization book by Stephen Boyd and Lieven Vandenberghe. This should

More information

A Brief Review on Convex Optimization

A Brief Review on Convex Optimization A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

More information

Lecture 7: Convex Optimizations

Lecture 7: Convex Optimizations Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1

More information

Duality. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Duality. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Duality Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Duality in linear programs Suppose we want to find lower bound on the optimal value in our convex problem, B min x C f(x) E.g.,

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

More information

EE 227A: Convex Optimization and Applications October 14, 2008

EE 227A: Convex Optimization and Applications October 14, 2008 EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Homework Set #6 - Solutions

Homework Set #6 - Solutions EE 15 - Applications of Convex Optimization in Signal Processing and Communications Dr Andre Tkacenko JPL Third Term 11-1 Homework Set #6 - Solutions 1 a The feasible set is the interval [ 4] The unique

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Linear and non-linear programming

Linear and non-linear programming Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Chap 2. Optimality conditions

Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

More information

Lecture 8. Strong Duality Results. September 22, 2008

Lecture 8. Strong Duality Results. September 22, 2008 Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation

More information

Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization

Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization Convex Optimization Ofer Meshi Lecture 6: Lower Bounds Constrained Optimization Lower Bounds Some upper bounds: #iter μ 2 M #iter 2 M #iter L L μ 2 Oracle/ops GD κ log 1/ε M x # ε L # x # L # ε # με f

More information

Additional Homework Problems

Additional Homework Problems Additional Homework Problems Robert M. Freund April, 2004 2004 Massachusetts Institute of Technology. 1 2 1 Exercises 1. Let IR n + denote the nonnegative orthant, namely IR + n = {x IR n x j ( ) 0,j =1,...,n}.

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Lagrange Relaxation and Duality

Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Convex Optimization and SVM

Convex Optimization and SVM Convex Optimization and SVM Problem 0. Cf lecture notes pages 12 to 18. Problem 1. (i) A slab is an intersection of two half spaces, hence convex. (ii) A wedge is an intersection of two half spaces, hence

More information

Duality in Linear Programs. Ryan Tibshirani Convex Optimization

Duality in Linear Programs. Ryan Tibshirani Convex Optimization Duality in Linear Programs Ryan Tibshirani Convex Optimization 10-725 Consider Last time: stochastic gradient descent 1 x m m f i (x) i=1 Stochastic gradient descent or SGD: let x (0) R n, repeat: x (k)

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Lecture 14: Optimality Conditions for Conic Problems

Lecture 14: Optimality Conditions for Conic Problems EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology March 2016 1 / 27 Constrained

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

More information

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and

More information

Lecture 5. The Dual Cone and Dual Problem

Lecture 5. The Dual Cone and Dual Problem IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Interior Point Algorithms for Constrained Convex Optimization

Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Lagrangian Duality Theory

Lagrangian Duality Theory Lagrangian Duality Theory Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapter 14.1-4 1 Recall Primal and Dual

More information

Tutorial on Convex Optimization: Part II

Tutorial on Convex Optimization: Part II Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

8. Conjugate functions

8. Conjugate functions L. Vandenberghe EE236C (Spring 2013-14) 8. Conjugate functions closed functions conjugate function 8-1 Closed set a set C is closed if it contains its boundary: x k C, x k x = x C operations that preserve

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 6 (Conic optimization) 07 Feb, 2013 Suvrit Sra Organizational Info Quiz coming up on 19th Feb. Project teams by 19th Feb Good if you can mix your research

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Lecture 23: November 19

Lecture 23: November 19 10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo

More information

EE364a Review Session 5

EE364a Review Session 5 EE364a Review Session 5 EE364a Review announcements: homeworks 1 and 2 graded homework 4 solutions (check solution to additional problem 1) scpd phone-in office hours: tuesdays 6-7pm (650-723-1156) 1 Complementary

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009 UC Berkele Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 6 Fall 2009 Solution 6.1 (a) p = 1 (b) The Lagrangian is L(x,, λ) = e x + λx 2

More information

9. Dual decomposition and dual algorithms

9. Dual decomposition and dual algorithms EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Introduction to Nonlinear Stochastic Programming

Introduction to Nonlinear Stochastic Programming School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: 1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National

More information

Two hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx.

Two hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx. Two hours To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER CONVEX OPTIMIZATION - SOLUTIONS xx xxxx 27 xx:xx xx.xx Answer THREE of the FOUR questions. If

More information

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement

More information

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: 1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information