Optimization for Machine Learning
|
|
- Giles Cummings
- 6 years ago
- Views:
Transcription
1 Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017)
2 Course materials Some references: Introductory lectures on convex optimization Nesterov Convex optimization Boyd & Vandenberghe Nonlinear programming Bertsekas Convex Analysis Rockafellar Fundamentals of convex analysis Urruty, Lemaréchal Lectures on modern convex optimization Nemirovski Optimization for Machine Learning Sra, Nowozin, Wright Theory of Convex Optimization for Machine Learning Bubeck NIPS 2016 Optimization Tutorial Bach, Sra Some related courses: EE227A, Spring 2013, (Sra, UC Berkeley) , Spring 2014 (Sra, CMU) EE364a,b (Boyd, Stanford) EE236b,c (Vandenberghe, UCLA) Venues: NIPS, ICML, UAI, AISTATS, SIOPT, Math. Prog. Suvrit Sra Optimization for Machine Learning 2 / 36
3 Lecture Plan Introduction (3 lectures) Problems and algorithms (5 lectures) Non-convex optimization, perspectives (2 lectures) Suvrit Sra Optimization for Machine Learning 3 / 36
4 Constrained problems Suvrit Sra Optimization for Machine Learning 4 / 36
5 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 5 / 36
6 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Thus, x is optimal if and only if f (x ), y x 0, for all y X. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 5 / 36
7 Optimality constrained For every x, y dom f, we have f (y) f (x) + f (x), y x. Thus, x is optimal if and only if f (x ), y x 0, for all y X. If X = R n, this reduces to f (x ) = 0 f(x) X x f(x ) x If f (x ) 0, it defines supporting hyperplane to X at x Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 5 / 36
8 Optimality conditions constrained Proof: Suppose y X such that f (x ), y x < 0 Using mean-value theorem of calculus, ξ [0, 1] s.t. f (x + t(y x )) = f (x ) + f (x + ξt(y x )), t(y x ) (we applied MVT to g(t) := f (x + t(y x ))) For sufficiently small t, since f continuous, by assump on y, f (x + ξt(y x )), y x < 0 This in turn implies that f (x + t(y x )) < f (x ) Since X is convex, x + t(y x ) X is also feasible Contradiction to local optimality of x Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 6 / 36
9 Example: projection operator P X (z) := argmin x z 2 x X (Assume X is closed and convex, then projection is unique) Let X be nonempty, closed and convex. Optimality condition: x = P X (y) iff x z, y x 0 for all y X Exercise: Prove that projection is nonexpansive, i.e., P X (x) P X (y) 2 x y 2 for all x, y R n. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 7 / 36
10 Feasible descent min f (x) s.t. x X f (x ), x x 0, x X. f(x) X x f(x ) x Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 8 / 36
11 Feasible descent x k+1 = x k + α k d k Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 9 / 36
12 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 9 / 36
13 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 9 / 36
14 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Since X is convex, all feasible directions are of the form d k = γ(z x k ), γ > 0, where z X is any feasible vector. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 9 / 36
15 Feasible descent x k+1 = x k + α k d k d k feasible direction, i.e., x k + α k d k X d k must also be descent direction, i.e., f (x k ), d k < 0 Stepsize α k chosen to ensure feasibility and descent. Since X is convex, all feasible directions are of the form d k = γ(z x k ), γ > 0, where z X is any feasible vector. x k+1 = x k + α k (z k x k ), α k (0, 1] Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 9 / 36
16 Cone of feasible directions x Feasible directions at x X d Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 10 / 36
17 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 11 / 36
18 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 11 / 36
19 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Frank-Wolfe (Conditional gradient) method Let z k argmin x X f (x k ), x x k Use different methods to select α k x k+1 = x k + α k (z k x k ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 11 / 36
20 Frank-Wolfe / conditional gradient method Optimality: f (x k ), z k x k 0 for all z k X Aim: If not optimal, then generate feasible direction d k = z k x k that obeys descent condition f (x k ), d k < 0. Frank-Wolfe (Conditional gradient) method Let z k argmin x X f (x k ), x x k Use different methods to select α k x k+1 = x k + α k (z k x k ) Due to M. Frank and P. Wolfe (1956) Practical when solving linear problem over X easy Very popular in machine learning over recent years Refinements, several variants (including nonconvex) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 11 / 36
21 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
22 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
23 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
24 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
25 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
26 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
27 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C = (1 α k )(f (x k ) f (x )) α2 k C. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
28 Frank-Wolfe: Convergence Assum: There is a C 0 s.t. for all x, z X and α (0, 1): f ( (1 α)x + αz) f (x) + α f (x), z x Cα2. Let α k = 2 k+2. Recall xk+1 = (1 α k )x k + α k z k ; thus, f (x k+1 ) f (x ) = f ((1 α k )x k + α k z k ) f (x ) f (x k ) f (x ) + α k f (x k ), z k x k α2 k C f (x k ) f (x ) + α k f (x k ), x x k α2 k C f (x k ) f (x ) α k (f (x k ) f (x )) α2 k C = (1 α k )(f (x k ) f (x )) α2 k C. A simple induction (Verify!) then shows that f (x k ) f (x ) 2C k + 2, k 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 12 / 36
29 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 13 / 36
30 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Hint: Recall, g, z z p g q. Pick z to obtain equality. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 13 / 36
31 Example: Linear Oracle Suppose X = { x p 1 }, for p > 1. Write Linear Oracle (LO) as maximization problem: max z g, z s.t. z p 1. What is the answer? Hint: Recall, g, z z p g q. Pick z to obtain equality. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 13 / 36
32 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 14 / 36
33 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 14 / 36
34 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. can be shown that G 2 is the dual norm here. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 14 / 36
35 Example: Linear Oracle max Z Trace norm LO G, Z σ i(z) 1. i max { G, Z Z 1} is just the dual-norm to the trace norm; see Lectures 1-3 for more on dual norms. can be shown that G 2 is the dual norm here. Optimal Z satisfies G, Z = G 2 Z = G 2 ; use Lanczos (or using power method) to compute top singular vectors. (for more examples: Jaggi, Revisiting Frank-Wolfe:...) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 14 / 36
36 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Suvrit Sra Optimization for Machine Learning 15 / 36
37 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Nonconvex-FW possible. It works (i.e., satisfies first-order optimality conditions to ɛ-accuracy in O(1/ɛ) iterations (Lacoste-Julien 2016; Reddi et al. 2016). Suvrit Sra Optimization for Machine Learning 15 / 36
38 Extensions How about FW for nonconvex objective functions? What about FW methods that can converge faster than O(1/k)? Nonconvex-FW possible. It works (i.e., satisfies first-order optimality conditions to ɛ-accuracy in O(1/ɛ) iterations (Lacoste-Julien 2016; Reddi et al. 2016). Linear convergence under quite strong assumptions on both f and X ; alternatively, use a more complicated method: FW with Away Steps (Guelat-Marcotte 1986); more recently (Jaggi, Lacoste-Julien 2016) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 15 / 36
39 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 16 / 36
40 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 16 / 36
41 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Projected Gradient x k+1 = P X ( x k α k f (x k ) ), k = 0, 1,... where P X denotes above orthogonal projection. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 16 / 36
42 Quadratic oracle: projection methods FW can be quite slow If X not compact, LO does not make sense A possible alternative (with other weaknesses though!) If constraint set X is simple, i.e., we can easily solve projections min 1 2 x y 2 s.t. x X. Projected Gradient x k+1 = P X ( x k α k f (x k ) ), k = 0, 1,... where P X denotes above orthogonal projection. PG can be much faster than O(1/k) of FW (e.g., O(e k ) for strongly convex); but LO can be sometimes much faster than projections. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 16 / 36
43 Projected Gradient convergence Depends on the following crucial properties of P: Nonexpansivity: Px Py 2 x y 2 Firm nonxpansivity: Px Py 2 2 Px Py, x y Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 17 / 36
44 Projected Gradient convergence Depends on the following crucial properties of P: Nonexpansivity: Px Py 2 x y 2 Firm nonxpansivity: Px Py 2 2 Px Py, x y Using projections, essentially convergence analysis with α k = 1/L for the unconstrained case works. Exercise: Let f (x) = 1 2 Ax b 2 2. Write a Matlab/Python script to minimize this function over the convex set X := { 1 x i 1} using projected gradient as well as Frank-Wolfe. Compare the two. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 17 / 36
45 Duality Suvrit Sra Optimization for Machine Learning 18 / 36
46 Primal problem Let f i : R n R (0 i m). Generic nonlinear program min f 0 (x) s.t. f i (x) 0, 1 i m, x {dom f 0 dom f 1 dom f m }. (P) Def. Domain: The set D := {dom f 0 dom f 1 dom f m } We call (P) the primal problem The variable x is the primal variable We will attach to (P) a dual problem In our initial derivation: no restriction to convexity. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 19 / 36
47 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 20 / 36
48 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers If x is feasible, λ 0, then we get the lower-bound f 0 (x) L(x, λ) x X, λ R m +. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 20 / 36
49 Lagrangian To the primal problem, associate Lagrangian L : R n R m R, L(x, λ) := f 0 (x) + m λ if i (x). i=1 Variables λ R m called Lagrange multipliers If x is feasible, λ 0, then we get the lower-bound f 0 (x) L(x, λ) x X, λ R m +. Lagrangian helps write problem in unconstrained form Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 20 / 36
50 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 21 / 36
51 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 21 / 36
52 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 In this case, inner sup is +, so claim true by definition Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 21 / 36
53 Lagrangian Claim: Since, f 0 (x) L(x, λ) x X, λ R m +, primal optimal p = inf x X sup λ 0 L(x, λ). Proof: If x is not feasible, then some f i (x) > 0 In this case, inner sup is +, so claim true by definition If x is feasible, each f i (x) 0, so sup λ i λ if i (x) = 0 Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 21 / 36
54 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 22 / 36
55 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Observations: g is pointwise inf of affine functions of λ Thus, g is concave; it may take value Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 22 / 36
56 Lagrange dual function Def. We define the Lagrangian dual as g(λ) := inf x L(x, λ). Observations: g is pointwise inf of affine functions of λ Thus, g is concave; it may take value Recall: f 0 (x) L(x, λ) x X, λ 0; thus x X, f 0 (x) inf x L(x, λ) =: g(λ) Now minimize over x on lhs, to obtain λ R m + p g(λ). Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 22 / 36
57 Lagrange dual problem sup g(λ) s.t. λ 0. λ Suvrit Sra Optimization for Machine Learning 23 / 36
58 Lagrange dual problem sup g(λ) s.t. λ 0. λ dual feasible: if λ 0 and g(λ) > dual optimal: λ if sup is achieved Lagrange dual is always concave, regardless of original Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 23 / 36
59 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 24 / 36
60 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Theorem. (Weak-duality): For problem (P), we have p d. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 24 / 36
61 Weak duality Def. Denote dual optimal value by d, i.e., d := sup λ 0 g(λ). Theorem. (Weak-duality): For problem (P), we have p d. Proof: We showed that for all λ R m +, p g(λ). Thus, it follows that p sup g(λ) = d. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 24 / 36
62 Duality gap p d 0 Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 25 / 36
63 Duality gap p d 0 Strong duality if duality gap is zero: p = d Notice: both p and d may be + Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 25 / 36
64 Duality gap p d 0 Strong duality if duality gap is zero: p = d Notice: both p and d may be + Several sufficient conditions known, especially for convex optimization. Easy necessary and sufficient conditions: unknown Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 25 / 36
65 Example: Slater s sufficient conditions min f 0 (x) s.t. f i (x) 0, 1 i m, Ax = b. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 26 / 36
66 Example: Slater s sufficient conditions min f 0 (x) s.t. f i (x) 0, 1 i m, Ax = b. Constraint qualification: There exists x ri D s.t. f i (x) < 0, Ax = b. That is, there is a strictly feasible point. Theorem. Let the primal problem be convex. If there is a feasible point such that is strictly feasible for the non-affine constraints (and merely feasible for affine, linear ones), then strong duality holds. Moreover, the dual optimal is attained (i.e., d > ). Reading: Read BV for a proof. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 26 / 36
67 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 27 / 36
68 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 27 / 36
69 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 L(x, y, λ) = e x + λx 2 /y, so dual function is { g(λ) = inf x,y>0 e x + λx 2 0 λ 0 y = λ < 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 27 / 36
70 Example: failure of strong duality min x,y e x x 2 /y 0, over the domain D = {(x, y) y > 0}. Clearly, only feasible x = 0. So p = 1 L(x, y, λ) = e x + λx 2 /y, so dual function is { g(λ) = inf x,y>0 e x + λx 2 0 λ 0 y = λ < 0. Dual problem d = max 0 s.t. λ 0. λ Thus, d = 0, and gap is p d = 1. Here, we had no strictly feasible solution. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 27 / 36
71 Zero duality gap: nonconvex example Trust region subproblem (TRS) min x T Ax + 2b T x x T x 1. A is symmetric but not necessarily semidefinite! Theorem. TRS always has zero duality gap. Remark: Above theorem extremely important; part of family of related results for certain quadratic nonconvex problems. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 28 / 36
72 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 29 / 36
73 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 29 / 36
74 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 29 / 36
75 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 If there is x > 0 with Ax b and 1 T x = 1, strong duality holds. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 29 / 36
76 Example: Maxent min i x i log x i Ax b, 1 T x = 1, x > 0. Convex conjugate of f (x) = x log x is f (y) = e y 1. max λ,ν g(λ, ν) = b T λ v n s.t. λ 0. i=1 e (AT λ) i ν 1 If there is x > 0 with Ax b and 1 T x = 1, strong duality holds. Exercise: Simplify above dual by optimizing out ν Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 29 / 36
77 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 30 / 36
78 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. L(x, ξ, λ, ν) = 1 2 x C1T ξ λ T (Ax 1 + ξ) ν T ξ Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 30 / 36
79 Example: dual for Support Vector Machine min x,ξ 1 2 x C i ξ i s.t. Ax 1 ξ, ξ 0. L(x, ξ, λ, ν) = 1 2 x C1T ξ λ T (Ax 1 + ξ) ν T ξ g(λ, ν) := L(x, ξ, λ, ν) inf { = λ T AT λ 2 2 λ + ν = C1 + otherwise d = max λ 0,ν 0 g(λ, ν) Exercise: Using ν 0, eliminate ν from above problem. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 30 / 36
80 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 31 / 36
81 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 31 / 36
82 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 31 / 36
83 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) g(λ, ν) = ν T b sup x x, A T ν F(x) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 31 / 36
84 Dual via Fenchel conjugates min f (x) s.t. f i (x) 0, Ax = b. L(x, λ, ν) := f 0 (x) + i λ if i (x) + ν T (Ax b) g(λ, ν) = inf L(x, λ, ν) x g(λ, ν) = ν T b + inf x xt A T ν + F(x) F(x) := f 0 (x) + i λ if i (x) g(λ, ν) = ν T b sup x g(λ, ν) = ν T b F ( A T ν). x, A T ν F(x) Not so useful! F hard to compute. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 31 / 36
85 Example: norm regularized problems min f (x) + Ax Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 32 / 36
86 Example: norm regularized problems min f (x) + Ax Dual problem min y f ( A T y) s.t. y 1. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 32 / 36
87 Example: norm regularized problems min f (x) + Ax Dual problem min y f ( A T y) s.t. y 1. Say ȳ < 1, such that A T ȳ ri(dom f ), then we have strong duality (e.g., for instance 0 ri(dom f )) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 32 / 36
88 Example: Lasso-like problem p := min x Ax b 2 + λ x 1. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
89 Example: Lasso-like problem p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
90 Example: Lasso-like problem p = min x p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v Saddle-point formulation { u T (b Ax) + v T x u 2 1, v λ } Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
91 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
92 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ = max u,v ut b A T u = v, u 2 1, v λ Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
93 Example: Lasso-like problem p = min x = max u,v p := min x Ax b 2 + λ x 1. { } x 1 = max x T v v 1 { } x 2 = max x T u u 2 1. max u,v min x Saddle-point formulation { } u T (b Ax) + v T x u 2 1, v λ { } u T (b Ax) + x T v u 2 1, v λ = max u,v ut b A T u = v, u 2 1, v λ = max u T b u 2 1, A T v λ. u Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 33 / 36
94 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
95 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
96 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
97 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
98 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
99 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
100 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min x L(x, λ ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
101 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min x L(x, λ ) L(x, λ ) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
102 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
103 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Thus, equalities hold in above chain. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
104 Example: KKT conditions min f 0 (x) f i (x) 0, i = 1,..., m. Recall: f 0 (x ), x x 0 for all feasible x X Can we simplify this using Lagrangian? g(λ) = inf x L(x, λ) := f 0 (x) + i λ if i (x) Assume strong duality; and both p and d attained! Thus, there exists a pair (x, λ ) such that p = f 0 (x ) = d = g(λ ) = min L(x, λ ) L(x, λ ) f 0 (x ) = p x Thus, equalities hold in above chain. x argmin x L(x, λ ). Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 34 / 36
105 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
106 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
107 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
108 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
109 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. But λ i 0 and f i (x ) 0, Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
110 Example: KKT conditions x argmin x L(x, λ ). If f 0, f 1,..., f m are differentiable, this implies x L(x, λ ) x=x = f 0 (x ) + i λ i f i(x ) = 0. Moreover, since L(x, λ ) = f 0 (x ), we also have i λ i f i(x ) = 0. But λ i 0 and f i (x ) 0, so complementary slackness λ i f i(x ) = 0, i = 1,..., m. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 35 / 36
111 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 36 / 36
112 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 36 / 36
113 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal If problem is convex, then KKT also sufficient Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 36 / 36
114 KKT conditions f i (x ) 0, i = 1,..., m (primal feasibility) λ i 0, i = 1,..., m (dual feasibility) λ i f i(x ) = 0, i = 1,..., m (compl. slackness) x L(x, λ ) x=x = 0 (Lagrangian stationarity) We showed: if strong duality holds, and (x, λ ) exist, then KKT conditions are necessary for pair (x, λ ) to be optimal If problem is convex, then KKT also sufficient Exercise: Prove the above sufficiency of KKT. Hint: Use that L(x, λ ) is convex, and conclude from KKT conditions that g(λ ) = f 0 (x ), so that (x, λ ) optimal primal-dual pair. Suvrit Sra (suvrit@mit.edu) Optimization for Machine Learning 36 / 36
Convex Optimization & Lagrange Duality
Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationLagrange duality. The Lagrangian. We consider an optimization program of the form
Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationOptimization for Machine Learning
Optimization for Machine Learning (Lecture 3-A - Convex) SUVRIT SRA Massachusetts Institute of Technology Special thanks: Francis Bach (INRIA, ENS) (for sharing this material, and permitting its use) MPI-IS
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationLecture: Duality of LP, SOCP and SDP
1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationLecture: Duality.
Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong
More information14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.
CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationEE/AA 578, Univ of Washington, Fall Duality
7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationLagrangian Duality and Convex Optimization
Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationDuality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities
Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form
More informationDuality Uses and Correspondences. Ryan Tibshirani Convex Optimization
Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions
More informationI.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010
I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0
More informationMotivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:
CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through
More informationKarush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More information4TE3/6TE3. Algorithms for. Continuous Optimization
4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationLagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 4. Suvrit Sra. (Conjugates, subdifferentials) 31 Jan, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 4 (Conjugates, subdifferentials) 31 Jan, 2013 Suvrit Sra Organizational HW1 due: 14th Feb 2013 in class. Please L A TEX your solutions (contact TA if this
More informationIntroduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,
More informationThe Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:
HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationLecture Note 5: Semidefinite Programming for Stability Analysis
ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationLecture 7: Convex Optimizations
Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1
More informationSubgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives
Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationDuality. Geoff Gordon & Ryan Tibshirani Optimization /
Duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Duality in linear programs Suppose we want to find lower bound on the optimal value in our convex problem, B min x C f(x) E.g., consider
More informationCSCI : Optimization and Control of Networks. Review on Convex Optimization
CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one
More informationsubject to (x 2)(x 4) u,
Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationOn the Method of Lagrange Multipliers
On the Method of Lagrange Multipliers Reza Nasiri Mahalati November 6, 2016 Most of what is in this note is taken from the Convex Optimization book by Stephen Boyd and Lieven Vandenberghe. This should
More informationA Brief Review on Convex Optimization
A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationLecture 18: Optimization Programming
Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming
More informationSubgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives
Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality
More informationEE 227A: Convex Optimization and Applications October 14, 2008
EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationHomework Set #6 - Solutions
EE 15 - Applications of Convex Optimization in Signal Processing and Communications Dr Andre Tkacenko JPL Third Term 11-1 Homework Set #6 - Solutions 1 a The feasible set is the interval [ 4] The unique
More informationPrimal-dual Subgradient Method for Convex Problems with Functional Constraints
Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual
More informationLinear and non-linear programming
Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationQuiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006
Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in
More informationChap 2. Optimality conditions
Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationConvex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization
Convex Optimization Ofer Meshi Lecture 6: Lower Bounds Constrained Optimization Lower Bounds Some upper bounds: #iter μ 2 M #iter 2 M #iter L L μ 2 Oracle/ops GD κ log 1/ε M x # ε L # x # L # ε # με f
More informationLecture 14: Optimality Conditions for Conic Problems
EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems
More informationAdditional Homework Problems
Additional Homework Problems Robert M. Freund April, 2004 2004 Massachusetts Institute of Technology. 1 2 1 Exercises 1. Let IR n + denote the nonnegative orthant, namely IR + n = {x IR n x j ( ) 0,j =1,...,n}.
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationLagrange Relaxation and Duality
Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationConvex Optimization and SVM
Convex Optimization and SVM Problem 0. Cf lecture notes pages 12 to 18. Problem 1. (i) A slab is an intersection of two half spaces, hence convex. (ii) A wedge is an intersection of two half spaces, hence
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual
More informationEnhanced Fritz John Optimality Conditions and Sensitivity Analysis
Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology March 2016 1 / 27 Constrained
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 6 (Conic optimization) 07 Feb, 2013 Suvrit Sra Organizational Info Quiz coming up on 19th Feb. Project teams by 19th Feb Good if you can mix your research
More informationLecture 23: November 19
10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationLecture 5. The Dual Cone and Dual Problem
IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the
More informationNumerical Optimization
Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,
More informationInterior Point Algorithms for Constrained Convex Optimization
Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationLagrangian Duality Theory
Lagrangian Duality Theory Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapter 14.1-4 1 Recall Primal and Dual
More informationTutorial on Convex Optimization: Part II
Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More information8. Conjugate functions
L. Vandenberghe EE236C (Spring 2013-14) 8. Conjugate functions closed functions conjugate function 8-1 Closed set a set C is closed if it contains its boundary: x k C, x k x = x C operations that preserve
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More informationEE364a Review Session 5
EE364a Review Session 5 EE364a Review announcements: homeworks 1 and 2 graded homework 4 solutions (check solution to additional problem 1) scpd phone-in office hours: tuesdays 6-7pm (650-723-1156) 1 Complementary
More informationUC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009
UC Berkele Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 6 Fall 2009 Solution 6.1 (a) p = 1 (b) The Lagrangian is L(x,, λ) = e x + λx 2
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More information9. Dual decomposition and dual algorithms
EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationConvex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms
Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationOptimality, Duality, Complementarity for Constrained Optimization
Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear
More informationIntroduction to Nonlinear Stochastic Programming
School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationTwo hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx.
Two hours To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER CONVEX OPTIMIZATION - SOLUTIONS xx xxxx 27 xx:xx xx.xx Answer THREE of the FOUR questions. If
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationTutorial on Convex Optimization for Engineers Part II
Tutorial on Convex Optimization for Engineers Part II M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationUC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009
UC Berkeley Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 5 Fall 2009 Reading: Boyd and Vandenberghe, Chapter 5 Solution 5.1 Note that
More informationExtended Monotropic Programming and Duality 1
March 2006 (Revised February 2010) Report LIDS - 2692 Extended Monotropic Programming and Duality 1 by Dimitri P. Bertsekas 2 Abstract We consider the problem minimize f i (x i ) subject to x S, where
More information