Tutorial on Convex Optimization for Engineers, Part II

M.Sc. Jens Steinwandt
Communications Research Laboratory
Ilmenau University of Technology
PO Box 100565, D-98684 Ilmenau, Germany
jens.steinwandt@tu-ilmenau.de

January 2014
Outline

1. Introduction
2. Convex sets
3. Convex functions
4. Convex optimization problems
5. Lagrangian duality theory
6. Disciplined convex programming and CVX
4. Convex optimization problems
Optimization problem in standard form

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              h_i(x) = 0,  i = 1,...,p

- x ∈ R^n is the optimization variable
- f_0 : R^n → R is the objective or cost function
- f_i : R^n → R, i = 1,...,m, are the inequality constraint functions
- h_i : R^n → R are the equality constraint functions

optimal value:

  p* = inf{ f_0(x) | f_i(x) ≤ 0, i = 1,...,m, h_i(x) = 0, i = 1,...,p }

- p* = ∞ if the problem is infeasible (no x satisfies the constraints)
- p* = −∞ if the problem is unbounded below

Convex optimization problems 4-2
Feasibility problem

  find        x
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              h_i(x) = 0,  i = 1,...,p

can be considered a special case of the general problem with f_0(x) = 0:

  minimize    0
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              h_i(x) = 0,  i = 1,...,p

- p* = 0 if constraints are feasible; any feasible x is optimal
- p* = ∞ if constraints are infeasible
Convex optimization problem

standard form convex optimization problem

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              a_i^T x = b_i,  i = 1,...,p

- f_0, f_1, ..., f_m are convex; equality constraints are affine
- problem is quasiconvex if f_0 is quasiconvex (and f_1, ..., f_m convex)

often written as

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              Ax = b

important property: the feasible set of a convex optimization problem is convex
Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice versa

some common transformations that preserve convexity:

- eliminating equality constraints:

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              Ax = b

  is equivalent to

  minimize (over z)  f_0(Fz + x_0)
  subject to         f_i(Fz + x_0) ≤ 0,  i = 1,...,m

  where F and x_0 are such that  Ax = b  ⟺  x = Fz + x_0 for some z
- introducing equality constraints:

  minimize    f_0(A_0 x + b_0)
  subject to  f_i(A_i x + b_i) ≤ 0,  i = 1,...,m

  is equivalent to

  minimize (over x, y_i)  f_0(y_0)
  subject to              f_i(y_i) ≤ 0,  i = 1,...,m
                          y_i = A_i x + b_i,  i = 0,1,...,m

- introducing slack variables for linear inequalities:

  minimize    f_0(x)
  subject to  a_i^T x ≤ b_i,  i = 1,...,m

  is equivalent to

  minimize (over x, s)  f_0(x)
  subject to            a_i^T x + s_i = b_i,  i = 1,...,m
                        s_i ≥ 0,  i = 1,...,m
- epigraph form: the standard form convex problem is equivalent to

  minimize (over x, t)  t
  subject to            f_0(x) − t ≤ 0
                        f_i(x) ≤ 0,  i = 1,...,m
                        Ax = b

- minimizing over some variables:

  minimize    f_0(x_1, x_2)
  subject to  f_i(x_1) ≤ 0,  i = 1,...,m

  is equivalent to

  minimize    f̃_0(x_1)
  subject to  f_i(x_1) ≤ 0,  i = 1,...,m

  where f̃_0(x_1) = inf_{x_2} f_0(x_1, x_2)
Linear program (LP)

  minimize    c^T x + d
  subject to  Gx ⪯ h
              Ax = b

- convex problem with affine objective and constraint functions
- feasible set is a polyhedron

(figure: polyhedron P with optimal point x* found in the direction −c)
Examples

diet problem: choose quantities x_1, ..., x_n of n foods
- one unit of food j costs c_j, contains amount a_ij of nutrient i
- healthy diet requires nutrient i in quantity at least b_i

to find cheapest healthy diet,

  minimize    c^T x
  subject to  Ax ⪰ b,  x ⪰ 0

piecewise-linear minimization

  minimize  max_{i=1,...,m} (a_i^T x + b_i)

equivalent to an LP

  minimize    t
  subject to  a_i^T x + b_i ≤ t,  i = 1,...,m
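As a quick numerical illustration (not part of the original slides), the epigraph reformulation above can be fed to an ordinary LP solver such as scipy.optimize.linprog; the data a_i, b_i below are made up:

```python
import numpy as np
from scipy.optimize import linprog

# Minimize max_i (a_i^T x + b_i) via the epigraph variable t:
#   minimize t  subject to  a_i^T x + b_i <= t
# Variables are stacked as z = (x, t).
A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])  # rows a_i^T (made-up data)
b = np.array([0.0, 1.0, -0.5])                        # offsets b_i

m, n = A.shape
c = np.zeros(n + 1)
c[-1] = 1.0                                  # objective: minimize t
A_ub = np.hstack([A, -np.ones((m, 1))])      # a_i^T x - t <= -b_i
b_ub = -b

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x, t = res.x[:n], res.x[-1]
print("optimal t =", t)
print("max_i (a_i^T x + b_i) =", np.max(A @ x + b))  # equals t at the optimum
```

Note that linprog's default bounds are x ⪰ 0, so free variables must be declared explicitly.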
Quadratic program (QP)

  minimize    (1/2) x^T P x + q^T x + r
  subject to  Gx ⪯ h
              Ax = b

- P ∈ S^n_+, so objective is convex quadratic
- minimize a convex quadratic function over a polyhedron

(figure: level curves of f_0 with minimizer x* over polyhedron P)
Examples

least-squares

  minimize  ‖Ax − b‖_2^2

- analytical solution x* = A^† b  (A^† is the pseudo-inverse)
- can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

  minimize    c̄^T x + γ x^T Σ x  =  E[c^T x] + γ var(c^T x)
  subject to  Gx ⪯ h,  Ax = b

- c is a random vector with mean c̄ and covariance Σ
- hence, c^T x is a random variable with mean c̄^T x and variance x^T Σ x
- γ > 0 is a risk-aversion parameter; it controls the trade-off between expected cost and variance (risk)
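The analytical least-squares solution is easy to check numerically; a minimal sketch with random made-up data:

```python
import numpy as np

# Unconstrained least-squares: minimize ||Ax - b||_2^2.
# The analytical solution is x* = pinv(A) @ b; np.linalg.lstsq gives the same answer.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # made-up data, full column rank
b = rng.standard_normal(5)

x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_pinv, x_lstsq))              # True
# optimality condition: gradient 2 A^T (Ax - b) = 0
print(np.allclose(A.T @ (A @ x_pinv - b), 0))    # True
```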
Quadratically constrained quadratic program (QCQP)

  minimize    (1/2) x^T P_0 x + q_0^T x + r_0
  subject to  (1/2) x^T P_i x + q_i^T x + r_i ≤ 0,  i = 1,...,m
              Ax = b

- P_i ∈ S^n_+; objective and constraints are convex quadratic
- if P_1, ..., P_m ∈ S^n_++, feasible region is intersection of m ellipsoids and an affine set
Second-order cone programming

  minimize    f^T x
  subject to  ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1,...,m
              Fx = g

(A_i ∈ R^{n_i × n}, F ∈ R^{p × n})

- inequalities are called second-order cone (SOC) constraints:

  (A_i x + b_i, c_i^T x + d_i) ∈ second-order cone in R^{n_i + 1}

- for n_i = 0, reduces to an LP; if c_i = 0, reduces to a QCQP
- more general than QCQP and LP
Semidefinite program (SDP)

  minimize    c^T x
  subject to  x_1 F_1 + x_2 F_2 + ··· + x_n F_n + G ⪯ 0
              Ax = b

with F_i, G ∈ S^k

- inequality constraint is called a linear matrix inequality (LMI)
- includes problems with multiple LMI constraints: for example,

  x_1 F̂_1 + ··· + x_n F̂_n + Ĝ ⪯ 0,    x_1 F̃_1 + ··· + x_n F̃_n + G̃ ⪯ 0

  is equivalent to the single LMI

  x_1 [F̂_1 0; 0 F̃_1] + x_2 [F̂_2 0; 0 F̃_2] + ··· + x_n [F̂_n 0; 0 F̃_n] + [Ĝ 0; 0 G̃] ⪯ 0
LP and SOCP as SDP

LP and equivalent SDP

  LP:   minimize    c^T x          SDP:  minimize    c^T x
        subject to  Ax ⪯ b               subject to  diag(Ax − b) ⪯ 0

(note the different interpretation of the generalized inequality ⪯: elementwise for the LP, matrix inequality for the SDP)

SOCP and equivalent SDP

  SOCP:  minimize    f^T x
         subject to  ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,  i = 1,...,m

  SDP:   minimize    f^T x
         subject to  [ (c_i^T x + d_i) I    A_i x + b_i
                       (A_i x + b_i)^T      c_i^T x + d_i ] ⪰ 0,  i = 1,...,m
Some nonconvex problems

slight modifications of convex problems can be very hard

- convex maximization, concave minimization:

  maximize    ‖Ax − b‖_2
  subject to  ‖x‖ ≤ 1

- nonlinear equality constraints:

  minimize    c^T x
  subject to  x^T Q_i x + q_i^T x + c_i = 0,  i = 1,...,K

  where Q_i ⪰ 0

- minimizing over integer constraints:

  find x such that Ax ⪯ b, x_i integer
5. Lagrangian duality theory
Lagrangian

standard form problem (not necessarily convex)

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              h_i(x) = 0,  i = 1,...,p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

  L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x)

- weighted sum of objective and constraint functions
- λ_i is the Lagrange multiplier associated with f_i(x) ≤ 0
- ν_i is the Lagrange multiplier associated with h_i(x) = 0

Duality 5-2
Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
          = inf_{x ∈ D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

  f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Least-norm solution of linear equations

  minimize    x^T x
  subject to  Ax = b

dual function

- Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
- to minimize L over x, set gradient equal to zero:

  ∇_x L(x, ν) = 2x + A^T ν = 0  ⟹  x = −(1/2) A^T ν

- plug in to L to obtain g:

  g(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

  a concave function of ν

lower bound property: p* ≥ −(1/4) ν^T A A^T ν − b^T ν for all ν
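For this problem both the primal optimum x* = A^T (A A^T)^{-1} b and the dual maximizer ν* = −2 (A A^T)^{-1} b are available in closed form, so the lower bound can be checked to be tight numerically; a sketch with made-up data:

```python
import numpy as np

# Least-norm problem: minimize x^T x subject to Ax = b.
# Primal optimum: x* = A^T (A A^T)^{-1} b, so p* = b^T (A A^T)^{-1} b.
# Dual function: g(nu) = -(1/4) nu^T A A^T nu - b^T nu,
# maximized at nu* = -2 (A A^T)^{-1} b.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # made-up wide matrix, full row rank
b = rng.standard_normal(3)

K = A @ A.T
x_star = A.T @ np.linalg.solve(K, b)
p_star = x_star @ x_star

nu_star = -2.0 * np.linalg.solve(K, b)
g = -0.25 * nu_star @ K @ nu_star - b @ nu_star

print(p_star, g)   # equal: the dual bound is tight here
```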
The dual problem

Lagrange dual problem

  maximize    g(λ, ν)
  subject to  λ ⪰ 0

- finds best lower bound on p*, obtained from Lagrange dual function
- a convex optimization problem; optimal value denoted d*
- λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
- often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 5-5)

  minimize    c^T x        maximize    −b^T ν
  subject to  Ax = b       subject to  A^T ν + c ⪰ 0
              x ⪰ 0
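The LP pair above can be verified on a tiny instance by solving both problems with scipy.optimize.linprog; the data (minimizing c^T x over the probability simplex) is made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Standard form LP and its dual:
#   primal: minimize c^T x    s.t.  Ax = b, x >= 0
#   dual:   maximize -b^T nu  s.t.  A^T nu + c >= 0
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# dual in linprog form: minimize b^T nu  s.t.  -A^T nu <= c, nu free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)])

p_star = primal.fun
d_star = -dual.fun
print(p_star, d_star)   # equal: strong duality for an LP with finite optimum
```

Here the primal puts all weight on the cheapest coordinate (p* = 1), and the dual attains the same value at ν = −1.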
Weak and strong duality

weak duality: d* ≤ p*
- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems
- for example, solving the SDP

  maximize    −1^T ν
  subject to  W + diag(ν) ⪰ 0

  gives a lower bound for the two-way partitioning problem on page 5-7

strong duality: d* = p*
- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater's constraint qualification

strong duality holds for a convex problem

  minimize    f_0(x)
  subject to  f_i(x) ≤ 0,  i = 1,...,m
              Ax = b

if it is strictly feasible, i.e.,

  ∃ x ∈ int D :  f_i(x) < 0, i = 1,...,m,  Ax = b

- also guarantees that the dual optimum is attained (if p* > −∞)
- can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications
Quadratic program

primal problem (assume P ∈ S^n_++)

  minimize    x^T P x
  subject to  Ax ⪯ b

dual function

  g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

dual problem

  maximize    −(1/4) λ^T A P^{−1} A^T λ − b^T λ
  subject to  λ ⪰ 0

- from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
- in fact, p* = d* always
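This primal/dual pair is easy to check numerically with a general-purpose solver; a sketch with a small made-up instance (P positive definite, first constraint active), using scipy.optimize.minimize for both problems:

```python
import numpy as np
from scipy.optimize import minimize

# Primal QP (P positive definite): minimize x^T P x  s.t.  Ax <= b
# Dual: maximize -(1/4) lam^T A P^{-1} A^T lam - b^T lam  s.t.  lam >= 0
P = np.array([[2.0, 0.5], [0.5, 1.0]])   # made-up positive definite P
A = np.array([[1.0, 1.0], [-1.0, 2.0]])
b = np.array([-1.0, 2.0])                # first constraint cuts off x = 0

primal = minimize(lambda x: x @ P @ x, x0=np.array([-1.0, -1.0]),
                  constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}])

M = A @ np.linalg.solve(P, A.T)          # A P^{-1} A^T
dual = minimize(lambda lam: 0.25 * lam @ M @ lam + b @ lam,   # minimize -g
                x0=np.zeros(2), bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # both approx 7/8: p* = d*
```

For this instance the optimum can also be worked out by hand: x* = (−1/4, −3/4), λ* = (7/4, 0), and p* = d* = 7/8.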
Geometric interpretation

for simplicity, consider problem with one constraint f_1(x) ≤ 0

interpretation of dual function:

  g(λ) = inf_{(u,t) ∈ G} (t + λu),  where  G = {(f_1(x), f_0(x)) | x ∈ D}

(figures: the set G in the (u, t)-plane with supporting line λu + t = g(λ), and the optimal values p* and d*)

- λu + t = g(λ) is a (non-vertical) supporting hyperplane to G
- hyperplane intersects t-axis at t = g(λ)
Complementary slackness

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal

  f_0(x*) = g(λ*, ν*) = inf_x ( f_0(x) + Σ_{i=1}^m λ*_i f_i(x) + Σ_{i=1}^p ν*_i h_i(x) )
          ≤ f_0(x*) + Σ_{i=1}^m λ*_i f_i(x*) + Σ_{i=1}^p ν*_i h_i(x*)
          ≤ f_0(x*)

hence, the two inequalities hold with equality
- x* minimizes L(x, λ*, ν*)
- λ*_i f_i(x*) = 0 for i = 1,...,m (known as complementary slackness):

  λ*_i > 0 ⟹ f_i(x*) = 0,    f_i(x*) < 0 ⟹ λ*_i = 0
Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. primal constraints: f_i(x) ≤ 0, i = 1,...,m, h_i(x) = 0, i = 1,...,p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1,...,m
4. gradient of Lagrangian with respect to x vanishes:

  ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0

from page 5-17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:
- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
- hence, f_0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:
x is optimal if and only if there exist λ, ν that satisfy the KKT conditions
- recall that Slater implies strong duality, and dual optimum is attained
- generalizes optimality condition ∇f_0(x) = 0 for unconstrained problem
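For an equality-constrained QP the KKT conditions are a single linear system, so an optimal point can be computed directly from them; a minimal sketch with made-up data:

```python
import numpy as np

# For the equality-constrained QP
#   minimize (1/2) x^T P x + q^T x  subject to  Ax = b
# the KKT conditions (stationarity + primal feasibility) reduce to
#   [ P  A^T ] [ x  ]   [ -q ]
#   [ A   0  ] [ nu ] = [  b ]
P = np.array([[3.0, 1.0], [1.0, 2.0]])   # made-up positive definite P
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
kkt = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(kkt, rhs)
x, nu = sol[:n], sol[n:]

# verify: Lagrangian gradient vanishes and x is primal feasible
print(np.allclose(P @ x + q + A.T @ nu, 0))   # True
print(np.allclose(A @ x, b))                  # True
```

Since the problem is convex and only has equality constraints, solving this system is both necessary and sufficient for optimality.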
6. Disciplined convex programming and CVX
Convex optimization solvers

- LP solvers: lots available (GLPK, Excel, Matlab's linprog, ...)
- cone solvers: typically handle (combinations of) LP, SOCP, SDP cones; several available (SDPT3, SeDuMi, CSDP, ...)
- general convex solvers: some available (CVXOPT, MOSEK, ...)
- plus lots of special-purpose or application-specific solvers
- could write your own

Disciplined Convex Programming and CVX 2
Transforming problems to standard form

- you've seen lots of tricks for transforming a problem into an equivalent one that has a standard form (e.g., LP, SDP)
- these tricks greatly extend the applicability of standard solvers
- writing code to carry out this transformation is often painful
- modeling systems can partly automate this step
Disciplined convex programming

- describe objective and constraints using expressions formed from
  - a set of basic atoms (convex, concave functions)
  - a restricted set of operations or rules (that preserve convexity)
- modeling system keeps track of affine, convex, concave expressions
- rules ensure that expressions recognized as convex (concave) are convex (concave)
- but, some convex (concave) expressions are not recognized as convex (concave)
- problems described using DCP are convex by construction
CVX

- uses DCP
- runs in Matlab, between the cvx_begin and cvx_end commands
- relies on SDPT3 or SeDuMi (LP/SOCP/SDP) solvers
- refer to user guide, online help for more info
- the CVX example library has more than a hundred examples
Example: Constrained norm minimization

A = randn(5, 3);
b = randn(5, 1);
cvx_begin
    variable x(3);
    minimize(norm(A*x - b, 1))
    subject to
        -0.5 <= x;
        x <= 0.3;
cvx_end

- between cvx_begin and cvx_end, x is a CVX variable
- the statement subject to does nothing, but can be added for readability
- inequalities are interpreted elementwise
What CVX does

after cvx_end, CVX
- transforms the problem into an LP
- calls the solver SDPT3
- overwrites (object) x with (numeric) optimal value
- assigns the problem optimal value to cvx_optval
- assigns the problem status (which here is Solved) to cvx_status

(had the problem been infeasible, cvx_status would be Infeasible and x would be NaN)
Some functions

function              meaning                                  attributes
norm(x, p)            ‖x‖_p                                    cvx
square(x)             x^2                                      cvx
square_pos(x)         (x_+)^2                                  cvx, nondecr
pos(x)                x_+                                      cvx, nondecr
sum_largest(x,k)      x_[1] + ··· + x_[k]                      cvx, nondecr
sqrt(x)               √x  (x ≥ 0)                              ccv, nondecr
inv_pos(x)            1/x  (x > 0)                             cvx, nonincr
max(x)                max{x_1, ..., x_n}                       cvx, nondecr
quad_over_lin(x,y)    x^2/y  (y > 0)                           cvx, nonincr in y
lambda_max(X)         λ_max(X)  (X = X^T)                      cvx
huber(x)              x^2 if |x| ≤ 1,  2|x| − 1 if |x| > 1     cvx
References

- Course "Convex Optimization I" by Prof. Stephen Boyd at Stanford University, CA.
- Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge, U.K.: Cambridge University Press, 2004.