Optimisation convexe: performance, complexité et applications. Introduction, convexité, dualité. A. d Aspremont. M2 OJME: Optimisation convexe. 1/128
Today Convex optimization: introduction Course organization and other gory details... Convex optimization: basic concepts A. d Aspremont. M2 OJME: Optimisation convexe. 2/128
Convex Optimization A. d Aspremont. M2 OJME: Optimisation convexe. 3/128
Convex optimization minimize f 0 (x) subject to f 1 (x) 0,..., f m (x) 0 x R n is optimization variable; f i : R n R are convex: f i (λx + (1 λ)y) λf i (x) + (1 λ)f i (y) for all x, y, 0 λ 1 This template includes LS, LP, QP, and many others. Good news: convex problems (LP, QP, etc) are fundamentally tractable. Bad news: this is an exception, most nonconvex are completely intractable. A. d Aspremont. M2 OJME: Optimisation convexe. 4/128
Convex optimization A brief history... The field is about 50 years old. Starts with the work of Von Neumann, Kuhn and Tucker, etc. Explodes in the 60 s with the advent of relatively cheap and efficient computers... Key to all this: fast linear algebra Some of the theory developed before computers even existed... A. d Aspremont. M2 OJME: Optimisation convexe. 5/128
Convex optimization: history Historical view: nonlinear problems are hard, linear ones are easy. In reality: Convexity = low complexity... In fact the great watershed in optimization isn t between linearity and nonlinearity, but convexity and nonconvexity. T. Rockafellar. True: Nemirovskii and Yudin [1979]. Very true: Karmarkar [1984]. Seriously true: convex programming, Nesterov and Nemirovskii [1994]. A. d Aspremont. M2 OJME: Optimisation convexe. 6/128
Convexity, complexity All convex minimization problems with: a first order oracle (returning f(x) and a subgradient) can be solved in polynomial time in size and number of precision digits. Proved using the ellipsoid method by Nemirovskii and Yudin [1979]. Very slow convergence in practice. A. d Aspremont. M2 OJME: Optimisation convexe. 7/128
Linear Programming Simplex algorithm by Dantzig (1949): exponential worst-case complexity, very efficient in most cases. Khachiyan [1979] then used the ellipsoid method to show the polynomial complexity of LP. Karmarkar [1984] describes the first efficient polynomial time algorithm for LP, using interior point methods. A. d Aspremont. M2 OJME: Optimisation convexe. 8/128
From LP to structured convex programs Nesterov and Nemirovskii [1994] show that the interior point methods (IPM) used for LPs can be applied to a larger class of structured convex problems. The self-concordance analysis that they introduce extends the polynomial time complexity proof for LPs. Most operations that preserve convexity also preserve self-concordance. A. d Aspremont. M2 OJME: Optimisation convexe. 9/128
Large-scale convex programs Interior point methods. IPM essentially solved once and for all a broad range of medium-scale convex programs. For large-scale problems, computing a single Newton step is often too expensive First order methods. Dependence on precision is polynomial O(1/ɛ α ), not logarithmic O(log(1/ɛ)). This is OK in many applications (stats, etc). Run a much larger number of cheaper iterations. No Hessian means significantly lower memory and CPU costs per iteration. No unified analysis (self-concordance for IPM): large library of disparate methods. Algorithmic choices strictly constrained by problem structure. Objective: classify these techniques, study their performance & complexity. A. d Aspremont. M2 OJME: Optimisation convexe. 10/128
Symmetric cone programs An important particular case: linear programming on symmetric cones minimize subject to c T x Ax b K These include the LP, second-order (Lorentz) and semidefinite cone: LP: {x R n : x 0} Second order: {(x, y) R n R : x y} Semidefinite: {X S n : X 0} Broad class of problems can be represented in this way. Good news: Fast, reliable, open-source solvers available (SDPT3, CVX, etc). This course will describe some exotic applications of these programs. A. d Aspremont. M2 OJME: Optimisation convexe. 11/128
A few miracles Beyond convexity... Hidden convexity. Convex programs solving nonconvex problems (S-lemma). Approximation results. Approximating combinatorial problems by convex programs. Approximate S-lemma. Approximation ratio for MaxCut, etc. Recovery results on l 1 penalties. Finding sparse solutions to optimization problems using convex penalties. Sparse signal reconstruction. Matrix completion (collaborative filtering, NETFLIX, etc.). A. d Aspremont. M2 OJME: Optimisation convexe. 12/128
Course Organization A. d Aspremont. M2 OJME: Optimisation convexe. 13/128
Course outline Fundamental definitions A brief primer on convexity and duality theory Algorithmic complexity Interior point methods, self-concordance. First order algorithms: complexity and classification. Modern applications Statistics Geometrical problems, graphs. Some miracles : approximation, asymptotic and hidden convexity results Measure concentration results. S-lemma, MaxCut, low rank SDP solutions, nonconvex QCQP, etc. High dimensional geometry l 1 recovery, matrix completion, convex deconvolution, etc. A. d Aspremont. M2 OJME: Optimisation convexe. 14/128
Info Course website with lecture notes, homework, etc. http://www.cmap.polytechnique.fr/~aspremon/ojmem2.html A final project. A. d Aspremont. M2 OJME: Optimisation convexe. 15/128
Short blurb Contact info on http://www.cmap.polytechnique.fr/~aspremon Email: alexandre.daspremont@m4x.org Dual PhDs: Ecole Polytechnique & Stanford University Interests: Optimization, machine learning, statistics & finance. Recruiting PhD students... Full fellowship. A. d Aspremont. M2 OJME: Optimisation convexe. 16/128
References All lecture notes will be posted online, none of the books below are required. Nesterov [2003], Introductory Lectures on Convex Optimization, Springer. Convex Optimization by Lieven Vandenberghe and Stephen Boyd, available online at: http://www.stanford.edu/~boyd/cvxbook/ See also Ben-Tal and Nemirovski [2001], Lectures On Modern Convex Optimization: Analysis, Algorithms, And Engineering Applications, SIAM. http://www2.isye.gatech.edu/~nemirovs/ Nesterov and Nemirovskii [1994], Interior Point Polynomial Algorithms in Convex Programming, SIAM. A. d Aspremont. M2 OJME: Optimisation convexe. 17/128
Convex Sets A. d Aspremont. M2 OJME: Optimisation convexe. 18/128
Convex Sets affine and convex sets some important examples operations that preserve convexity generalized inequalities separating and supporting hyperplanes dual cones and generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 19/128
Convex set line segment between x 1 and x 2 : all points with 0 θ 1 x = θx 1 + (1 θ)x 2 convex set: contains line segment between any two points in the set x 1, x 2 C, 0 θ 1 = θx 1 + (1 θ)x 2 C examples (one convex, two nonconvex sets) A. d Aspremont. M2 OJME: Optimisation convexe. 20/128
Convex combination and convex hull convex combination of x 1,..., x k : any point x of the form with θ 1 + + θ k = 1, θ i 0 x = θ 1 x 1 + θ 2 x 2 + + θ k x k convex hull CoS: set of all convex combinations of points in S A. d Aspremont. M2 OJME: Optimisation convexe. 21/128
Hyperplanes and halfspaces hyperplane: set of the form {x a T x = b} (a 0) a x 0 x a T x = b halfspace: set of the form {x a T x b} (a 0) a x 0 a T x b a T x b a is the normal vector hyperplanes are affine and convex; halfspaces are convex A. d Aspremont. M2 OJME: Optimisation convexe. 22/128
Euclidean balls and ellipsoids (Euclidean) ball with center x c and radius r: B(x c, r) = {x x x c 2 r} = {x c + ru u 2 1} Ellipsoid: set of the form {x (x x c ) T P 1 (x x c ) 1} with P S n ++ (i.e., P symmetric positive definite) x c other representation: {x c + Au u 2 1}, with A square and nonsingular. Representation impacts problem formulation & complexity. Idem for polytopes, with polynomial number of vertices, exponential number of facets, and vice-versa. A. d Aspremont. M2 OJME: Optimisation convexe. 23/128
Norm balls and norm cones norm: a function that satisfies x 0; x = 0 if and only if x = 0 tx = t x for t R x + y x + y notation: is general (unspecified) norm; symb is particular norm norm ball with center x c and radius r: {x x x c r} 1 norm cone: {(x, t) x t} t 0.5 Euclidean norm cone is called secondorder cone 1 0 0 x 2 1 1 x 1 0 1 norm balls and cones are convex A. d Aspremont. M2 OJME: Optimisation convexe. 24/128
Polyhedra solution set of finitely many linear inequalities and equalities Ax b, Cx = d (A R m n, C R p n, is componentwise inequality) a 1 a2 a 5 P a 3 a 4 polyhedron is intersection of finite number of halfspaces and hyperplanes A. d Aspremont. M2 OJME: Optimisation convexe. 25/128
Positive semidefinite cone notation: S n is set of symmetric n n matrices S n + = {X S n X 0}: positive semidefinite n n matrices X S n + z T Xz 0 for all z S n + is a convex cone S n ++ = {X S n X 0}: positive definite n n matrices 1 example: [ x y y z ] S 2 + z 0.5 0 1 0 y 1 0 x 0.5 1 A. d Aspremont. M2 OJME: Optimisation convexe. 26/128
Operations that preserve convexity practical methods for establishing convexity of a set C 1. apply definition x 1, x 2 C, 0 θ 1 = θx 1 + (1 θ)x 2 C 2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls,... ) by operations that preserve convexity intersection affine functions perspective function linear-fractional functions A. d Aspremont. M2 OJME: Optimisation convexe. 27/128
Intersection the intersection of (any number of) convex sets is convex example: S = {x R m p(t) 1 for t π/3} where p(t) = x 1 cos t + x 2 cos 2t + + x m cos mt for m = 2: 2 p(t) 1 0 1 x2 1 0 S 1 0 π/3 2π/3 π t 2 2 1 0 1 2 x 1 A. d Aspremont. M2 OJME: Optimisation convexe. 28/128
Affine function suppose f : R n R m is affine (f(x) = Ax + b with A R m n, b R m ) the image of a convex set under f is convex S R n convex = f(s) = {f(x) x S} convex the inverse image f 1 (C) of a convex set under f is convex C R m convex = f 1 (C) = {x R n f(x) C} convex examples scaling, translation, projection solution set of linear matrix inequality {x x 1 A 1 + + x m A m B} (with A i, B S p ) hyperbolic cone {x x T P x (c T x) 2, c T x 0} (with P S n +) A. d Aspremont. M2 OJME: Optimisation convexe. 29/128
Perspective and linear-fractional function perspective function P : R n+1 R n : P (x, t) = x/t, dom P = {(x, t) t > 0} images and inverse images of convex sets under perspective are convex linear-fractional function f : R n R m : f(x) = Ax + b c T x + d, dom f = {x ct x + d > 0} images and inverse images of convex sets under linear-fractional functions are convex A. d Aspremont. M2 OJME: Optimisation convexe. 30/128
Generalized inequalities a convex cone K R n is a proper cone if K is closed (contains its boundary) K is solid (has nonempty interior) K is pointed (contains no line) examples nonnegative orthant K = R n + = {x R n x i 0, i = 1,..., n} positive semidefinite cone K = S n + nonnegative polynomials on [0, 1]: K = {x R n x 1 + x 2 t + x 3 t 2 + + x n t n 1 0 for t [0, 1]} A. d Aspremont. M2 OJME: Optimisation convexe. 31/128
generalized inequality defined by a proper cone K: x K y y x K, x K y y x int K examples componentwise inequality (K = R n +) x R n + y x i y i, i = 1,..., n matrix inequality (K = S n +) X S n + Y Y X positive semidefinite these two types are so common that we drop the subscript in K properties: many properties of K are similar to on R, e.g., x K y, u K v = x + u K y + v A. d Aspremont. M2 OJME: Optimisation convexe. 32/128
Separating hyperplane theorem if C and D are disjoint convex sets, then there exists a 0, b such that a T x b for x C, a T x b for x D a T x b a T x b D C a the hyperplane {x a T x = b} separates C and D Classical result. Proof relies on minimizing distance between set, and using the argmin to explicitly produce separating hyperplane. A. d Aspremont. M2 OJME: Optimisation convexe. 33/128
Supporting hyperplane theorem supporting hyperplane to set C at boundary point x 0 : {x a T x = a T x 0 } where a 0 and a T x a T x 0 for all x C a C x 0 supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C A. d Aspremont. M2 OJME: Optimisation convexe. 34/128
Dual cones and generalized inequalities dual cone of a cone K: examples K = {y y T x 0 for all x K} K = R n +: K = R n + K = S n +: K = S n + K = {(x, t) x 2 t}: K = {(x, t) x 2 t} K = {(x, t) x 1 t}: K = {(x, t) x t} first three examples are self-dual cones dual cones of proper cones are proper, hence define generalized inequalities: y K 0 y T x 0 for all x K 0 A. d Aspremont. M2 OJME: Optimisation convexe. 35/128
Convex Functions A. d Aspremont. M2 OJME: Optimisation convexe. 36/128
Outline basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions convexity with respect to generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 37/128
Definition f : R n R is convex if dom f is a convex set and for all x, y dom f, 0 θ 1 f(θx + (1 θ)y) θf(x) + (1 θ)f(y) (x,f(x)) (y,f(y)) f is concave if f is convex f is strictly convex if dom f is convex and for x, y dom f, x y, 0 < θ < 1 f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) A. d Aspremont. M2 OJME: Optimisation convexe. 38/128
Examples on R convex: affine: ax + b on R, for any a, b R exponential: e ax, for any a R powers: x α on R ++, for α 1 or α 0 powers of absolute value: x p on R, for p 1 negative entropy: x log x on R ++ concave: affine: ax + b on R, for any a, b R powers: x α on R ++, for 0 α 1 logarithm: log x on R ++ A. d Aspremont. M2 OJME: Optimisation convexe. 39/128
Examples on R n and R m n affine functions are convex and concave; all norms are convex examples on R n affine function f(x) = a T x + b norms: x p = ( n i=1 x i p ) 1/p for p 1; x = max k x k examples on R m n (m n matrices) affine function f(x) = Tr(A T X) + b = m i=1 n A ij X ij + b j=1 spectral (maximum singular value) norm f(x) = X 2 = σ max (X) = (λ max (X T X)) 1/2 A. d Aspremont. M2 OJME: Optimisation convexe. 40/128
Restriction of a convex function to a line f : R n R is convex if and only if the function g : R R, g(t) = f(x + tv), dom g = {t x + tv dom f} is convex (in t) for any x dom f, v R n can check convexity of f by checking convexity of functions of one variable example. f : S n R with f(x) = log det X, dom X = S n ++ g(t) = log det(x + tv ) = log det X + log det(i + tx 1/2 V X 1/2 ) = n log det X + log(1 + tλ i ) where λ i are the eigenvalues of X 1/2 V X 1/2 g is concave in t (for any choice of X 0, V ); hence f is concave i=1 A. d Aspremont. M2 OJME: Optimisation convexe. 41/128
Extended-value extension extended-value extension f of f is f(x) = f(x), x dom f, f(x) =, x dom f often simplifies notation; for example, the condition 0 θ 1 = f(θx + (1 θ)y) θ f(x) + (1 θ) f(y) (as an inequality in R { }), means the same as the two conditions dom f is convex for x, y dom f, 0 θ 1 = f(θx + (1 θ)y) θf(x) + (1 θ)f(y) A. d Aspremont. M2 OJME: Optimisation convexe. 42/128
First-order condition f is differentiable if dom f is open and the gradient exists at each x dom f f(x) = ( f(x) x 1, f(x),..., f(x) ) x 2 x n 1st-order condition: differentiable f with convex domain is convex iff f(y) f(x) + f(x) T (y x) for all x, y dom f f(y) f(x) + f(x) T (y x) (x,f(x)) first-order approximation of f is global underestimator A. d Aspremont. M2 OJME: Optimisation convexe. 43/128
Second-order conditions f is twice differentiable if dom f is open and the Hessian 2 f(x) S n, exists at each x dom f 2 f(x) ij = 2 f(x) x i x j, i, j = 1,..., n, 2nd-order conditions: for twice differentiable f with convex domain f is convex if and only if 2 f(x) 0 for all x dom f if 2 f(x) 0 for all x dom f, then f is strictly convex A. d Aspremont. M2 OJME: Optimisation convexe. 44/128
Examples quadratic function: f(x) = (1/2)x T P x + q T x + r (with P S n ) f(x) = P x + q, 2 f(x) = P convex if P 0 least-squares objective: f(x) = Ax b 2 2 f(x) = 2A T (Ax b), 2 f(x) = 2A T A convex (for any A) quadratic-over-linear: f(x, y) = x 2 /y 2 2 f(x, y) = 2 y 3 [ y x ] [ y x ] T 0 f(x,y) 1 0 2 2 1 0 convex for y > 0 y 0 2 x A. d Aspremont. M2 OJME: Optimisation convexe. 45/128
log-sum-exp: f(x) = log n k=1 exp x k is convex 2 f(x) = 1 1 T z diag(z) 1 (1 T z) 2zzT (z k = exp x k ) to show 2 f(x) 0, we must verify that v T 2 f(x)v 0 for all v: v T 2 f(x)v = ( k z kv 2 k )( k z k) ( k v kz k ) 2 ( k z k) 2 0 since ( k v kz k ) 2 ( k z kv 2 k )( k z k) (from Cauchy-Schwarz inequality) geometric mean: f(x) = ( n k=1 x k) 1/n on R n ++ is concave (similar proof as for log-sum-exp) A. d Aspremont. M2 OJME: Optimisation convexe. 46/128
Epigraph and sublevel set α-sublevel set of f : R n R: C α = {x dom f f(x) α} sublevel sets of convex functions are convex (converse is false) epigraph of f : R n R: epi f = {(x, t) R n+1 x dom f, f(x) t} epif f f is convex if and only if epi f is a convex set A. d Aspremont. M2 OJME: Optimisation convexe. 47/128
Jensen s inequality basic inequality: if f is convex, then for 0 θ 1, f(θx + (1 θ)y) θf(x) + (1 θ)f(y) extension: if f is convex, then for any random variable z f(e z) E f(z) basic inequality is special case with discrete distribution Prob(z = x) = θ, Prob(z = y) = 1 θ A. d Aspremont. M2 OJME: Optimisation convexe. 48/128
Operations that preserve convexity practical methods for establishing convexity of a function 1. verify definition (often simplified by restricting to a line) 2. for twice differentiable functions, show 2 f(x) 0 3. show that f is obtained from simple convex functions by operations that preserve convexity nonnegative weighted sum composition with affine function pointwise maximum and supremum composition minimization perspective A. d Aspremont. M2 OJME: Optimisation convexe. 49/128
Positive weighted sum & composition with affine function nonnegative multiple: αf is convex if f is convex, α 0 sum: f 1 + f 2 convex if f 1, f 2 convex (extends to infinite sums, integrals) composition with affine function: f(ax + b) is convex if f is convex examples log barrier for linear inequalities f(x) = m log(b i a T i x), dom f = {x a T i x < b i, i = 1,..., m} i=1 (any) norm of affine function: f(x) = Ax + b A. d Aspremont. M2 OJME: Optimisation convexe. 50/128
Pointwise maximum if f 1,..., f m are convex, then f(x) = max{f 1 (x),..., f m (x)} is convex examples piecewise-linear function: f(x) = max i=1,...,m (a T i x + b i) is convex sum of r largest components of x R n : f(x) = x [1] + x [2] + + x [r] is convex (x [i] is ith largest component of x) proof: f(x) = max{x i1 + x i2 + + x ir 1 i 1 < i 2 < < i r n} A. d Aspremont. M2 OJME: Optimisation convexe. 51/128
Pointwise supremum if f(x, y) is convex in x for each y A, then is convex examples g(x) = sup f(x, y) y A support function of a set C: S C (x) = sup y C y T x is convex distance to farthest point in a set C: f(x) = sup x y y C maximum eigenvalue of symmetric matrix: for X S n, λ max (X) = sup y T Xy y 2 =1 A. d Aspremont. M2 OJME: Optimisation convexe. 52/128
Composition with scalar functions composition of g : R n R and h : R R: f(x) = h(g(x)) f is convex if g convex, h convex, h nondecreasing g concave, h convex, h nonincreasing proof (for n = 1, differentiable g, h) f (x) = h (g(x))g (x) 2 + h (g(x))g (x) note: monotonicity must hold for extended-value extension h examples exp g(x) is convex if g is convex 1/g(x) is convex if g is concave and positive A. d Aspremont. M2 OJME: Optimisation convexe. 53/128
Vector composition composition of g : R n R k and h : R k R: f(x) = h(g(x)) = h(g 1 (x), g 2 (x),..., g k (x)) f is convex if g i convex, h convex, h nondecreasing in each argument g i concave, h convex, h nonincreasing in each argument proof (for n = 1, differentiable g, h) f (x) = g (x) T 2 h(g(x))g (x) + h(g(x)) T g (x) examples m i=1 log g i(x) is concave if g i are concave and positive log m i=1 exp g i(x) is convex if g i are convex A. d Aspremont. M2 OJME: Optimisation convexe. 54/128
Minimization if f(x, y) is convex in (x, y) and C is a convex set, then is convex examples g(x) = inf f(x, y) y C f(x, y) = x T Ax + 2x T By + y T Cy with [ A B B T C ] 0, C 0 minimizing over y gives g(x) = inf y f(x, y) = x T (A BC 1 B T )x g is convex, hence Schur complement A BC 1 B T 0 distance to a set: dist(x, S) = inf y S x y is convex if S is convex A. d Aspremont. M2 OJME: Optimisation convexe. 55/128
The conjugate function the conjugate of a function f is f (y) = sup (y T x f(x)) x dom f f(x) xy x (0, f (y)) f is convex (even if f is not) Used in regularization, duality results,... A. d Aspremont. M2 OJME: Optimisation convexe. 56/128
examples negative logarithm f(x) = log x f (y) = sup(xy + log x) = x>0 { 1 log( y) y < 0 otherwise strictly convex quadratic f(x) = (1/2)x T Qx with Q S n ++ f (y) = sup(y x (1/2)x T Qx) x = 1 2 yt Q 1 y A. d Aspremont. M2 OJME: Optimisation convexe. 57/128
Quasiconvex functions f : R n R is quasiconvex if dom f is convex and the sublevel sets are convex for all α S α = {x dom f f(x) α} β α a b c f is quasiconcave if f is quasiconvex f is quasilinear if it is quasiconvex and quasiconcave A. d Aspremont. M2 OJME: Optimisation convexe. 58/128
Examples x is quasiconvex on R ceil(x) = inf{z Z z x} is quasilinear log x is quasilinear on R ++ f(x 1, x 2 ) = x 1 x 2 is quasiconcave on R 2 ++ linear-fractional function f(x) = at x + b c T x + d, dom f = {x ct x + d > 0} is quasilinear distance ratio is quasiconvex f(x) = x a 2 x b 2, dom f = {x x a 2 x b 2 } A. d Aspremont. M2 OJME: Optimisation convexe. 59/128
Properties modified Jensen inequality: for quasiconvex f 0 θ 1 = f(θx + (1 θ)y) max{f(x), f(y)} first-order condition: differentiable f with cvx domain is quasiconvex iff f(y) f(x) = f(x) T (y x) 0 x f(x) sums of quasiconvex functions are not necessarily quasiconvex A. d Aspremont. M2 OJME: Optimisation convexe. 60/128
Log-concave and log-convex functions a positive function f is log-concave if log f is concave: f(θx + (1 θ)y) f(x) θ f(y) 1 θ for 0 θ 1 f is log-convex if log f is convex powers: x a on R ++ is log-convex for a 0, log-concave for a 0 many common probability densities are log-concave, e.g., normal: f(x) = 1 (2π)n det Σ e 1 2 (x x)t Σ 1 (x x) cumulative Gaussian distribution function Φ is log-concave Φ(x) = 1 2π x e u2 /2 du A. d Aspremont. M2 OJME: Optimisation convexe. 61/128
Properties of log-concave functions twice differentiable f with convex domain is log-concave if and only if f(x) 2 f(x) f(x) f(x) T for all x dom f product of log-concave functions is log-concave sum of log-concave functions is not always log-concave integration: if f : R n R m R is log-concave, then g(x) = f(x, y) dy is log-concave (not easy to show) A. d Aspremont. M2 OJME: Optimisation convexe. 62/128
consequences of integration property convolution f g of log-concave functions f, g is log-concave (f g)(x) = f(x y)g(y)dy if C R n convex and y is a random variable with log-concave pdf then f(x) = Prob(x + y C) is log-concave proof: write f(x) as integral of product of log-concave functions f(x) = g(x + y)p(y) dy, g(u) = { 1 u C 0 u C, p is pdf of y A. d Aspremont. M2 OJME: Optimisation convexe. 63/128
Convex Optimization Problems A. d Aspremont. M2 OJME: Optimisation convexe. 64/128
Outline optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization geometric programming generalized inequality constraints semidefinite programming vector optimization A. d Aspremont. M2 OJME: Optimisation convexe. 65/128
Optimization problem in standard form minimize f 0 (x) subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p x R n is the optimization variable f 0 : R n R is the objective or cost function f i : R n R, i = 1,..., m, are the inequality constraint functions h i : R n R are the equality constraint functions optimal value: p = inf{f 0 (x) f i (x) 0, i = 1,..., m, h i (x) = 0, i = 1,..., p} p = if problem is infeasible (no x satisfies the constraints) p = if problem is unbounded below A. d Aspremont. M2 OJME: Optimisation convexe. 66/128
Optimal and locally optimal points x is feasible if x dom f 0 and it satisfies the constraints a feasible x is optimal if f 0 (x) = p ; X opt is the set of optimal points x is locally optimal if there is an R > 0 such that x is optimal for minimize (over z) f 0 (z) subject to f i (z) 0, i = 1,..., m, h i (z) = 0, i = 1,..., p z x 2 R examples (with n = 1, m = p = 0) f 0 (x) = 1/x, dom f 0 = R ++ : p = 0, no optimal point f 0 (x) = log x, dom f 0 = R ++ : p = f 0 (x) = x log x, dom f 0 = R ++ : p = 1/e, x = 1/e is optimal f 0 (x) = x 3 3x, p =, local optimum at x = 1 A. d Aspremont. M2 OJME: Optimisation convexe. 67/128
Implicit constraints the standard form optimization problem has an implicit constraint x D = m i=0 dom f i p i=1 dom h i, we call D the domain of the problem the constraints f i (x) 0, h i (x) = 0 are the explicit constraints a problem is unconstrained if it has no explicit constraints (m = p = 0) example: minimize f 0 (x) = k i=1 log(b i a T i x) is an unconstrained problem with implicit constraints a T i x < b i A. d Aspremont. M2 OJME: Optimisation convexe. 68/128
Feasibility problem find x subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p can be considered a special case of the general problem with f 0 (x) = 0: minimize 0 subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p p = 0 if constraints are feasible; any feasible x is optimal p = if constraints are infeasible A. d Aspremont. M2 OJME: Optimisation convexe. 69/128
Convex optimization problem standard form convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,..., m a T i x = b i, i = 1,..., p f 0, f 1,..., f m are convex; equality constraints are affine problem is quasiconvex if f 0 is quasiconvex (and f 1,..., f m convex) often written as minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b important property: feasible set of a convex optimization problem is convex A. d Aspremont. M2 OJME: Optimisation convexe. 70/128
example minimize f 0 (x) = x 2 1 + x 2 2 subject to f 1 (x) = x 1 /(1 + x 2 2) 0 h 1 (x) = (x 1 + x 2 ) 2 = 0 f 0 is convex; feasible set {(x 1, x 2 ) x 1 = x 2 0} is convex not a convex problem (according to our definition): f 1 is not convex, h 1 is not affine equivalent (but not identical) to the convex problem minimize x 2 1 + x 2 2 subject to x 1 0 x 1 + x 2 = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 71/128
Local and global optima any locally optimal point of a convex problem is (globally) optimal Proof: suppose x is locally optimal and y is optimal with f 0 (y) < f 0 (x) x locally optimal means there is an R > 0 such that z feasible, z x 2 R = f 0 (z) f 0 (x) consider z = θy + (1 θ)x with θ = R/(2 y x 2 ) y x 2 > R, so 0 < θ < 1/2 z is a convex combination of two feasible points, hence also feasible z x 2 = R/2 and f 0 (z) θf 0 (x) + (1 θ)f 0 (y) < f 0 (x) which contradicts our assumption that x is locally optimal A. d Aspremont. M2 OJME: Optimisation convexe. 72/128
Optimality criterion for differentiable f 0 x is optimal if and only if it is feasible and f 0 (x) T (y x) 0 for all feasible y X x f 0 (x) if nonzero, f 0 (x) defines a supporting hyperplane to feasible set X at x A. d Aspremont. M2 OJME: Optimisation convexe. 73/128
unconstrained problem: x is optimal if and only if x dom f 0, f 0 (x) = 0 equality constrained problem minimize f 0 (x) subject to Ax = b x is optimal if and only if there exists a ν such that x dom f 0, Ax = b, f 0 (x) + A T ν = 0 minimization over nonnegative orthant x is optimal if and only if minimize f 0 (x) subject to x 0 x dom f 0, x 0, { f0 (x) i 0 x i = 0 f 0 (x) i = 0 x i > 0 A. d Aspremont. M2 OJME: Optimisation convexe. 74/128
Equivalent convex problems two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa some common transformations that preserve convexity: eliminating equality constraints minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b is equivalent to minimize (over z) f 0 (F z + x 0 ) subject to f i (F z + x 0 ) 0, i = 1,..., m where F and x 0 are such that Ax = b x = F z + x 0 for some z A. d Aspremont. M2 OJME: Optimisation convexe. 75/128
introducing equality constraints minimize f 0 (A 0 x + b 0 ) subject to f i (A i x + b i ) 0, i = 1,..., m is equivalent to minimize (over x, y i ) f 0 (y 0 ) subject to f i (y i ) 0, i = 1,..., m y i = A i x + b i, i = 0, 1,..., m introducing slack variables for linear inequalities minimize f 0 (x) subject to a T i x b i, i = 1,..., m is equivalent to minimize (over x, s) f 0 (x) subject to a T i x + s i = b i, i = 1,..., m s i 0, i = 1,... m A. d Aspremont. M2 OJME: Optimisation convexe. 76/128
epigraph form: standard form convex problem is equivalent to minimize (over x, t) t subject to f 0 (x) t 0 f i (x) 0, i = 1,..., m Ax = b minimizing over some variables minimize f 0 (x 1, x 2 ) subject to f i (x 1 ) 0, i = 1,..., m is equivalent to minimize f0 (x 1 ) subject to f i (x 1 ) 0, i = 1,..., m where f 0 (x 1 ) = inf x2 f 0 (x 1, x 2 ) A. d Aspremont. M2 OJME: Optimisation convexe. 77/128
Quasiconvex optimization minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b with f 0 : R n R quasiconvex, f 1,..., f m convex can have locally optimal points that are not (globally) optimal (x,f 0 (x)) A. d Aspremont. M2 OJME: Optimisation convexe. 78/128
quasiconvex optimization via convex feasibility problems f 0 (x) t, f i (x) 0, i = 1,..., m, Ax = b (1) for fixed t, a convex feasibility problem in x if feasible, we can conclude that t p ; if infeasible, t p Bisection method for quasiconvex optimization given l p, u p, tolerance ɛ > 0. repeat 1. t := (l + u)/2. 2. Solve the convex feasibility problem (1). 3. if (1) is feasible, u := t; else l := t. until u l ɛ. requires exactly log 2 ((u l)/ɛ) iterations (where u, l are initial values) A. d Aspremont. M2 OJME: Optimisation convexe. 79/128
Linear program (LP) minimize subject to c T x + d Gx h Ax = b convex problem with affine objective and constraint functions feasible set is a polyhedron c P x A. d Aspremont. M2 OJME: Optimisation convexe. 80/128
Chebyshev center of a polyhedron Chebyshev center of P = {x a T i x b i, i = 1,..., m} is center of largest inscribed ball x cheb B = {x c + u u 2 r} a T i x b i for all x B if and only if sup{a T i (x c + u) u 2 r} = a T i x c + r a i 2 b i hence, x c, r can be determined by solving the LP maximize r subject to a T i x c + r a i 2 b i, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 81/128
(Generalized) linear-fractional program minimize subject to f 0 (x) Gx h Ax = b linear-fractional program f 0 (x) = ct x + d e T x + f, dom f 0(x) = {x e T x + f > 0} a quasiconvex optimization problem; can be solved by bisection also equivalent to the LP (variables y, z) minimize subject to c T y + dz Gy hz Ay = bz e T y + fz = 1 z 0 A. d Aspremont. M2 OJME: Optimisation convexe. 82/128
Quadratic program (QP) minimize subject to (1/2)x T P x + q T x + r Gx h Ax = b P S n +, so objective is convex quadratic minimize a convex quadratic function over a polyhedron f 0 (x ) x P A. d Aspremont. M2 OJME: Optimisation convexe. 83/128
Examples least-squares minimize Ax b 2 2 analytical solution x = A b (A is pseudo-inverse) can add linear constraints, e.g., l x u linear program with random cost minimize c T x + γx T Σx = E c T x + γ var(c T x) subject to Gx h, Ax = b c is random vector with mean c and covariance Σ hence, c T x is random variable with mean c T x and variance x T Σx γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk) A. d Aspremont. M2 OJME: Optimisation convexe. 84/128
Quadratically constrained quadratic program (QCQP) minimize (1/2)x T P 0 x + q0 T x + r 0 subject to (1/2)x T P i x + qi T x + r i 0, i = 1,..., m Ax = b P i S n +; objective and constraints are convex quadratic if P 1,..., P m S n ++, feasible region is intersection of m ellipsoids and an affine set A. d Aspremont. M2 OJME: Optimisation convexe. 85/128
Second-order cone programming (A i R n i n, F R p n ) minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m F x = g inequalities are called second-order cone (SOC) constraints: (A i x + b i, c T i x + d i ) second-order cone in R n i+1 for n i = 0, reduces to an LP; if c i = 0, reduces to a QCQP more general than QCQP and LP A. d Aspremont. M2 OJME: Optimisation convexe. 86/128
Robust linear programming the parameters in optimization problems are often uncertain, e.g., in an LP minimize c T x subject to a T i x b i, i = 1,..., m, there can be uncertainty in c, a i, b i two common approaches to handling uncertainty (in a i, for simplicity) deterministic model: constraints must hold for all a i E i minimize c T x subject to a T i x b i for all a i E i, i = 1,..., m, stochastic model: a i is random variable; constraints must hold with probability η minimize c T x subject to Prob(a T i x b i) η, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 87/128
deterministic approach via SOCP choose an ellipsoid as E i : E i = {ā i + P i u u 2 1} (ā i R n, P i R n n ) center is ā i, semi-axes determined by singular values/vectors of P i robust LP minimize c T x subject to a T i x b i a i E i, i = 1,..., m is equivalent to the SOCP minimize c T x subject to ā T i x + P i T x 2 b i, i = 1,..., m (follows from sup u 2 1(ā i + P i u) T x = ā T i x + P i T x 2) A. d Aspremont. M2 OJME: Optimisation convexe. 88/128
stochastic approach via SOCP assume a i is Gaussian with mean ā i, covariance Σ i (a i N (ā i, Σ i )) a T i x is Gaussian r.v. with mean āt i x, variance xt Σ i x; hence ( ) Prob(a T b i ā T i i x b i ) = Φ x Σ 1/2 i x 2 where Φ(x) = (1/ 2π) x e t2 /2 dt is CDF of N (0, 1) robust LP minimize c T x subject to Prob(a T i x b i) η, i = 1,..., m, with η 1/2, is equivalent to the SOCP minimize c T x subject to ā T i x + Φ 1 (η) Σ 1/2 i x 2 b i, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 89/128
Impact of reliability {x Prob(a T i x b i ) η, i = 1,..., m} η = 10% η = 50% η = 90% A. d Aspremont. M2 OJME: Optimisation convexe. 90/128
Generalized inequality constraints convex problem with generalized inequality constraints minimize f 0 (x) subject to f i (x) Ki 0, i = 1,..., m Ax = b f 0 : R n R convex; f i : R n R k i K i -convex w.r.t. proper cone K i same properties as standard convex problem (convex feasible set, local optimum is global, etc.) conic form problem: special case with affine objective and constraints minimize c T x subject to F x + g K 0 Ax = b extends linear programming (K = R m +) to nonpolyhedral cones A. d Aspremont. M2 OJME: Optimisation convexe. 91/128
Semidefinite program (SDP) with F i, G S k minimize c T x subject to x 1 F 1 + x 2 F 2 + + x n F n + G 0 Ax = b inequality constraint is called linear matrix inequality (LMI) includes problems with multiple LMI constraints: for example, x 1 ˆF1 + + x n ˆFn + Ĝ 0, x 1 F 1 + + x n Fn + G 0 is equivalent to single LMI x 1 [ ˆF1 0 0 F1 ] + x 2 [ ˆF2 0 0 F2 ] + + x n [ ˆFn 0 0 Fn ] + [ Ĝ 0 0 G ] 0 A. d Aspremont. M2 OJME: Optimisation convexe. 92/128
LP and SOCP as SDP LP and equivalent SDP LP: minimize c T x subject to Ax b SDP: minimize c T x subject to diag(ax b) 0 (note different interpretation of generalized inequality ) SOCP and equivalent SDP SOCP: minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m SDP: minimize f [ T x (c T subject to i x + d i )I A i x + b i (A i x + b i ) T c T i x + d i ] 0, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 93/128
Eigenvalue minimization minimize λ max (A(x)) where A(x) = A 0 + x 1 A 1 + + x n A n (with given A i S k ) equivalent SDP minimize subject to t A(x) ti variables x R n, t R follows from λ max (A) t A ti A. d Aspremont. M2 OJME: Optimisation convexe. 94/128
Matrix norm minimization minimize A(x) 2 = ( λ max (A(x) T A(x)) ) 1/2 where A(x) = A 0 + x 1 A 1 + + x n A n (with given A i S p q ) equivalent SDP minimize subject to t[ ti A(x) T A(x) ti ] 0 variables x R n, t R constraint follows from A 2 t A T A t 2 I, t 0 [ ] ti A A T 0 ti A. d Aspremont. M2 OJME: Optimisation convexe. 95/128
Duality A. d Aspremont. M2 OJME: Optimisation convexe. 96/128
Outline Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis examples generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 97/128
Lagrangian standard form problem (not necessarily convex) minimize f 0 (x) subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p variable x R n, domain D, optimal value p Lagrangian: L : R n R m R p R, with dom L = D R m R p, L(x, λ, ν) = f 0 (x) + m λ i f i (x) + i=1 p ν i h i (x) i=1 weighted sum of objective and constraint functions λ i is Lagrange multiplier associated with f i (x) 0 ν i is Lagrange multiplier associated with h i (x) = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 98/128
Lagrange dual function Lagrange dual function: g : R m R p R, g(λ, ν) = inf x D = inf x D L(x, λ, ν) ( f 0 (x) + g is concave, can be for some λ, ν m λ i f i (x) + i=1 ) p ν i h i (x) i=1 lower bound property: if λ 0, then g(λ, ν) p proof: if x is feasible and λ 0, then f 0 ( x) L( x, λ, ν) inf L(x, λ, ν) = g(λ, ν) x D minimizing over all feasible x gives p g(λ, ν) A. d Aspremont. M2 OJME: Optimisation convexe. 99/128
Least-norm solution of linear equations minimize x T x subject to Ax = b dual function Lagrangian is L(x, ν) = x T x + ν T (Ax b) to minimize L over x, set gradient equal to zero: x L(x, ν) = 2x + A T ν = 0 = x = (1/2)A T ν plug in in L to obtain g: a concave function of ν g(ν) = L(( 1/2)A T ν, ν) = 1 4 νt AA T ν b T ν lower bound property: p (1/4)ν T AA T ν b T ν for all ν A. d Aspremont. M2 OJME: Optimisation convexe. 100/128
Standard form LP minimize c T x subject to Ax = b, x 0 dual function Lagrangian is L(x, λ, ν) = c T x + ν T (Ax b) λ T x = b T ν + (c + A T ν λ) T x L is linear in x, hence g(λ, ν) = inf x L(x, λ, ν) = { b T ν A T ν λ + c = 0 otherwise g is linear on affine domain {(λ, ν) A T ν λ + c = 0}, hence concave lower bound property: p b T ν if A T ν + c 0 A. d Aspremont. M2 OJME: Optimisation convexe. 101/128
Equality constrained norm minimization dual function minimize subject to g(ν) = inf x ( x νt Ax + b T ν) = x Ax = b where v = sup u 1 u T v is dual norm of { b T ν A T ν 1 otherwise proof: follows from inf x ( x y T x) = 0 if y 1, otherwise if y 1, then x y T x 0 for all x, with equality if x = 0 if y > 1, choose x = tu where u 1, u T y = y > 1: x y T x = t( u y ) as t lower bound property: p b T ν if A T ν 1 A. d Aspremont. M2 OJME: Optimisation convexe. 102/128
Two-way partitioning minimize x T W x subject to x 2 i = 1, i = 1,..., n a nonconvex problem; feasible set contains 2 n discrete points interpretation: partition {1,..., n} in two sets; W ij is cost of assigning i, j to the same set; W ij is cost of assigning to different sets dual function g(ν) = inf x (xt W x + i ν i (x 2 i 1)) = inf x xt (W + diag(ν))x 1 T ν { 1 = T ν W + diag(ν) 0 otherwise lower bound property: p 1 T ν if W + diag(ν) 0 example: ν = λ min (W )1 gives bound p nλ min (W ) A. d Aspremont. M2 OJME: Optimisation convexe. 103/128
The dual problem Lagrange dual problem maximize g(λ, ν) subject to λ 0 finds best lower bound on p, obtained from Lagrange dual function a convex optimization problem; optimal value denoted d λ, ν are dual feasible if λ 0, (λ, ν) dom g often simplified by making implicit constraint (λ, ν) dom g explicit example: standard form LP and its dual (page 101) minimize subject to c T x Ax = b x 0 maximize b T ν subject to A T ν + c 0 A. d Aspremont. M2 OJME: Optimisation convexe. 104/128
Weak and strong duality weak duality: d p always holds (for convex and nonconvex problems) can be used to find nontrivial lower bounds for difficult problems for example, solving the SDP maximize 1 T ν subject to W + diag(ν) 0 gives a lower bound for the two-way partitioning problem on page 103 strong duality: d = p does not hold in general (usually) holds for convex problems conditions that guarantee strong duality in convex problems are called constraint qualifications A. d Aspremont. M2 OJME: Optimisation convexe. 105/128
Slater s constraint qualification strong duality holds for a convex problem if it is strictly feasible, i.e., minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b x int D : f i (x) < 0, i = 1,..., m, Ax = b also guarantees that the dual optimum is attained (if p > ) can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality,... there exist many other types of constraint qualifications A. d Aspremont. M2 OJME: Optimisation convexe. 106/128
Feasibility problems feasibility problem A in x R n. f i (x) < 0, i = 1,..., m, h i (x) = 0, i = 1,..., p feasibility problem B in λ R m, ν R p. λ 0, λ 0, g(λ, ν) 0 where g(λ, ν) = inf x ( m i=1 λ if i (x) + p i=1 ν ih i (x)) feasibility problem B is convex (g is concave), even if problem A is not A and B are always weak alternatives: at most one is feasible proof: assume x satisfies A, λ, ν satisfy B 0 g(λ, ν) m i=1 λ if i ( x) + p i=1 ν ih i ( x) < 0 A and B are strong alternatives if exactly one of the two is feasible (can prove infeasibility of A by producing solution of B and vice-versa). A. d Aspremont. M2 OJME: Optimisation convexe. 107/128
Inequality form LP primal problem minimize subject to c T x Ax b dual function g(λ) = inf x ( (c + A T λ) T x b T λ ) { b = T λ A T λ + c = 0 otherwise dual problem maximize b T λ subject to A T λ + c = 0, λ 0 from Slater s condition: p = d if A x b for some x in fact, p = d except when primal and dual are infeasible A. d Aspremont. M2 OJME: Optimisation convexe. 108/128
Quadratic program primal problem (assume P S n ++) minimize subject to x T P x Ax b dual function g(λ) = inf x ( x T P x + λ T (Ax b) ) = 1 4 λt AP 1 A T λ b T λ dual problem maximize (1/4)λ T AP 1 A T λ b T λ subject to λ 0 from Slater s condition: p = d if A x b for some x in fact, p = d always A. d Aspremont. M2 OJME: Optimisation convexe. 109/128
A nonconvex problem with strong duality minimize x T Ax + 2b T x subject to x T x 1 nonconvex if A 0 dual function: g(λ) = inf x (x T (A + λi)x + 2b T x λ) unbounded below if A + λi 0 or if A + λi 0 and b R(A + λi) minimized by x = (A + λi) b otherwise: g(λ) = b T (A + λi) b λ dual problem and equivalent SDP: maximize b T (A + λi) b λ subject to A + λi 0 b R(A + λi) maximize subject to t [ λ A + λi b b T t ] 0 strong duality although primal problem is not convex (more later) A. d Aspremont. M2 OJME: Optimisation convexe. 110/128
Complementary slackness Assume strong duality holds, x is primal optimal, (λ, ν ) is dual optimal f 0 (x ) = g(λ, ν ) = inf x ( f 0 (x ) + f 0 (x ) f 0 (x) + m λ i f i (x) + i=1 m λ i f i (x ) + i=1 ) p νi h i (x) i=1 p νi h i (x ) i=1 hence, the two inequalities hold with equality x minimizes L(x, λ, ν ) λ i f i(x ) = 0 for i = 1,..., m (known as complementary slackness): λ i > 0 = f i (x ) = 0, f i (x ) < 0 = λ i = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 111/128
Karush-Kuhn-Tucker (KKT) conditions the following four conditions are called KKT conditions (for a problem with differentiable f i, h i ): 1. Primal feasibility: f i (x) 0, i = 1,..., m, h i (x) = 0, i = 1,..., p 2. Dual feasibility: λ 0 3. Complementary slackness: λ i f i (x) = 0, i = 1,..., m 4. Gradient of Lagrangian with respect to x vanishes (first order condition): f 0 (x) + m λ i f i (x) + i=1 p ν i h i (x) = 0 i=1 If strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions A. d Aspremont. M2 OJME: Optimisation convexe. 112/128
KKT conditions for convex problem If x, λ, ν satisfy KKT for a convex problem, then they are optimal: from complementary slackness: f 0 ( x) = L( x, λ, ν) from 4th condition (and convexity): g( λ, ν) = L( x, λ, ν) hence, f 0 ( x) = g( λ, ν) If Slater s condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy KKT conditions recall that Slater implies strong duality, and dual optimum is attained generalizes optimality condition f 0 (x) = 0 for unconstrained problem Summary: When strong duality holds, the KKT conditions are necessary conditions for optimality If the problem is convex, they are also sufficient A. d Aspremont. M2 OJME: Optimisation convexe. 113/128
example: water-filling (assume α i > 0) minimize n i=1 log(x i + α i ) subject to x 0, 1 T x = 1 x is optimal iff x 0, 1 T x = 1, and there exist λ R n, ν R such that λ 0, λ i x i = 0, 1 x i + α i + λ i = ν if ν < 1/α i : λ i = 0 and x i = 1/ν α i if ν 1/α i : λ i = ν 1/α i and x i = 0 determine ν from 1 T x = n i=1 max{0, 1/ν α i} = 1 interpretation n patches; level of patch i is at height α i flood area with unit amount of water 1/ν x i α i resulting level is 1/ν i A. d Aspremont. M2 OJME: Optimisation convexe. 114/128
Duality and problem reformulations equivalent formulations of a problem can lead to very different duals reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting common reformulations introduce new variables and equality constraints make explicit constraints implicit or vice-versa transform objective or constraint functions e.g., replace f 0 (x) by φ(f 0 (x)) with φ convex, increasing A. d Aspremont. M2 OJME: Optimisation convexe. 115/128
Introducing new variables and equality constraints minimize f 0 (Ax + b) dual function is constant: g = inf x L(x) = inf x f 0 (Ax + b) = p we have strong duality, but dual is quite useless reformulated problem and its dual minimize f 0 (y) subject to Ax + b y = 0 maximize b T ν f0 (ν) subject to A T ν = 0 dual function follows from g(ν) = inf (f 0(y) ν T y + ν T Ax + b T ν) x,y { f = 0 (ν) + b T ν A T ν = 0 otherwise A. d Aspremont. M2 OJME: Optimisation convexe. 116/128
norm approximation problem: minimize Ax b minimize subject to y y = Ax b can look up conjugate of, or derive dual directly g(ν) = inf x,y νt y ν T Ax + b T ν) { b = ν + inf y ( y + ν T y) A T ν = 0 otherwise { b = ν A T ν = 0, ν 1 otherwise (see page 100) dual of norm approximation problem maximize b T ν subject to A T ν = 0, ν 1 A. d Aspremont. M2 OJME: Optimisation convexe. 117/128
Implicit constraints LP with box constraints: primal and dual problem minimize subject to c T x Ax = b 1 x 1 maximize b T ν 1 T λ 1 1 T λ 2 subject to c + A T ν + λ 1 λ 2 = 0 λ 1 0, λ 2 0 reformulation with box constraints made implicit minimize f 0 (x) = subject to Ax = b { c T x 1 x 1 otherwise dual function g(ν) = inf 1 x 1 (ct x + ν T (Ax b)) = b T ν A T ν + c 1 dual problem: maximize b T ν A T ν + c 1 A. d Aspremont. M2 OJME: Optimisation convexe. 118/128
Problems with generalized inequalities Ki is generalized inequality on R k i definitions are parallel to scalar case: minimize f 0 (x) subject to f i (x) Ki 0, i = 1,..., m h i (x) = 0, i = 1,..., p Lagrange multiplier for f i (x) Ki 0 is vector λ i R k i Lagrangian L : R n R k 1 R k m R p R, is defined as L(x, λ 1,, λ m, ν) = f 0 (x) + m λ T i f i (x) + i=1 p ν i h i (x) i=1 dual function g : R k 1 R k m R p R, is defined as g(λ 1,..., λ m, ν) = inf x D L(x, λ 1,, λ m, ν) A. d Aspremont. M2 OJME: Optimisation convexe. 119/128
lower bound property: if λ i K i 0, then g(λ 1,..., λ m, ν) p proof: if x is feasible and λ K i 0, then f 0 ( x) f 0 ( x) + m λ T i f i ( x) + i=1 inf L(x, λ 1,..., λ m, ν) x D = g(λ 1,..., λ m, ν) p ν i h i ( x) i=1 minimizing over all feasible x gives p g(λ 1,..., λ m, ν) dual problem maximize g(λ 1,..., λ m, ν) subject to λ i K i 0, i = 1,..., m weak duality: p d always strong duality: p = d for convex problem with constraint qualification (for example, Slater s: primal problem is strictly feasible) A. d Aspremont. M2 OJME: Optimisation convexe. 120/128
Semidefinite program primal SDP (F i, G S k ) minimize subject to c T x x 1 F 1 + + x n F n G Lagrange multiplier is matrix Z S k Lagrangian L(x, Z) = c T x + Tr (Z(x 1 F 1 + + x n F n G)) dual function g(z) = inf x L(x, Z) = { Tr(GZ) Tr(Fi Z) + c i = 0, i = 1,..., n otherwise dual SDP maximize Tr(GZ) subject to Z 0, Tr(F i Z) + c i = 0, i = 1,..., n p = d if primal SDP is strictly feasible ( x with x 1 F 1 + + x n F n G) A. d Aspremont. M2 OJME: Optimisation convexe. 121/128
Duality: SOCP example Let s consider the following Second Order Cone Program (SOCP): minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m, with variable x R n. Let s show that the dual can be expressed as maximize subject to m i=1 (bt i u i + d i v i ) m i=1 (AT i u i + c i v i ) + f = 0 u i 2 v i, i = 1,..., m, with variables u i R n i, v i R, i = 1,..., m and problem data given by f R n, A i R n i n, b i R n i, c i R and d i R. A. d Aspremont. M2 OJME: Optimisation convexe. 122/128
Duality: SOCP We can derive the dual in the following two ways: 1. Introduce new variables y i R n i and t i R and equalities y i = A i x + b i, t i = c T i x + d i, and derive the Lagrange dual. 2. Start from the conic formulation of the SOCP and use the conic dual. Use the fact that the second-order cone is self-dual: t x tv + x T y 0, for all v, y such that v y The condition x T y tv is a simple Cauchy-Schwarz inequality A. d Aspremont. M2 OJME: Optimisation convexe. 123/128
Duality: SOCP We introduce new variables, and write the problem as minimize c T x subject to y i 2 t i, i = 1,..., m y i = A i x + b i, t i = c T i x + d i, i = 1,..., m The Lagrangian is L(x, y, t, λ, ν, µ) m m m = c T x + λ i ( y i 2 t i ) + νi T (y i A i x b i ) + µ i (t i c T i x d i ) i=1 m = (c A T i ν i i=1 n (b T i ν i + d i µ i ). i=1 i=1 m µ i c i ) T x + i=1 m (λ i y i 2 + νi T y i ) + i=1 i=1 i=1 m ( λ i + µ i )t i A. d Aspremont. M2 OJME: Optimisation convexe. 124/128