Optimisation convexe: performance, complexité et applications.

Similar documents
3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

Convex Optimization M2

4. Convex optimization problems

5. Duality. Lagrangian

CSCI : Optimization and Control of Networks. Review on Convex Optimization

A Brief Review on Convex Optimization

Lecture: Convex Optimization Problems

Convex optimization problems. Optimization problem in standard form

Convex Optimization Boyd & Vandenberghe. 5. Duality

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

Lecture: Duality.

4. Convex optimization problems

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

Lecture: Duality of LP, SOCP and SDP

Convex Optimization in Communications and Signal Processing

EE/AA 578, Univ of Washington, Fall Duality

Lecture 2: Convex functions

Constrained Optimization and Lagrangian Duality

Convex functions. Definition. f : R n! R is convex if dom f is a convex set and. f ( x +(1 )y) < f (x)+(1 )f (y) f ( x +(1 )y) apple f (x)+(1 )f (y)

Tutorial on Convex Optimization for Engineers Part II

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

12. Interior-point methods

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lecture 6: Conic Optimization September 8

Convex Optimization & Lagrange Duality

Convex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lecture Note 5: Semidefinite Programming for Stability Analysis

CS-E4830 Kernel Methods in Machine Learning

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

subject to (x 2)(x 4) u,

4. Convex optimization problems (part 1: general)

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

12. Interior-point methods

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Convex Functions. Pontus Giselsson

ICS-E4030 Kernel Methods in Machine Learning

Convex Optimization and Modeling

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Convex Optimization M2

Convex Optimization and l 1 -minimization

The proximal mapping

Lecture: Examples of LP, SOCP and SDP

A function(al) f is convex if dom f is a convex set, and. f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) f(x) = x 3

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Advances in Convex Optimization: Theory, Algorithms, and Applications

Convex Optimization and Modeling

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

15. Conic optimization

Optimization for Machine Learning

Lagrangian Duality and Convex Optimization

CS675: Convex and Combinatorial Optimization Fall 2016 Convex Optimization Problems. Instructor: Shaddin Dughmi

8. Geometric problems

Convex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK)

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

Lecture 9 Sequential unconstrained minimization

Lecture 7: Convex Optimizations

Convex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013

Semidefinite Programming Basics and Applications

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Linear and non-linear programming

Convex Optimization for Signal Processing and Communications: From Fundamentals to Applications

Interior Point Algorithms for Constrained Convex Optimization

Solution to EE 617 Mid-Term Exam, Fall November 2, 2017

minimize x subject to (x 2)(x 4) u,

Exercises. Exercises. Basic terminology and optimality conditions. 4.2 Consider the optimization problem

minimize x x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x 2 u 2, 5x 1 +76x 2 1,

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

Primal/Dual Decomposition Methods

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

EE 227A: Convex Optimization and Applications October 14, 2008

8. Geometric problems

Convex Optimization. 4. Convex Optimization Problems. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University

Newton s Method. Javier Peña Convex Optimization /36-725

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization

Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming

Nonlinear Programming Models

Lecture 18: Optimization Programming

Homework Set #6 - Solutions

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Lecture 3. Optimization Problems and Iterative Algorithms

10. Unconstrained minimization

Tutorial on Convex Optimization: Part II

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

CONVEX OPTIMIZATION, DUALITY, AND THEIR APPLICATION TO SUPPORT VECTOR MACHINES. Contents 1. Introduction 1 2. Convex Sets

Optimality Conditions for Constrained Optimization

Transcription:

Optimisation convexe: performance, complexité et applications. Introduction, convexité, dualité. A. d Aspremont. M2 OJME: Optimisation convexe. 1/128

Today Convex optimization: introduction Course organization and other gory details... Convex optimization: basic concepts A. d Aspremont. M2 OJME: Optimisation convexe. 2/128

Convex Optimization A. d Aspremont. M2 OJME: Optimisation convexe. 3/128

Convex optimization minimize f 0 (x) subject to f 1 (x) 0,..., f m (x) 0 x R n is optimization variable; f i : R n R are convex: f i (λx + (1 λ)y) λf i (x) + (1 λ)f i (y) for all x, y, 0 λ 1 This template includes LS, LP, QP, and many others. Good news: convex problems (LP, QP, etc) are fundamentally tractable. Bad news: this is an exception, most nonconvex are completely intractable. A. d Aspremont. M2 OJME: Optimisation convexe. 4/128

Convex optimization A brief history... The field is about 50 years old. Starts with the work of Von Neumann, Kuhn and Tucker, etc. Explodes in the 60 s with the advent of relatively cheap and efficient computers... Key to all this: fast linear algebra Some of the theory developed before computers even existed... A. d Aspremont. M2 OJME: Optimisation convexe. 5/128

Convex optimization: history Historical view: nonlinear problems are hard, linear ones are easy. In reality: Convexity = low complexity... In fact the great watershed in optimization isn t between linearity and nonlinearity, but convexity and nonconvexity. T. Rockafellar. True: Nemirovskii and Yudin [1979]. Very true: Karmarkar [1984]. Seriously true: convex programming, Nesterov and Nemirovskii [1994]. A. d Aspremont. M2 OJME: Optimisation convexe. 6/128

Convexity, complexity All convex minimization problems with: a first order oracle (returning f(x) and a subgradient) can be solved in polynomial time in size and number of precision digits. Proved using the ellipsoid method by Nemirovskii and Yudin [1979]. Very slow convergence in practice. A. d Aspremont. M2 OJME: Optimisation convexe. 7/128

Linear Programming Simplex algorithm by Dantzig (1949): exponential worst-case complexity, very efficient in most cases. Khachiyan [1979] then used the ellipsoid method to show the polynomial complexity of LP. Karmarkar [1984] describes the first efficient polynomial time algorithm for LP, using interior point methods. A. d Aspremont. M2 OJME: Optimisation convexe. 8/128

From LP to structured convex programs Nesterov and Nemirovskii [1994] show that the interior point methods (IPM) used for LPs can be applied to a larger class of structured convex problems. The self-concordance analysis that they introduce extends the polynomial time complexity proof for LPs. Most operations that preserve convexity also preserve self-concordance. A. d Aspremont. M2 OJME: Optimisation convexe. 9/128

Large-scale convex programs Interior point methods. IPM essentially solved once and for all a broad range of medium-scale convex programs. For large-scale problems, computing a single Newton step is often too expensive First order methods. Dependence on precision is polynomial O(1/ɛ α ), not logarithmic O(log(1/ɛ)). This is OK in many applications (stats, etc). Run a much larger number of cheaper iterations. No Hessian means significantly lower memory and CPU costs per iteration. No unified analysis (self-concordance for IPM): large library of disparate methods. Algorithmic choices strictly constrained by problem structure. Objective: classify these techniques, study their performance & complexity. A. d Aspremont. M2 OJME: Optimisation convexe. 10/128

Symmetric cone programs An important particular case: linear programming on symmetric cones minimize subject to c T x Ax b K These include the LP, second-order (Lorentz) and semidefinite cone: LP: {x R n : x 0} Second order: {(x, y) R n R : x y} Semidefinite: {X S n : X 0} Broad class of problems can be represented in this way. Good news: Fast, reliable, open-source solvers available (SDPT3, CVX, etc). This course will describe some exotic applications of these programs. A. d Aspremont. M2 OJME: Optimisation convexe. 11/128

A few miracles Beyond convexity... Hidden convexity. Convex programs solving nonconvex problems (S-lemma). Approximation results. Approximating combinatorial problems by convex programs. Approximate S-lemma. Approximation ratio for MaxCut, etc. Recovery results on l 1 penalties. Finding sparse solutions to optimization problems using convex penalties. Sparse signal reconstruction. Matrix completion (collaborative filtering, NETFLIX, etc.). A. d Aspremont. M2 OJME: Optimisation convexe. 12/128

Course Organization A. d Aspremont. M2 OJME: Optimisation convexe. 13/128

Course outline Fundamental definitions A brief primer on convexity and duality theory Algorithmic complexity Interior point methods, self-concordance. First order algorithms: complexity and classification. Modern applications Statistics Geometrical problems, graphs. Some miracles : approximation, asymptotic and hidden convexity results Measure concentration results. S-lemma, MaxCut, low rank SDP solutions, nonconvex QCQP, etc. High dimensional geometry l 1 recovery, matrix completion, convex deconvolution, etc. A. d Aspremont. M2 OJME: Optimisation convexe. 14/128

Info Course website with lecture notes, homework, etc. http://www.cmap.polytechnique.fr/~aspremon/ojmem2.html A final project. A. d Aspremont. M2 OJME: Optimisation convexe. 15/128

Short blurb Contact info on http://www.cmap.polytechnique.fr/~aspremon Email: alexandre.daspremont@m4x.org Dual PhDs: Ecole Polytechnique & Stanford University Interests: Optimization, machine learning, statistics & finance. Recruiting PhD students... Full fellowship. A. d Aspremont. M2 OJME: Optimisation convexe. 16/128

References All lecture notes will be posted online, none of the books below are required. Nesterov [2003], Introductory Lectures on Convex Optimization, Springer. Convex Optimization by Lieven Vandenberghe and Stephen Boyd, available online at: http://www.stanford.edu/~boyd/cvxbook/ See also Ben-Tal and Nemirovski [2001], Lectures On Modern Convex Optimization: Analysis, Algorithms, And Engineering Applications, SIAM. http://www2.isye.gatech.edu/~nemirovs/ Nesterov and Nemirovskii [1994], Interior Point Polynomial Algorithms in Convex Programming, SIAM. A. d Aspremont. M2 OJME: Optimisation convexe. 17/128

Convex Sets A. d Aspremont. M2 OJME: Optimisation convexe. 18/128

Convex Sets affine and convex sets some important examples operations that preserve convexity generalized inequalities separating and supporting hyperplanes dual cones and generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 19/128

Convex set line segment between x 1 and x 2 : all points with 0 θ 1 x = θx 1 + (1 θ)x 2 convex set: contains line segment between any two points in the set x 1, x 2 C, 0 θ 1 = θx 1 + (1 θ)x 2 C examples (one convex, two nonconvex sets) A. d Aspremont. M2 OJME: Optimisation convexe. 20/128

Convex combination and convex hull convex combination of x 1,..., x k : any point x of the form with θ 1 + + θ k = 1, θ i 0 x = θ 1 x 1 + θ 2 x 2 + + θ k x k convex hull CoS: set of all convex combinations of points in S A. d Aspremont. M2 OJME: Optimisation convexe. 21/128

Hyperplanes and halfspaces hyperplane: set of the form {x a T x = b} (a 0) a x 0 x a T x = b halfspace: set of the form {x a T x b} (a 0) a x 0 a T x b a T x b a is the normal vector hyperplanes are affine and convex; halfspaces are convex A. d Aspremont. M2 OJME: Optimisation convexe. 22/128

Euclidean balls and ellipsoids (Euclidean) ball with center x c and radius r: B(x c, r) = {x x x c 2 r} = {x c + ru u 2 1} Ellipsoid: set of the form {x (x x c ) T P 1 (x x c ) 1} with P S n ++ (i.e., P symmetric positive definite) x c other representation: {x c + Au u 2 1}, with A square and nonsingular. Representation impacts problem formulation & complexity. Idem for polytopes, with polynomial number of vertices, exponential number of facets, and vice-versa. A. d Aspremont. M2 OJME: Optimisation convexe. 23/128

Norm balls and norm cones norm: a function that satisfies x 0; x = 0 if and only if x = 0 tx = t x for t R x + y x + y notation: is general (unspecified) norm; symb is particular norm norm ball with center x c and radius r: {x x x c r} 1 norm cone: {(x, t) x t} t 0.5 Euclidean norm cone is called secondorder cone 1 0 0 x 2 1 1 x 1 0 1 norm balls and cones are convex A. d Aspremont. M2 OJME: Optimisation convexe. 24/128

Polyhedra solution set of finitely many linear inequalities and equalities Ax b, Cx = d (A R m n, C R p n, is componentwise inequality) a 1 a2 a 5 P a 3 a 4 polyhedron is intersection of finite number of halfspaces and hyperplanes A. d Aspremont. M2 OJME: Optimisation convexe. 25/128

Positive semidefinite cone notation: S n is set of symmetric n n matrices S n + = {X S n X 0}: positive semidefinite n n matrices X S n + z T Xz 0 for all z S n + is a convex cone S n ++ = {X S n X 0}: positive definite n n matrices 1 example: [ x y y z ] S 2 + z 0.5 0 1 0 y 1 0 x 0.5 1 A. d Aspremont. M2 OJME: Optimisation convexe. 26/128

Operations that preserve convexity practical methods for establishing convexity of a set C 1. apply definition x 1, x 2 C, 0 θ 1 = θx 1 + (1 θ)x 2 C 2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls,... ) by operations that preserve convexity intersection affine functions perspective function linear-fractional functions A. d Aspremont. M2 OJME: Optimisation convexe. 27/128

Intersection the intersection of (any number of) convex sets is convex example: S = {x R m p(t) 1 for t π/3} where p(t) = x 1 cos t + x 2 cos 2t + + x m cos mt for m = 2: 2 p(t) 1 0 1 x2 1 0 S 1 0 π/3 2π/3 π t 2 2 1 0 1 2 x 1 A. d Aspremont. M2 OJME: Optimisation convexe. 28/128

Affine function suppose f : R n R m is affine (f(x) = Ax + b with A R m n, b R m ) the image of a convex set under f is convex S R n convex = f(s) = {f(x) x S} convex the inverse image f 1 (C) of a convex set under f is convex C R m convex = f 1 (C) = {x R n f(x) C} convex examples scaling, translation, projection solution set of linear matrix inequality {x x 1 A 1 + + x m A m B} (with A i, B S p ) hyperbolic cone {x x T P x (c T x) 2, c T x 0} (with P S n +) A. d Aspremont. M2 OJME: Optimisation convexe. 29/128

Perspective and linear-fractional function perspective function P : R n+1 R n : P (x, t) = x/t, dom P = {(x, t) t > 0} images and inverse images of convex sets under perspective are convex linear-fractional function f : R n R m : f(x) = Ax + b c T x + d, dom f = {x ct x + d > 0} images and inverse images of convex sets under linear-fractional functions are convex A. d Aspremont. M2 OJME: Optimisation convexe. 30/128

Generalized inequalities a convex cone K R n is a proper cone if K is closed (contains its boundary) K is solid (has nonempty interior) K is pointed (contains no line) examples nonnegative orthant K = R n + = {x R n x i 0, i = 1,..., n} positive semidefinite cone K = S n + nonnegative polynomials on [0, 1]: K = {x R n x 1 + x 2 t + x 3 t 2 + + x n t n 1 0 for t [0, 1]} A. d Aspremont. M2 OJME: Optimisation convexe. 31/128

generalized inequality defined by a proper cone K: x K y y x K, x K y y x int K examples componentwise inequality (K = R n +) x R n + y x i y i, i = 1,..., n matrix inequality (K = S n +) X S n + Y Y X positive semidefinite these two types are so common that we drop the subscript in K properties: many properties of K are similar to on R, e.g., x K y, u K v = x + u K y + v A. d Aspremont. M2 OJME: Optimisation convexe. 32/128

Separating hyperplane theorem if C and D are disjoint convex sets, then there exists a 0, b such that a T x b for x C, a T x b for x D a T x b a T x b D C a the hyperplane {x a T x = b} separates C and D Classical result. Proof relies on minimizing distance between set, and using the argmin to explicitly produce separating hyperplane. A. d Aspremont. M2 OJME: Optimisation convexe. 33/128

Supporting hyperplane theorem supporting hyperplane to set C at boundary point x 0 : {x a T x = a T x 0 } where a 0 and a T x a T x 0 for all x C a C x 0 supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C A. d Aspremont. M2 OJME: Optimisation convexe. 34/128

Dual cones and generalized inequalities dual cone of a cone K: examples K = {y y T x 0 for all x K} K = R n +: K = R n + K = S n +: K = S n + K = {(x, t) x 2 t}: K = {(x, t) x 2 t} K = {(x, t) x 1 t}: K = {(x, t) x t} first three examples are self-dual cones dual cones of proper cones are proper, hence define generalized inequalities: y K 0 y T x 0 for all x K 0 A. d Aspremont. M2 OJME: Optimisation convexe. 35/128

Convex Functions A. d Aspremont. M2 OJME: Optimisation convexe. 36/128

Outline basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions convexity with respect to generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 37/128

Definition f : R n R is convex if dom f is a convex set and for all x, y dom f, 0 θ 1 f(θx + (1 θ)y) θf(x) + (1 θ)f(y) (x,f(x)) (y,f(y)) f is concave if f is convex f is strictly convex if dom f is convex and for x, y dom f, x y, 0 < θ < 1 f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) A. d Aspremont. M2 OJME: Optimisation convexe. 38/128

Examples on R convex: affine: ax + b on R, for any a, b R exponential: e ax, for any a R powers: x α on R ++, for α 1 or α 0 powers of absolute value: x p on R, for p 1 negative entropy: x log x on R ++ concave: affine: ax + b on R, for any a, b R powers: x α on R ++, for 0 α 1 logarithm: log x on R ++ A. d Aspremont. M2 OJME: Optimisation convexe. 39/128

Examples on R n and R m n affine functions are convex and concave; all norms are convex examples on R n affine function f(x) = a T x + b norms: x p = ( n i=1 x i p ) 1/p for p 1; x = max k x k examples on R m n (m n matrices) affine function f(x) = Tr(A T X) + b = m i=1 n A ij X ij + b j=1 spectral (maximum singular value) norm f(x) = X 2 = σ max (X) = (λ max (X T X)) 1/2 A. d Aspremont. M2 OJME: Optimisation convexe. 40/128

Restriction of a convex function to a line f : R n R is convex if and only if the function g : R R, g(t) = f(x + tv), dom g = {t x + tv dom f} is convex (in t) for any x dom f, v R n can check convexity of f by checking convexity of functions of one variable example. f : S n R with f(x) = log det X, dom X = S n ++ g(t) = log det(x + tv ) = log det X + log det(i + tx 1/2 V X 1/2 ) = n log det X + log(1 + tλ i ) where λ i are the eigenvalues of X 1/2 V X 1/2 g is concave in t (for any choice of X 0, V ); hence f is concave i=1 A. d Aspremont. M2 OJME: Optimisation convexe. 41/128

Extended-value extension extended-value extension f of f is f(x) = f(x), x dom f, f(x) =, x dom f often simplifies notation; for example, the condition 0 θ 1 = f(θx + (1 θ)y) θ f(x) + (1 θ) f(y) (as an inequality in R { }), means the same as the two conditions dom f is convex for x, y dom f, 0 θ 1 = f(θx + (1 θ)y) θf(x) + (1 θ)f(y) A. d Aspremont. M2 OJME: Optimisation convexe. 42/128

First-order condition f is differentiable if dom f is open and the gradient exists at each x dom f f(x) = ( f(x) x 1, f(x),..., f(x) ) x 2 x n 1st-order condition: differentiable f with convex domain is convex iff f(y) f(x) + f(x) T (y x) for all x, y dom f f(y) f(x) + f(x) T (y x) (x,f(x)) first-order approximation of f is global underestimator A. d Aspremont. M2 OJME: Optimisation convexe. 43/128

Second-order conditions f is twice differentiable if dom f is open and the Hessian 2 f(x) S n, exists at each x dom f 2 f(x) ij = 2 f(x) x i x j, i, j = 1,..., n, 2nd-order conditions: for twice differentiable f with convex domain f is convex if and only if 2 f(x) 0 for all x dom f if 2 f(x) 0 for all x dom f, then f is strictly convex A. d Aspremont. M2 OJME: Optimisation convexe. 44/128

Examples quadratic function: f(x) = (1/2)x T P x + q T x + r (with P S n ) f(x) = P x + q, 2 f(x) = P convex if P 0 least-squares objective: f(x) = Ax b 2 2 f(x) = 2A T (Ax b), 2 f(x) = 2A T A convex (for any A) quadratic-over-linear: f(x, y) = x 2 /y 2 2 f(x, y) = 2 y 3 [ y x ] [ y x ] T 0 f(x,y) 1 0 2 2 1 0 convex for y > 0 y 0 2 x A. d Aspremont. M2 OJME: Optimisation convexe. 45/128

log-sum-exp: f(x) = log n k=1 exp x k is convex 2 f(x) = 1 1 T z diag(z) 1 (1 T z) 2zzT (z k = exp x k ) to show 2 f(x) 0, we must verify that v T 2 f(x)v 0 for all v: v T 2 f(x)v = ( k z kv 2 k )( k z k) ( k v kz k ) 2 ( k z k) 2 0 since ( k v kz k ) 2 ( k z kv 2 k )( k z k) (from Cauchy-Schwarz inequality) geometric mean: f(x) = ( n k=1 x k) 1/n on R n ++ is concave (similar proof as for log-sum-exp) A. d Aspremont. M2 OJME: Optimisation convexe. 46/128

Epigraph and sublevel set α-sublevel set of f : R n R: C α = {x dom f f(x) α} sublevel sets of convex functions are convex (converse is false) epigraph of f : R n R: epi f = {(x, t) R n+1 x dom f, f(x) t} epif f f is convex if and only if epi f is a convex set A. d Aspremont. M2 OJME: Optimisation convexe. 47/128

Jensen s inequality basic inequality: if f is convex, then for 0 θ 1, f(θx + (1 θ)y) θf(x) + (1 θ)f(y) extension: if f is convex, then for any random variable z f(e z) E f(z) basic inequality is special case with discrete distribution Prob(z = x) = θ, Prob(z = y) = 1 θ A. d Aspremont. M2 OJME: Optimisation convexe. 48/128

Operations that preserve convexity practical methods for establishing convexity of a function 1. verify definition (often simplified by restricting to a line) 2. for twice differentiable functions, show 2 f(x) 0 3. show that f is obtained from simple convex functions by operations that preserve convexity nonnegative weighted sum composition with affine function pointwise maximum and supremum composition minimization perspective A. d Aspremont. M2 OJME: Optimisation convexe. 49/128

Positive weighted sum & composition with affine function nonnegative multiple: αf is convex if f is convex, α 0 sum: f 1 + f 2 convex if f 1, f 2 convex (extends to infinite sums, integrals) composition with affine function: f(ax + b) is convex if f is convex examples log barrier for linear inequalities f(x) = m log(b i a T i x), dom f = {x a T i x < b i, i = 1,..., m} i=1 (any) norm of affine function: f(x) = Ax + b A. d Aspremont. M2 OJME: Optimisation convexe. 50/128

Pointwise maximum if f 1,..., f m are convex, then f(x) = max{f 1 (x),..., f m (x)} is convex examples piecewise-linear function: f(x) = max i=1,...,m (a T i x + b i) is convex sum of r largest components of x R n : f(x) = x [1] + x [2] + + x [r] is convex (x [i] is ith largest component of x) proof: f(x) = max{x i1 + x i2 + + x ir 1 i 1 < i 2 < < i r n} A. d Aspremont. M2 OJME: Optimisation convexe. 51/128

Pointwise supremum if f(x, y) is convex in x for each y A, then is convex examples g(x) = sup f(x, y) y A support function of a set C: S C (x) = sup y C y T x is convex distance to farthest point in a set C: f(x) = sup x y y C maximum eigenvalue of symmetric matrix: for X S n, λ max (X) = sup y T Xy y 2 =1 A. d Aspremont. M2 OJME: Optimisation convexe. 52/128

Composition with scalar functions composition of g : R n R and h : R R: f(x) = h(g(x)) f is convex if g convex, h convex, h nondecreasing g concave, h convex, h nonincreasing proof (for n = 1, differentiable g, h) f (x) = h (g(x))g (x) 2 + h (g(x))g (x) note: monotonicity must hold for extended-value extension h examples exp g(x) is convex if g is convex 1/g(x) is convex if g is concave and positive A. d Aspremont. M2 OJME: Optimisation convexe. 53/128

Vector composition composition of g : R n R k and h : R k R: f(x) = h(g(x)) = h(g 1 (x), g 2 (x),..., g k (x)) f is convex if g i convex, h convex, h nondecreasing in each argument g i concave, h convex, h nonincreasing in each argument proof (for n = 1, differentiable g, h) f (x) = g (x) T 2 h(g(x))g (x) + h(g(x)) T g (x) examples m i=1 log g i(x) is concave if g i are concave and positive log m i=1 exp g i(x) is convex if g i are convex A. d Aspremont. M2 OJME: Optimisation convexe. 54/128

Minimization if f(x, y) is convex in (x, y) and C is a convex set, then is convex examples g(x) = inf f(x, y) y C f(x, y) = x T Ax + 2x T By + y T Cy with [ A B B T C ] 0, C 0 minimizing over y gives g(x) = inf y f(x, y) = x T (A BC 1 B T )x g is convex, hence Schur complement A BC 1 B T 0 distance to a set: dist(x, S) = inf y S x y is convex if S is convex A. d Aspremont. M2 OJME: Optimisation convexe. 55/128

The conjugate function the conjugate of a function f is f (y) = sup (y T x f(x)) x dom f f(x) xy x (0, f (y)) f is convex (even if f is not) Used in regularization, duality results,... A. d Aspremont. M2 OJME: Optimisation convexe. 56/128

examples negative logarithm f(x) = log x f (y) = sup(xy + log x) = x>0 { 1 log( y) y < 0 otherwise strictly convex quadratic f(x) = (1/2)x T Qx with Q S n ++ f (y) = sup(y x (1/2)x T Qx) x = 1 2 yt Q 1 y A. d Aspremont. M2 OJME: Optimisation convexe. 57/128

Quasiconvex functions f : R n R is quasiconvex if dom f is convex and the sublevel sets are convex for all α S α = {x dom f f(x) α} β α a b c f is quasiconcave if f is quasiconvex f is quasilinear if it is quasiconvex and quasiconcave A. d Aspremont. M2 OJME: Optimisation convexe. 58/128

Examples x is quasiconvex on R ceil(x) = inf{z Z z x} is quasilinear log x is quasilinear on R ++ f(x 1, x 2 ) = x 1 x 2 is quasiconcave on R 2 ++ linear-fractional function f(x) = at x + b c T x + d, dom f = {x ct x + d > 0} is quasilinear distance ratio is quasiconvex f(x) = x a 2 x b 2, dom f = {x x a 2 x b 2 } A. d Aspremont. M2 OJME: Optimisation convexe. 59/128

Properties modified Jensen inequality: for quasiconvex f 0 θ 1 = f(θx + (1 θ)y) max{f(x), f(y)} first-order condition: differentiable f with cvx domain is quasiconvex iff f(y) f(x) = f(x) T (y x) 0 x f(x) sums of quasiconvex functions are not necessarily quasiconvex A. d Aspremont. M2 OJME: Optimisation convexe. 60/128

Log-concave and log-convex functions a positive function f is log-concave if log f is concave: f(θx + (1 θ)y) f(x) θ f(y) 1 θ for 0 θ 1 f is log-convex if log f is convex powers: x a on R ++ is log-convex for a 0, log-concave for a 0 many common probability densities are log-concave, e.g., normal: f(x) = 1 (2π)n det Σ e 1 2 (x x)t Σ 1 (x x) cumulative Gaussian distribution function Φ is log-concave Φ(x) = 1 2π x e u2 /2 du A. d Aspremont. M2 OJME: Optimisation convexe. 61/128

Properties of log-concave functions twice differentiable f with convex domain is log-concave if and only if f(x) 2 f(x) f(x) f(x) T for all x dom f product of log-concave functions is log-concave sum of log-concave functions is not always log-concave integration: if f : R n R m R is log-concave, then g(x) = f(x, y) dy is log-concave (not easy to show) A. d Aspremont. M2 OJME: Optimisation convexe. 62/128

consequences of integration property convolution f g of log-concave functions f, g is log-concave (f g)(x) = f(x y)g(y)dy if C R n convex and y is a random variable with log-concave pdf then f(x) = Prob(x + y C) is log-concave proof: write f(x) as integral of product of log-concave functions f(x) = g(x + y)p(y) dy, g(u) = { 1 u C 0 u C, p is pdf of y A. d Aspremont. M2 OJME: Optimisation convexe. 63/128

Convex Optimization Problems A. d Aspremont. M2 OJME: Optimisation convexe. 64/128

Outline optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization geometric programming generalized inequality constraints semidefinite programming vector optimization A. d Aspremont. M2 OJME: Optimisation convexe. 65/128

Optimization problem in standard form minimize f 0 (x) subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p x R n is the optimization variable f 0 : R n R is the objective or cost function f i : R n R, i = 1,..., m, are the inequality constraint functions h i : R n R are the equality constraint functions optimal value: p = inf{f 0 (x) f i (x) 0, i = 1,..., m, h i (x) = 0, i = 1,..., p} p = if problem is infeasible (no x satisfies the constraints) p = if problem is unbounded below A. d Aspremont. M2 OJME: Optimisation convexe. 66/128

Optimal and locally optimal points x is feasible if x dom f 0 and it satisfies the constraints a feasible x is optimal if f 0 (x) = p ; X opt is the set of optimal points x is locally optimal if there is an R > 0 such that x is optimal for minimize (over z) f 0 (z) subject to f i (z) 0, i = 1,..., m, h i (z) = 0, i = 1,..., p z x 2 R examples (with n = 1, m = p = 0) f 0 (x) = 1/x, dom f 0 = R ++ : p = 0, no optimal point f 0 (x) = log x, dom f 0 = R ++ : p = f 0 (x) = x log x, dom f 0 = R ++ : p = 1/e, x = 1/e is optimal f 0 (x) = x 3 3x, p =, local optimum at x = 1 A. d Aspremont. M2 OJME: Optimisation convexe. 67/128

Implicit constraints the standard form optimization problem has an implicit constraint x D = m i=0 dom f i p i=1 dom h i, we call D the domain of the problem the constraints f i (x) 0, h i (x) = 0 are the explicit constraints a problem is unconstrained if it has no explicit constraints (m = p = 0) example: minimize f 0 (x) = k i=1 log(b i a T i x) is an unconstrained problem with implicit constraints a T i x < b i A. d Aspremont. M2 OJME: Optimisation convexe. 68/128

Feasibility problem find x subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p can be considered a special case of the general problem with f 0 (x) = 0: minimize 0 subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p p = 0 if constraints are feasible; any feasible x is optimal p = if constraints are infeasible A. d Aspremont. M2 OJME: Optimisation convexe. 69/128

Convex optimization problem standard form convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,..., m a T i x = b i, i = 1,..., p f 0, f 1,..., f m are convex; equality constraints are affine problem is quasiconvex if f 0 is quasiconvex (and f 1,..., f m convex) often written as minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b important property: feasible set of a convex optimization problem is convex A. d Aspremont. M2 OJME: Optimisation convexe. 70/128

example minimize f 0 (x) = x 2 1 + x 2 2 subject to f 1 (x) = x 1 /(1 + x 2 2) 0 h 1 (x) = (x 1 + x 2 ) 2 = 0 f 0 is convex; feasible set {(x 1, x 2 ) x 1 = x 2 0} is convex not a convex problem (according to our definition): f 1 is not convex, h 1 is not affine equivalent (but not identical) to the convex problem minimize x 2 1 + x 2 2 subject to x 1 0 x 1 + x 2 = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 71/128

Local and global optima any locally optimal point of a convex problem is (globally) optimal Proof: suppose x is locally optimal and y is optimal with f 0 (y) < f 0 (x) x locally optimal means there is an R > 0 such that z feasible, z x 2 R = f 0 (z) f 0 (x) consider z = θy + (1 θ)x with θ = R/(2 y x 2 ) y x 2 > R, so 0 < θ < 1/2 z is a convex combination of two feasible points, hence also feasible z x 2 = R/2 and f 0 (z) θf 0 (x) + (1 θ)f 0 (y) < f 0 (x) which contradicts our assumption that x is locally optimal A. d Aspremont. M2 OJME: Optimisation convexe. 72/128

Optimality criterion for differentiable f 0 x is optimal if and only if it is feasible and f 0 (x) T (y x) 0 for all feasible y X x f 0 (x) if nonzero, f 0 (x) defines a supporting hyperplane to feasible set X at x A. d Aspremont. M2 OJME: Optimisation convexe. 73/128

unconstrained problem: x is optimal if and only if x dom f 0, f 0 (x) = 0 equality constrained problem minimize f 0 (x) subject to Ax = b x is optimal if and only if there exists a ν such that x dom f 0, Ax = b, f 0 (x) + A T ν = 0 minimization over nonnegative orthant x is optimal if and only if minimize f 0 (x) subject to x 0 x dom f 0, x 0, { f0 (x) i 0 x i = 0 f 0 (x) i = 0 x i > 0 A. d Aspremont. M2 OJME: Optimisation convexe. 74/128

Equivalent convex problems two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa some common transformations that preserve convexity: eliminating equality constraints minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b is equivalent to minimize (over z) f 0 (F z + x 0 ) subject to f i (F z + x 0 ) 0, i = 1,..., m where F and x 0 are such that Ax = b x = F z + x 0 for some z A. d Aspremont. M2 OJME: Optimisation convexe. 75/128

introducing equality constraints minimize f 0 (A 0 x + b 0 ) subject to f i (A i x + b i ) 0, i = 1,..., m is equivalent to minimize (over x, y i ) f 0 (y 0 ) subject to f i (y i ) 0, i = 1,..., m y i = A i x + b i, i = 0, 1,..., m introducing slack variables for linear inequalities minimize f 0 (x) subject to a T i x b i, i = 1,..., m is equivalent to minimize (over x, s) f 0 (x) subject to a T i x + s i = b i, i = 1,..., m s i 0, i = 1,... m A. d Aspremont. M2 OJME: Optimisation convexe. 76/128

epigraph form: standard form convex problem is equivalent to minimize (over x, t) t subject to f 0 (x) t 0 f i (x) 0, i = 1,..., m Ax = b minimizing over some variables minimize f 0 (x 1, x 2 ) subject to f i (x 1 ) 0, i = 1,..., m is equivalent to minimize f0 (x 1 ) subject to f i (x 1 ) 0, i = 1,..., m where f 0 (x 1 ) = inf x2 f 0 (x 1, x 2 ) A. d Aspremont. M2 OJME: Optimisation convexe. 77/128

Quasiconvex optimization minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b with f 0 : R n R quasiconvex, f 1,..., f m convex can have locally optimal points that are not (globally) optimal (x,f 0 (x)) A. d Aspremont. M2 OJME: Optimisation convexe. 78/128

quasiconvex optimization via convex feasibility problems f 0 (x) t, f i (x) 0, i = 1,..., m, Ax = b (1) for fixed t, a convex feasibility problem in x if feasible, we can conclude that t p ; if infeasible, t p Bisection method for quasiconvex optimization given l p, u p, tolerance ɛ > 0. repeat 1. t := (l + u)/2. 2. Solve the convex feasibility problem (1). 3. if (1) is feasible, u := t; else l := t. until u l ɛ. requires exactly log 2 ((u l)/ɛ) iterations (where u, l are initial values) A. d Aspremont. M2 OJME: Optimisation convexe. 79/128

Linear program (LP) minimize subject to c T x + d Gx h Ax = b convex problem with affine objective and constraint functions feasible set is a polyhedron c P x A. d Aspremont. M2 OJME: Optimisation convexe. 80/128

Chebyshev center of a polyhedron Chebyshev center of P = {x a T i x b i, i = 1,..., m} is center of largest inscribed ball x cheb B = {x c + u u 2 r} a T i x b i for all x B if and only if sup{a T i (x c + u) u 2 r} = a T i x c + r a i 2 b i hence, x c, r can be determined by solving the LP maximize r subject to a T i x c + r a i 2 b i, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 81/128

(Generalized) linear-fractional program minimize subject to f 0 (x) Gx h Ax = b linear-fractional program f 0 (x) = ct x + d e T x + f, dom f 0(x) = {x e T x + f > 0} a quasiconvex optimization problem; can be solved by bisection also equivalent to the LP (variables y, z) minimize subject to c T y + dz Gy hz Ay = bz e T y + fz = 1 z 0 A. d Aspremont. M2 OJME: Optimisation convexe. 82/128

Quadratic program (QP) minimize subject to (1/2)x T P x + q T x + r Gx h Ax = b P S n +, so objective is convex quadratic minimize a convex quadratic function over a polyhedron f 0 (x ) x P A. d Aspremont. M2 OJME: Optimisation convexe. 83/128

Examples least-squares minimize Ax b 2 2 analytical solution x = A b (A is pseudo-inverse) can add linear constraints, e.g., l x u linear program with random cost minimize c T x + γx T Σx = E c T x + γ var(c T x) subject to Gx h, Ax = b c is random vector with mean c and covariance Σ hence, c T x is random variable with mean c T x and variance x T Σx γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk) A. d Aspremont. M2 OJME: Optimisation convexe. 84/128

Quadratically constrained quadratic program (QCQP) minimize (1/2)x T P 0 x + q0 T x + r 0 subject to (1/2)x T P i x + qi T x + r i 0, i = 1,..., m Ax = b P i S n +; objective and constraints are convex quadratic if P 1,..., P m S n ++, feasible region is intersection of m ellipsoids and an affine set A. d Aspremont. M2 OJME: Optimisation convexe. 85/128

Second-order cone programming (A i R n i n, F R p n ) minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m F x = g inequalities are called second-order cone (SOC) constraints: (A i x + b i, c T i x + d i ) second-order cone in R n i+1 for n i = 0, reduces to an LP; if c i = 0, reduces to a QCQP more general than QCQP and LP A. d Aspremont. M2 OJME: Optimisation convexe. 86/128

Robust linear programming the parameters in optimization problems are often uncertain, e.g., in an LP minimize c T x subject to a T i x b i, i = 1,..., m, there can be uncertainty in c, a i, b i two common approaches to handling uncertainty (in a i, for simplicity) deterministic model: constraints must hold for all a i E i minimize c T x subject to a T i x b i for all a i E i, i = 1,..., m, stochastic model: a i is random variable; constraints must hold with probability η minimize c T x subject to Prob(a T i x b i) η, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 87/128

deterministic approach via SOCP choose an ellipsoid as E i : E i = {ā i + P i u u 2 1} (ā i R n, P i R n n ) center is ā i, semi-axes determined by singular values/vectors of P i robust LP minimize c T x subject to a T i x b i a i E i, i = 1,..., m is equivalent to the SOCP minimize c T x subject to ā T i x + P i T x 2 b i, i = 1,..., m (follows from sup u 2 1(ā i + P i u) T x = ā T i x + P i T x 2) A. d Aspremont. M2 OJME: Optimisation convexe. 88/128

stochastic approach via SOCP assume a i is Gaussian with mean ā i, covariance Σ i (a i N (ā i, Σ i )) a T i x is Gaussian r.v. with mean āt i x, variance xt Σ i x; hence ( ) Prob(a T b i ā T i i x b i ) = Φ x Σ 1/2 i x 2 where Φ(x) = (1/ 2π) x e t2 /2 dt is CDF of N (0, 1) robust LP minimize c T x subject to Prob(a T i x b i) η, i = 1,..., m, with η 1/2, is equivalent to the SOCP minimize c T x subject to ā T i x + Φ 1 (η) Σ 1/2 i x 2 b i, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 89/128

Impact of reliability {x Prob(a T i x b i ) η, i = 1,..., m} η = 10% η = 50% η = 90% A. d Aspremont. M2 OJME: Optimisation convexe. 90/128

Generalized inequality constraints convex problem with generalized inequality constraints minimize f 0 (x) subject to f i (x) Ki 0, i = 1,..., m Ax = b f 0 : R n R convex; f i : R n R k i K i -convex w.r.t. proper cone K i same properties as standard convex problem (convex feasible set, local optimum is global, etc.) conic form problem: special case with affine objective and constraints minimize c T x subject to F x + g K 0 Ax = b extends linear programming (K = R m +) to nonpolyhedral cones A. d Aspremont. M2 OJME: Optimisation convexe. 91/128

Semidefinite program (SDP) with F i, G S k minimize c T x subject to x 1 F 1 + x 2 F 2 + + x n F n + G 0 Ax = b inequality constraint is called linear matrix inequality (LMI) includes problems with multiple LMI constraints: for example, x 1 ˆF1 + + x n ˆFn + Ĝ 0, x 1 F 1 + + x n Fn + G 0 is equivalent to single LMI x 1 [ ˆF1 0 0 F1 ] + x 2 [ ˆF2 0 0 F2 ] + + x n [ ˆFn 0 0 Fn ] + [ Ĝ 0 0 G ] 0 A. d Aspremont. M2 OJME: Optimisation convexe. 92/128

LP and SOCP as SDP LP and equivalent SDP LP: minimize c T x subject to Ax b SDP: minimize c T x subject to diag(ax b) 0 (note different interpretation of generalized inequality ) SOCP and equivalent SDP SOCP: minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m SDP: minimize f [ T x (c T subject to i x + d i )I A i x + b i (A i x + b i ) T c T i x + d i ] 0, i = 1,..., m A. d Aspremont. M2 OJME: Optimisation convexe. 93/128

Eigenvalue minimization minimize λ max (A(x)) where A(x) = A 0 + x 1 A 1 + + x n A n (with given A i S k ) equivalent SDP minimize subject to t A(x) ti variables x R n, t R follows from λ max (A) t A ti A. d Aspremont. M2 OJME: Optimisation convexe. 94/128

Matrix norm minimization minimize A(x) 2 = ( λ max (A(x) T A(x)) ) 1/2 where A(x) = A 0 + x 1 A 1 + + x n A n (with given A i S p q ) equivalent SDP minimize subject to t[ ti A(x) T A(x) ti ] 0 variables x R n, t R constraint follows from A 2 t A T A t 2 I, t 0 [ ] ti A A T 0 ti A. d Aspremont. M2 OJME: Optimisation convexe. 95/128

Duality A. d Aspremont. M2 OJME: Optimisation convexe. 96/128

Outline Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis examples generalized inequalities A. d Aspremont. M2 OJME: Optimisation convexe. 97/128

Lagrangian standard form problem (not necessarily convex) minimize f 0 (x) subject to f i (x) 0, i = 1,..., m h i (x) = 0, i = 1,..., p variable x R n, domain D, optimal value p Lagrangian: L : R n R m R p R, with dom L = D R m R p, L(x, λ, ν) = f 0 (x) + m λ i f i (x) + i=1 p ν i h i (x) i=1 weighted sum of objective and constraint functions λ i is Lagrange multiplier associated with f i (x) 0 ν i is Lagrange multiplier associated with h i (x) = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 98/128

Lagrange dual function Lagrange dual function: g : R m R p R, g(λ, ν) = inf x D = inf x D L(x, λ, ν) ( f 0 (x) + g is concave, can be for some λ, ν m λ i f i (x) + i=1 ) p ν i h i (x) i=1 lower bound property: if λ 0, then g(λ, ν) p proof: if x is feasible and λ 0, then f 0 ( x) L( x, λ, ν) inf L(x, λ, ν) = g(λ, ν) x D minimizing over all feasible x gives p g(λ, ν) A. d Aspremont. M2 OJME: Optimisation convexe. 99/128

Least-norm solution of linear equations minimize x T x subject to Ax = b dual function Lagrangian is L(x, ν) = x T x + ν T (Ax b) to minimize L over x, set gradient equal to zero: x L(x, ν) = 2x + A T ν = 0 = x = (1/2)A T ν plug in in L to obtain g: a concave function of ν g(ν) = L(( 1/2)A T ν, ν) = 1 4 νt AA T ν b T ν lower bound property: p (1/4)ν T AA T ν b T ν for all ν A. d Aspremont. M2 OJME: Optimisation convexe. 100/128

Standard form LP minimize c T x subject to Ax = b, x 0 dual function Lagrangian is L(x, λ, ν) = c T x + ν T (Ax b) λ T x = b T ν + (c + A T ν λ) T x L is linear in x, hence g(λ, ν) = inf x L(x, λ, ν) = { b T ν A T ν λ + c = 0 otherwise g is linear on affine domain {(λ, ν) A T ν λ + c = 0}, hence concave lower bound property: p b T ν if A T ν + c 0 A. d Aspremont. M2 OJME: Optimisation convexe. 101/128

Equality constrained norm minimization dual function minimize subject to g(ν) = inf x ( x νt Ax + b T ν) = x Ax = b where v = sup u 1 u T v is dual norm of { b T ν A T ν 1 otherwise proof: follows from inf x ( x y T x) = 0 if y 1, otherwise if y 1, then x y T x 0 for all x, with equality if x = 0 if y > 1, choose x = tu where u 1, u T y = y > 1: x y T x = t( u y ) as t lower bound property: p b T ν if A T ν 1 A. d Aspremont. M2 OJME: Optimisation convexe. 102/128

Two-way partitioning minimize x T W x subject to x 2 i = 1, i = 1,..., n a nonconvex problem; feasible set contains 2 n discrete points interpretation: partition {1,..., n} in two sets; W ij is cost of assigning i, j to the same set; W ij is cost of assigning to different sets dual function g(ν) = inf x (xt W x + i ν i (x 2 i 1)) = inf x xt (W + diag(ν))x 1 T ν { 1 = T ν W + diag(ν) 0 otherwise lower bound property: p 1 T ν if W + diag(ν) 0 example: ν = λ min (W )1 gives bound p nλ min (W ) A. d Aspremont. M2 OJME: Optimisation convexe. 103/128

The dual problem Lagrange dual problem maximize g(λ, ν) subject to λ 0 finds best lower bound on p, obtained from Lagrange dual function a convex optimization problem; optimal value denoted d λ, ν are dual feasible if λ 0, (λ, ν) dom g often simplified by making implicit constraint (λ, ν) dom g explicit example: standard form LP and its dual (page 101) minimize subject to c T x Ax = b x 0 maximize b T ν subject to A T ν + c 0 A. d Aspremont. M2 OJME: Optimisation convexe. 104/128

Weak and strong duality weak duality: d p always holds (for convex and nonconvex problems) can be used to find nontrivial lower bounds for difficult problems for example, solving the SDP maximize 1 T ν subject to W + diag(ν) 0 gives a lower bound for the two-way partitioning problem on page 103 strong duality: d = p does not hold in general (usually) holds for convex problems conditions that guarantee strong duality in convex problems are called constraint qualifications A. d Aspremont. M2 OJME: Optimisation convexe. 105/128

Slater s constraint qualification strong duality holds for a convex problem if it is strictly feasible, i.e., minimize f 0 (x) subject to f i (x) 0, i = 1,..., m Ax = b x int D : f i (x) < 0, i = 1,..., m, Ax = b also guarantees that the dual optimum is attained (if p > ) can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality,... there exist many other types of constraint qualifications A. d Aspremont. M2 OJME: Optimisation convexe. 106/128

Feasibility problems feasibility problem A in x R n. f i (x) < 0, i = 1,..., m, h i (x) = 0, i = 1,..., p feasibility problem B in λ R m, ν R p. λ 0, λ 0, g(λ, ν) 0 where g(λ, ν) = inf x ( m i=1 λ if i (x) + p i=1 ν ih i (x)) feasibility problem B is convex (g is concave), even if problem A is not A and B are always weak alternatives: at most one is feasible proof: assume x satisfies A, λ, ν satisfy B 0 g(λ, ν) m i=1 λ if i ( x) + p i=1 ν ih i ( x) < 0 A and B are strong alternatives if exactly one of the two is feasible (can prove infeasibility of A by producing solution of B and vice-versa). A. d Aspremont. M2 OJME: Optimisation convexe. 107/128

Inequality form LP primal problem minimize subject to c T x Ax b dual function g(λ) = inf x ( (c + A T λ) T x b T λ ) { b = T λ A T λ + c = 0 otherwise dual problem maximize b T λ subject to A T λ + c = 0, λ 0 from Slater s condition: p = d if A x b for some x in fact, p = d except when primal and dual are infeasible A. d Aspremont. M2 OJME: Optimisation convexe. 108/128

Quadratic program primal problem (assume P S n ++) minimize subject to x T P x Ax b dual function g(λ) = inf x ( x T P x + λ T (Ax b) ) = 1 4 λt AP 1 A T λ b T λ dual problem maximize (1/4)λ T AP 1 A T λ b T λ subject to λ 0 from Slater s condition: p = d if A x b for some x in fact, p = d always A. d Aspremont. M2 OJME: Optimisation convexe. 109/128

A nonconvex problem with strong duality minimize x T Ax + 2b T x subject to x T x 1 nonconvex if A 0 dual function: g(λ) = inf x (x T (A + λi)x + 2b T x λ) unbounded below if A + λi 0 or if A + λi 0 and b R(A + λi) minimized by x = (A + λi) b otherwise: g(λ) = b T (A + λi) b λ dual problem and equivalent SDP: maximize b T (A + λi) b λ subject to A + λi 0 b R(A + λi) maximize subject to t [ λ A + λi b b T t ] 0 strong duality although primal problem is not convex (more later) A. d Aspremont. M2 OJME: Optimisation convexe. 110/128

Complementary slackness Assume strong duality holds, x is primal optimal, (λ, ν ) is dual optimal f 0 (x ) = g(λ, ν ) = inf x ( f 0 (x ) + f 0 (x ) f 0 (x) + m λ i f i (x) + i=1 m λ i f i (x ) + i=1 ) p νi h i (x) i=1 p νi h i (x ) i=1 hence, the two inequalities hold with equality x minimizes L(x, λ, ν ) λ i f i(x ) = 0 for i = 1,..., m (known as complementary slackness): λ i > 0 = f i (x ) = 0, f i (x ) < 0 = λ i = 0 A. d Aspremont. M2 OJME: Optimisation convexe. 111/128

Karush-Kuhn-Tucker (KKT) conditions the following four conditions are called KKT conditions (for a problem with differentiable f i, h i ): 1. Primal feasibility: f i (x) 0, i = 1,..., m, h i (x) = 0, i = 1,..., p 2. Dual feasibility: λ 0 3. Complementary slackness: λ i f i (x) = 0, i = 1,..., m 4. Gradient of Lagrangian with respect to x vanishes (first order condition): f 0 (x) + m λ i f i (x) + i=1 p ν i h i (x) = 0 i=1 If strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions A. d Aspremont. M2 OJME: Optimisation convexe. 112/128

KKT conditions for convex problem If x, λ, ν satisfy KKT for a convex problem, then they are optimal: from complementary slackness: f 0 ( x) = L( x, λ, ν) from 4th condition (and convexity): g( λ, ν) = L( x, λ, ν) hence, f 0 ( x) = g( λ, ν) If Slater s condition is satisfied, x is optimal if and only if there exist λ, ν that satisfy KKT conditions recall that Slater implies strong duality, and dual optimum is attained generalizes optimality condition f 0 (x) = 0 for unconstrained problem Summary: When strong duality holds, the KKT conditions are necessary conditions for optimality If the problem is convex, they are also sufficient A. d Aspremont. M2 OJME: Optimisation convexe. 113/128

example: water-filling (assume α i > 0) minimize n i=1 log(x i + α i ) subject to x 0, 1 T x = 1 x is optimal iff x 0, 1 T x = 1, and there exist λ R n, ν R such that λ 0, λ i x i = 0, 1 x i + α i + λ i = ν if ν < 1/α i : λ i = 0 and x i = 1/ν α i if ν 1/α i : λ i = ν 1/α i and x i = 0 determine ν from 1 T x = n i=1 max{0, 1/ν α i} = 1 interpretation n patches; level of patch i is at height α i flood area with unit amount of water 1/ν x i α i resulting level is 1/ν i A. d Aspremont. M2 OJME: Optimisation convexe. 114/128

Duality and problem reformulations equivalent formulations of a problem can lead to very different duals reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting common reformulations introduce new variables and equality constraints make explicit constraints implicit or vice-versa transform objective or constraint functions e.g., replace f 0 (x) by φ(f 0 (x)) with φ convex, increasing A. d Aspremont. M2 OJME: Optimisation convexe. 115/128

Introducing new variables and equality constraints minimize f 0 (Ax + b) dual function is constant: g = inf x L(x) = inf x f 0 (Ax + b) = p we have strong duality, but dual is quite useless reformulated problem and its dual minimize f 0 (y) subject to Ax + b y = 0 maximize b T ν f0 (ν) subject to A T ν = 0 dual function follows from g(ν) = inf (f 0(y) ν T y + ν T Ax + b T ν) x,y { f = 0 (ν) + b T ν A T ν = 0 otherwise A. d Aspremont. M2 OJME: Optimisation convexe. 116/128

norm approximation problem: minimize Ax b minimize subject to y y = Ax b can look up conjugate of, or derive dual directly g(ν) = inf x,y νt y ν T Ax + b T ν) { b = ν + inf y ( y + ν T y) A T ν = 0 otherwise { b = ν A T ν = 0, ν 1 otherwise (see page 100) dual of norm approximation problem maximize b T ν subject to A T ν = 0, ν 1 A. d Aspremont. M2 OJME: Optimisation convexe. 117/128

Implicit constraints LP with box constraints: primal and dual problem minimize subject to c T x Ax = b 1 x 1 maximize b T ν 1 T λ 1 1 T λ 2 subject to c + A T ν + λ 1 λ 2 = 0 λ 1 0, λ 2 0 reformulation with box constraints made implicit minimize f 0 (x) = subject to Ax = b { c T x 1 x 1 otherwise dual function g(ν) = inf 1 x 1 (ct x + ν T (Ax b)) = b T ν A T ν + c 1 dual problem: maximize b T ν A T ν + c 1 A. d Aspremont. M2 OJME: Optimisation convexe. 118/128

Problems with generalized inequalities Ki is generalized inequality on R k i definitions are parallel to scalar case: minimize f 0 (x) subject to f i (x) Ki 0, i = 1,..., m h i (x) = 0, i = 1,..., p Lagrange multiplier for f i (x) Ki 0 is vector λ i R k i Lagrangian L : R n R k 1 R k m R p R, is defined as L(x, λ 1,, λ m, ν) = f 0 (x) + m λ T i f i (x) + i=1 p ν i h i (x) i=1 dual function g : R k 1 R k m R p R, is defined as g(λ 1,..., λ m, ν) = inf x D L(x, λ 1,, λ m, ν) A. d Aspremont. M2 OJME: Optimisation convexe. 119/128

lower bound property: if λ i K i 0, then g(λ 1,..., λ m, ν) p proof: if x is feasible and λ K i 0, then f 0 ( x) f 0 ( x) + m λ T i f i ( x) + i=1 inf L(x, λ 1,..., λ m, ν) x D = g(λ 1,..., λ m, ν) p ν i h i ( x) i=1 minimizing over all feasible x gives p g(λ 1,..., λ m, ν) dual problem maximize g(λ 1,..., λ m, ν) subject to λ i K i 0, i = 1,..., m weak duality: p d always strong duality: p = d for convex problem with constraint qualification (for example, Slater s: primal problem is strictly feasible) A. d Aspremont. M2 OJME: Optimisation convexe. 120/128

Semidefinite program primal SDP (F i, G S k ) minimize subject to c T x x 1 F 1 + + x n F n G Lagrange multiplier is matrix Z S k Lagrangian L(x, Z) = c T x + Tr (Z(x 1 F 1 + + x n F n G)) dual function g(z) = inf x L(x, Z) = { Tr(GZ) Tr(Fi Z) + c i = 0, i = 1,..., n otherwise dual SDP maximize Tr(GZ) subject to Z 0, Tr(F i Z) + c i = 0, i = 1,..., n p = d if primal SDP is strictly feasible ( x with x 1 F 1 + + x n F n G) A. d Aspremont. M2 OJME: Optimisation convexe. 121/128

Duality: SOCP example Let s consider the following Second Order Cone Program (SOCP): minimize f T x subject to A i x + b i 2 c T i x + d i, i = 1,..., m, with variable x R n. Let s show that the dual can be expressed as maximize subject to m i=1 (bt i u i + d i v i ) m i=1 (AT i u i + c i v i ) + f = 0 u i 2 v i, i = 1,..., m, with variables u i R n i, v i R, i = 1,..., m and problem data given by f R n, A i R n i n, b i R n i, c i R and d i R. A. d Aspremont. M2 OJME: Optimisation convexe. 122/128

Duality: SOCP We can derive the dual in the following two ways: 1. Introduce new variables y i R n i and t i R and equalities y i = A i x + b i, t i = c T i x + d i, and derive the Lagrange dual. 2. Start from the conic formulation of the SOCP and use the conic dual. Use the fact that the second-order cone is self-dual: t x tv + x T y 0, for all v, y such that v y The condition x T y tv is a simple Cauchy-Schwarz inequality A. d Aspremont. M2 OJME: Optimisation convexe. 123/128

Duality: SOCP We introduce new variables, and write the problem as minimize c T x subject to y i 2 t i, i = 1,..., m y i = A i x + b i, t i = c T i x + d i, i = 1,..., m The Lagrangian is L(x, y, t, λ, ν, µ) m m m = c T x + λ i ( y i 2 t i ) + νi T (y i A i x b i ) + µ i (t i c T i x d i ) i=1 m = (c A T i ν i i=1 n (b T i ν i + d i µ i ). i=1 i=1 m µ i c i ) T x + i=1 m (λ i y i 2 + νi T y i ) + i=1 i=1 i=1 m ( λ i + µ i )t i A. d Aspremont. M2 OJME: Optimisation convexe. 124/128