A Brief Review on Convex Optimization


Convex set

A set $S \subseteq \mathbf{R}^n$ is convex if for all $x, y \in S$ and all $\lambda, \mu \ge 0$ with $\lambda + \mu = 1$, we have $\lambda x + \mu y \in S$.

geometrically: $x, y \in S$ implies the line segment through $x, y$ lies in $S$

(figure: three example sets, one convex and two nonconvex)

Hyperplanes and halfspaces

hyperplane: set of the form $\{x \mid a^T x = b\}$ ($a \neq 0$)
halfspace: set of the form $\{x \mid a^T x \le b\}$ ($a \neq 0$)

(figure: hyperplane through $x_0$ with normal vector $a$, separating the halfspaces $a^T x \ge b$ and $a^T x \le b$)

- $a$ is the normal vector
- hyperplanes and halfspaces are convex

Euclidean balls and ellipsoids

Euclidean ball with center $x_c$ and radius $r$:
$$B(x_c, r) = \{x \mid \|x - x_c\|_2 \le r\} = \{x_c + ru \mid \|u\|_2 \le 1\}$$

ellipsoid: set of the form
$$\{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$$
with $P \in \mathbf{S}^n_{++}$ (i.e., $P$ symmetric positive definite)

(figure: ellipsoid centered at $x_c$)

other representation: $\{x_c + Au \mid \|u\|_2 \le 1\}$ with $A$ square and nonsingular

Polyhedra

solution set of finitely many linear inequalities and equalities
$$Ax \preceq b, \qquad Cx = d$$
($A \in \mathbf{R}^{m \times n}$, $C \in \mathbf{R}^{p \times n}$, $\preceq$ is componentwise inequality)

(figure: polyhedron $P$ bounded by hyperplanes with normal vectors $a_1, \ldots, a_5$)

a polyhedron is the intersection of a finite number of halfspaces and hyperplanes

Convex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$
for all $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$

(figure: the chord from $(x, f(x))$ to $(y, f(y))$ lies above the graph of $f$)

- $f$ is concave if $-f$ is convex
- $f$ is strictly convex if $\operatorname{dom} f$ is convex and
$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$$
for $x, y \in \operatorname{dom} f$, $x \neq y$, $0 < \theta < 1$

Examples on R

convex:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- exponential: $e^{ax}$, for any $a \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
- powers of absolute value: $|x|^p$ on $\mathbf{R}$, for $p \ge 1$
- negative entropy: $x \log x$ on $\mathbf{R}_{++}$

concave:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $0 \le \alpha \le 1$
- logarithm: $\log x$ on $\mathbf{R}_{++}$

Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex

example on $\mathbf{R}^n$:
- affine function $f(x) = a^T x + b$

example on $\mathbf{R}^{m \times n}$ ($m \times n$ matrices):
- affine function $f(X) = \operatorname{tr}(A^T X) + b = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij} + b$

Restriction of a convex function to a line

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if and only if the function $g : \mathbf{R} \to \mathbf{R}$,
$$g(t) = f(x + tv), \qquad \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\}$$
is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbf{R}^n$

so we can check convexity of $f$ by checking convexity of functions of one variable
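
As a quick illustration (not from the original slides), here is a minimal numpy sketch that samples the one-variable restriction $g(t) = f(x + tv)$ along a random line and tests the convexity inequality on pairs of sample points. The helper name `convex_along_line` is hypothetical, and passing the test is only a necessary check, not a proof of convexity.

```python
import numpy as np

def convex_along_line(f, x, v, ts=np.linspace(-1.0, 1.0, 41), theta=0.5, tol=1e-9):
    """Check g(theta*t1 + (1-theta)*t2) <= theta*g(t1) + (1-theta)*g(t2) on samples."""
    g = lambda s: f(x + s * v)
    for t1 in ts:
        for t2 in ts:
            if g(theta * t1 + (1 - theta) * t2) > theta * g(t1) + (1 - theta) * g(t2) + tol:
                return False
    return True

f = lambda z: np.log(np.sum(np.exp(z)))  # log-sum-exp, a known convex function
rng = np.random.default_rng(0)
x, v = rng.standard_normal(5), rng.standard_normal(5)
print(convex_along_line(f, x, v))  # True: no violation found along this line
```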

First-order condition

$f$ is differentiable if $\operatorname{dom} f$ is open and the gradient
$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$$
exists at each $x \in \operatorname{dom} f$

1st-order condition: differentiable $f$ with convex domain is convex iff
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$

(figure: the tangent $f(x) + \nabla f(x)^T (y - x)$ at $(x, f(x))$ lies below the graph of $f$)

the first-order approximation of $f$ is a global underestimator
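
A small numeric sanity check (an addition, not part of the slides): for the convex log-sum-exp function, the first-order Taylor expansion at a fixed point should underestimate the function everywhere. The assertion below tests this at random points.

```python
import numpy as np

def lse(z):
    return np.log(np.sum(np.exp(z)))

def grad_lse(z):
    e = np.exp(z)
    return e / np.sum(e)

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
# f(y) >= f(x) + grad f(x)^T (y - x) should hold for every y.
for _ in range(1000):
    y = rng.standard_normal(4)
    assert lse(y) >= lse(x) + grad_lse(x) @ (y - x) - 1e-12
print("first-order underestimator verified on 1000 random points")
```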

Second-order conditions

$f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad i, j = 1, \ldots, n,$$
exists at each $x \in \operatorname{dom} f$

2nd-order conditions for twice differentiable $f$ with convex domain:
- $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$
- if $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex

Examples

quadratic function: $f(x) = (1/2) x^T P x + q^T x + r$ (with $P \in \mathbf{S}^n$)
$$\nabla f(x) = Px + q, \qquad \nabla^2 f(x) = P$$
convex if $P \succeq 0$

least-squares objective: $f(x) = \|Ax - b\|_2^2$
$$\nabla f(x) = 2A^T (Ax - b), \qquad \nabla^2 f(x) = 2A^T A$$
convex (for any $A$)

quadratic-over-linear: $f(x, y) = x^2 / y$
$$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$$
convex for $y > 0$

(figure: surface plot of $f(x, y) = x^2 / y$)
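
To make the quadratic-over-linear example concrete, this short sketch (an addition to the slides) evaluates the Hessian formula above at random points with $y > 0$ and confirms numerically that its eigenvalues are nonnegative, i.e., the Hessian is positive semidefinite.

```python
import numpy as np

def hess_quad_over_lin(x, y):
    """Hessian of f(x, y) = x**2 / y for y > 0, per the rank-one formula above."""
    v = np.array([y, -x])
    return (2.0 / y**3) * np.outer(v, v)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.standard_normal(), rng.uniform(0.1, 5.0)
    eigvals = np.linalg.eigvalsh(hess_quad_over_lin(x, y))
    assert eigvals.min() >= -1e-10  # PSD up to round-off, so f is convex on y > 0
print("Hessian PSD verified for y > 0")
```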

Epigraph and sublevel set

$\alpha$-sublevel set of $f : \mathbf{R}^n \to \mathbf{R}$:
$$C_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
sublevel sets of convex functions are convex (the converse is false)

epigraph of $f : \mathbf{R}^n \to \mathbf{R}$:
$$\operatorname{epi} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$

(figure: epigraph of a convex function $f$, the region above its graph)

$f$ is convex if and only if $\operatorname{epi} f$ is a convex set

Jensen's inequality

basic inequality: if $f$ is convex, then for $0 \le \theta \le 1$,
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$

extension: if $f$ is convex, then
$$f(\mathbf{E} z) \le \mathbf{E} f(z)$$
for any random variable $z$

the basic inequality is the special case with discrete distribution
$$\operatorname{prob}(z = x) = \theta, \qquad \operatorname{prob}(z = y) = 1 - \theta$$

Optimization problem in standard form

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

- $x \in \mathbf{R}^n$ is the optimization variable
- $f_0 : \mathbf{R}^n \to \mathbf{R}$ is the objective or cost function
- $f_i : \mathbf{R}^n \to \mathbf{R}$, $i = 1, \ldots, m$, are the inequality constraint functions
- $h_i : \mathbf{R}^n \to \mathbf{R}$ are the equality constraint functions

optimal value:
$$p^\star = \inf \{f_0(x) \mid f_i(x) \le 0,\ i = 1, \ldots, m,\ h_i(x) = 0,\ i = 1, \ldots, p\}$$
- $p^\star = \infty$ if the problem is infeasible (no $x$ satisfies the constraints)
- $p^\star = -\infty$ if the problem is unbounded below

Optimal and locally optimal points

$x$ is feasible if $x \in \operatorname{dom} f_0$ and it satisfies the constraints

a feasible $x$ is optimal if $f_0(x) = p^\star$; $X_{\text{opt}}$ is the set of optimal points

$x$ is locally optimal if there is an $R > 0$ such that $x$ is optimal for

minimize (over $z$)  $f_0(z)$
subject to           $f_i(z) \le 0$, $i = 1, \ldots, m$,  $h_i(z) = 0$, $i = 1, \ldots, p$
                     $\|z - x\|_2 \le R$

examples (with $n = 1$, $m = p = 0$):
- $f_0(x) = 1/x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = 0$, no optimal point
- $f_0(x) = -\log x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = -\infty$
- $f_0(x) = x \log x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = -1/e$, $x = 1/e$ is optimal
- $f_0(x) = x^3 - 3x$: $p^\star = -\infty$, local optimum at $x = 1$

Feasibility problem

find        $x$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

can be considered a special case of the general problem with $f_0(x) = 0$:

minimize    $0$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

- $p^\star = 0$ if the constraints are feasible; any feasible $x$ is optimal
- $p^\star = \infty$ if the constraints are infeasible

Convex optimization problem

standard form convex optimization problem:

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $a_i^T x = b_i$,  $i = 1, \ldots, p$

- $f_0, f_1, \ldots, f_m$ are convex; equality constraints are affine
- the problem is quasiconvex if $f_0$ is quasiconvex (and $f_1, \ldots, f_m$ convex)

often written as

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

important property: the feasible set of a convex optimization problem is convex

example

minimize    $f_0(x) = x_1^2 + x_2^2$
subject to  $f_1(x) = x_1 / (1 + x_2^2) \le 0$
            $h_1(x) = (x_1 + x_2)^2 = 0$

- $f_0$ is convex; the feasible set $\{(x_1, x_2) \mid x_1 = -x_2 \le 0\}$ is convex
- not a convex problem (according to our definition): $f_1$ is not convex, $h_1$ is not affine
- equivalent (but not identical) to the convex problem

minimize    $x_1^2 + x_2^2$
subject to  $x_1 \le 0$
            $x_1 + x_2 = 0$

Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose $x$ is locally optimal and $y$ is optimal with $f_0(y) < f_0(x)$

$x$ locally optimal means there is an $R > 0$ such that
$$z \text{ feasible}, \quad \|z - x\|_2 \le R \implies f_0(z) \ge f_0(x)$$

consider $z = \theta y + (1 - \theta) x$ with $\theta = R / (2 \|y - x\|_2)$
- $\|y - x\|_2 > R$ (otherwise local optimality would already give $f_0(y) \ge f_0(x)$), so $0 < \theta < 1/2$
- $z$ is a convex combination of two feasible points, hence also feasible
- $\|z - x\|_2 = R/2$ and
$$f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x)$$
which contradicts our assumption that $x$ is locally optimal

Optimality criterion for differentiable f_0

$x$ is optimal if and only if it is feasible and
$$\nabla f_0(x)^T (y - x) \ge 0 \quad \text{for all feasible } y$$

(figure: feasible set $X$ with $-\nabla f_0(x)$ normal to a supporting hyperplane at $x$)

if nonzero, $\nabla f_0(x)$ defines a supporting hyperplane to the feasible set $X$ at $x$

- unconstrained problem: $x$ is optimal if and only if
$$x \in \operatorname{dom} f_0, \qquad \nabla f_0(x) = 0$$

- equality constrained problem

  minimize $f_0(x)$  subject to $Ax = b$

  $x$ is optimal if and only if there exists a $\nu$ such that
$$x \in \operatorname{dom} f_0, \qquad Ax = b, \qquad \nabla f_0(x) + A^T \nu = 0$$

- minimization over the nonnegative orthant

  minimize $f_0(x)$  subject to $x \succeq 0$

  $x$ is optimal if and only if
$$x \in \operatorname{dom} f_0, \qquad x \succeq 0, \qquad \begin{cases} \nabla f_0(x)_i \ge 0 & x_i = 0 \\ \nabla f_0(x)_i = 0 & x_i > 0 \end{cases}$$

Linear program (LP)

minimize    $c^T x + d$
subject to  $Gx \preceq h$
            $Ax = b$

- convex problem with affine objective and constraint functions
- feasible set is a polyhedron

(figure: polyhedron $P$ with cost direction $-c$ pointing toward the optimal vertex $x^\star$)

Examples

diet problem: choose quantities $x_1, \ldots, x_n$ of $n$ foods
- one unit of food $j$ costs $c_j$ and contains amount $a_{ij}$ of nutrient $i$
- a healthy diet requires nutrient $i$ in quantity at least $b_i$

to find the cheapest healthy diet (see the sketch below),

minimize    $c^T x$
subject to  $Ax \succeq b$,  $x \succeq 0$

piecewise-linear minimization: minimize $\max_{i=1,\ldots,m} (a_i^T x + b_i)$, equivalent to the LP

minimize    $t$
subject to  $a_i^T x + b_i \le t$,  $i = 1, \ldots, m$
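
As a worked example (added here, assuming the cvxpy package and a bundled LP solver are available), the diet LP can be stated almost verbatim in code. All the numbers are made-up illustration data, not from the slides.

```python
import cvxpy as cp
import numpy as np

# Hypothetical diet data: 3 nutrients, 4 foods.
A = np.array([[2.0, 1.0, 0.0, 3.0],   # A[i, j] = amount of nutrient i in food j
              [0.5, 2.0, 1.5, 0.0],
              [1.0, 0.0, 2.0, 1.0]])
b = np.array([10.0, 8.0, 6.0])        # minimum required amount of each nutrient
c = np.array([1.5, 1.0, 2.0, 0.8])    # cost per unit of each food

x = cp.Variable(4)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x >= b, x >= 0])
prob.solve()
print("cheapest diet:", np.round(x.value, 3), "cost:", round(prob.value, 3))
```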

Quadratic program (QP)

minimize    $(1/2) x^T P x + q^T x + r$
subject to  $Gx \preceq h$
            $Ax = b$

- $P \in \mathbf{S}^n_+$, so the objective is convex quadratic
- minimize a convex quadratic function over a polyhedron

(figure: level curves of $f_0$ over a polyhedron $P$, with $-\nabla f_0(x^\star)$ normal to the feasible set at the optimum $x^\star$)

Examples

least-squares:

minimize    $\|Ax - b\|_2^2$

- analytical solution $x^\star = A^\dagger b$ ($A^\dagger$ is the pseudo-inverse)
- can add linear constraints, e.g., $l \preceq x \preceq u$

linear program with random cost:

minimize    $\bar{c}^T x + \gamma x^T \Sigma x = \mathbf{E}\, c^T x + \gamma \operatorname{var}(c^T x)$
subject to  $Gx \preceq h$,  $Ax = b$

- $c$ is a random vector with mean $\bar{c}$ and covariance $\Sigma$
- hence $c^T x$ is a random variable with mean $\bar{c}^T x$ and variance $x^T \Sigma x$
- $\gamma > 0$ is the risk aversion parameter; it controls the trade-off between expected cost and variance (risk)
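
A minimal numpy check of the analytical least-squares solution (an addition to the slides, using random data): the pseudo-inverse formula $x^\star = A^\dagger b$ agrees with a generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))   # k > n, so the minimum is generally nonzero
b = rng.standard_normal(20)

x_pinv = np.linalg.pinv(A) @ b                    # x* = A^+ b, the analytical solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # same solution via a least-squares solver
assert np.allclose(x_pinv, x_lstsq)
print("optimal value:", np.linalg.norm(A @ x_pinv - b) ** 2)
```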

Quadratically constrained quadratic program (QCQP)

minimize    $(1/2) x^T P_0 x + q_0^T x + r_0$
subject to  $(1/2) x^T P_i x + q_i^T x + r_i \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

- $P_i \in \mathbf{S}^n_+$; objective and constraints are convex quadratic
- if $P_1, \ldots, P_m \in \mathbf{S}^n_{++}$, the feasible region is the intersection of $m$ ellipsoids and an affine set

Example: Linear discrimination

separate two sets of points $\{x_1, \ldots, x_N\}$, $\{y_1, \ldots, y_M\}$ by a hyperplane:
$$a^T x_i + b > 0, \quad i = 1, \ldots, N, \qquad a^T y_i + b < 0, \quad i = 1, \ldots, M$$

homogeneous in $a$, $b$, hence equivalent to
$$a^T x_i + b \ge 1, \quad i = 1, \ldots, N, \qquad a^T y_i + b \le -1, \quad i = 1, \ldots, M$$

a set of linear inequalities in $a$, $b$

Robust linear discrimination

(Euclidean) distance between the hyperplanes
$$H_1 = \{z \mid a^T z + b = 1\}, \qquad H_2 = \{z \mid a^T z + b = -1\}$$
is $\operatorname{dist}(H_1, H_2) = 2 / \|a\|_2$

to separate the two sets of points by maximum margin,

minimize    $(1/2) \|a\|_2$
subject to  $a^T x_i + b \ge 1$,  $i = 1, \ldots, N$          (1)
            $a^T y_i + b \le -1$,  $i = 1, \ldots, M$

(after squaring the objective) a QP in $a$, $b$
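
A hedged sketch of problem (1) as a QP in cvxpy (added; not from the slides). The two point clouds are synthetic and assumed separable, which holds with overwhelming probability for the cluster separation used here.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 2)) + np.array([3.0, 3.0])   # class +1 points x_i
Y = rng.standard_normal((30, 2)) - np.array([3.0, 3.0])   # class -1 points y_i

a = cp.Variable(2)
b = cp.Variable()
# Squaring the objective turns problem (1) into a QP.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(a)),
                  [X @ a + b >= 1, Y @ a + b <= -1])
prob.solve()
print("margin 2/||a||_2 =", 2 / np.linalg.norm(a.value))
```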

Approximate linear separation of non-separable sets

an LP in $a$, $b$, $u$, $v$:

minimize    $\mathbf{1}^T u + \mathbf{1}^T v$
subject to  $a^T x_i + b \ge 1 - u_i$,  $i = 1, \ldots, N$
            $a^T y_i + b \le -1 + v_i$,  $i = 1, \ldots, M$
            $u \succeq 0$,  $v \succeq 0$

- at the optimum, $u_i = \max\{0,\ 1 - a^T x_i - b\}$, $v_i = \max\{0,\ 1 + a^T y_i + b\}$
- can be interpreted as a heuristic for minimizing the number of misclassified points

Support vector classifier

minimize    $\|a\|_2 + \gamma (\mathbf{1}^T u + \mathbf{1}^T v)$
subject to  $a^T x_i + b \ge 1 - u_i$,  $i = 1, \ldots, N$
            $a^T y_i + b \le -1 + v_i$,  $i = 1, \ldots, M$
            $u \succeq 0$,  $v \succeq 0$

produces a point on the trade-off curve between the inverse of the margin $2/\|a\|_2$ and the classification error, measured by the total slack $\mathbf{1}^T u + \mathbf{1}^T v$

(figure: same example as the previous page, with $\gamma = 0.1$)
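
The same problem written in cvxpy (a sketch added here, with synthetic overlapping clusters; the slack variables keep it feasible even when the sets cannot be separated):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 2)) + np.array([1.0, 1.0])   # class +1 (overlapping)
Y = rng.standard_normal((40, 2)) - np.array([1.0, 1.0])   # class -1

a, b = cp.Variable(2), cp.Variable()
u, v = cp.Variable(40, nonneg=True), cp.Variable(40, nonneg=True)
gamma = 0.1
prob = cp.Problem(
    cp.Minimize(cp.norm(a, 2) + gamma * (cp.sum(u) + cp.sum(v))),
    [X @ a + b >= 1 - u, Y @ a + b <= -1 + v])
prob.solve()
print("total slack:", round(cp.sum(u).value + cp.sum(v).value, 3))
```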

Lagrangian

standard form problem (not necessarily convex):

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

variable $x \in \mathbf{R}^n$, domain $\mathcal{D}$, optimal value $p^\star$

Lagrangian: $L : \mathbf{R}^n \times \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$, with $\operatorname{dom} L = \mathcal{D} \times \mathbf{R}^m \times \mathbf{R}^p$,
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$

- weighted sum of objective and constraint functions
- $\lambda_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$
- $\nu_i$ is the Lagrange multiplier associated with $h_i(x) = 0$

Lagrange dual function

Lagrange dual function: $g : \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$,
$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)$$

$g$ is concave, and can be $-\infty$ for some $\lambda$, $\nu$

lower bound property: if $\lambda \succeq 0$, then $g(\lambda, \nu) \le p^\star$

proof: if $\bar{x}$ is feasible and $\lambda \succeq 0$, then
$$f_0(\bar{x}) \ge L(\bar{x}, \lambda, \nu) \ge \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = g(\lambda, \nu)$$
minimizing over all feasible $\bar{x}$ gives $p^\star \ge g(\lambda, \nu)$

Least-norm solution of linear equations

minimize    $x^T x$
subject to  $Ax = b$

dual function:
- Lagrangian is $L(x, \nu) = x^T x + \nu^T (Ax - b)$
- to minimize $L$ over $x$, set the gradient equal to zero:
$$\nabla_x L(x, \nu) = 2x + A^T \nu = 0 \implies x = -(1/2) A^T \nu$$
- plug this into $L$ to obtain $g$:
$$g(\nu) = L(-(1/2) A^T \nu,\ \nu) = -\frac{1}{4} \nu^T A A^T \nu - b^T \nu$$
a concave function of $\nu$

lower bound property: $p^\star \ge -(1/4) \nu^T A A^T \nu - b^T \nu$ for all $\nu$
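
A numeric sketch of the lower bound property (added; random data, assuming $AA^T$ is invertible): any $\nu$ gives a lower bound on $p^\star$, and the maximizing $\nu^\star = -2 (AA^T)^{-1} b$ attains it, since strong duality holds for this problem.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 8))   # fat matrix: fewer equations than unknowns
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # least-norm solution x* = A^T (A A^T)^{-1} b
p_star = x_star @ x_star

def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu_rand = rng.standard_normal(3)
nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of the concave dual function
print(g(nu_rand) <= p_star)                  # True: lower bound property
print(np.isclose(g(nu_star), p_star))        # True: the bound is tight here
```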

Standard form LP

minimize    $c^T x$
subject to  $Ax = b$,  $x \succeq 0$

dual function:
- Lagrangian is
$$L(x, \lambda, \nu) = c^T x + \nu^T (Ax - b) - \lambda^T x = -b^T \nu + (c + A^T \nu - \lambda)^T x$$
- $L$ is affine in $x$, hence
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) = \begin{cases} -b^T \nu & A^T \nu - \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$

$g$ is linear on the affine domain $\{(\lambda, \nu) \mid A^T \nu - \lambda + c = 0\}$, hence concave

lower bound property: $p^\star \ge -b^T \nu$ if $A^T \nu + c \succeq 0$

The dual problem

Lagrange dual problem:

maximize    $g(\lambda, \nu)$
subject to  $\lambda \succeq 0$

- finds the best lower bound on $p^\star$ obtainable from the Lagrange dual function
- a convex optimization problem; optimal value denoted $d^\star$
- $\lambda, \nu$ are dual feasible if $\lambda \succeq 0$, $(\lambda, \nu) \in \operatorname{dom} g$
- often simplified by making the implicit constraint $(\lambda, \nu) \in \operatorname{dom} g$ explicit

example: standard form LP (from the previous slide) and its dual:

minimize    $c^T x$
subject to  $Ax = b$,  $x \succeq 0$

maximize    $-b^T \nu$
subject to  $A^T \nu + c \succeq 0$

Weak and strong duality

weak duality: $d^\star \le p^\star$
- always holds (for convex and nonconvex problems)

strong duality: $d^\star = p^\star$
- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification

strong duality holds for a convex problem

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

if it is strictly feasible, i.e.,
$$\exists x \in \operatorname{int} \mathcal{D} : \quad f_i(x) < 0, \quad i = 1, \ldots, m, \qquad Ax = b$$

- also guarantees that the dual optimum is attained (if $p^\star > -\infty$)
- can be sharpened: e.g., can replace $\operatorname{int} \mathcal{D}$ with $\operatorname{relint} \mathcal{D}$ (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications

Inequality form LP

primal problem:

minimize    $c^T x$
subject to  $Ax \preceq b$

dual function:
$$g(\lambda) = \inf_x \left( (c + A^T \lambda)^T x - b^T \lambda \right) = \begin{cases} -b^T \lambda & A^T \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$

dual problem:

maximize    $-b^T \lambda$
subject to  $A^T \lambda + c = 0$,  $\lambda \succeq 0$

- from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
- in fact, $p^\star = d^\star$ except when primal and dual are both infeasible
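
A sketch verifying strong duality numerically (an addition, assuming cvxpy and its bundled solvers). The data is constructed so the primal is strictly feasible and the dual is feasible, which keeps both optimal values finite.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))
b = A @ rng.standard_normal(3) + rng.uniform(0.5, 1.0, 6)   # strictly feasible primal
c = -A.T @ rng.uniform(0.5, 1.0, 6)                         # makes the dual feasible too

x = cp.Variable(3)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])
primal.solve()

lam = cp.Variable(6, nonneg=True)
dual = cp.Problem(cp.Maximize(-b @ lam), [A.T @ lam + c == 0])
dual.solve()

print(np.isclose(primal.value, dual.value))   # strong duality: p* = d*
```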

Quadratic program

primal problem (assume $P \in \mathbf{S}^n_{++}$):

minimize    $x^T P x$
subject to  $Ax \preceq b$

dual function:
$$g(\lambda) = \inf_x \left( x^T P x + \lambda^T (Ax - b) \right) = -\frac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda$$

dual problem:

maximize    $-(1/4) \lambda^T A P^{-1} A^T \lambda - b^T \lambda$
subject to  $\lambda \succeq 0$

- from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
- in fact, $p^\star = d^\star$ always
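
The same pair can be checked numerically; here is a hedged cvxpy sketch (random data; the feasibility construction and the explicit symmetrization of $A P^{-1} A^T$ guard against round-off when forming the dual objective).

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 7
R = rng.standard_normal((n, n))
P = R.T @ R + np.eye(n)                                     # symmetric positive definite
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.uniform(0.5, 1.0, m)   # strictly feasible primal

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b])
primal.solve()

Q = A @ np.linalg.inv(P) @ A.T
Q = (Q + Q.T) / 2                                           # symmetrize against round-off
lam = cp.Variable(m, nonneg=True)
dual = cp.Problem(cp.Maximize(-0.25 * cp.quad_form(lam, Q) - b @ lam))
dual.solve()

print(np.isclose(primal.value, dual.value, atol=1e-6))      # p* = d* always for this QP
```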

Complementary slackness

assume strong duality holds, $x^\star$ is primal optimal, $(\lambda^\star, \nu^\star)$ is dual optimal:

$$f_0(x^\star) = g(\lambda^\star, \nu^\star) = \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x) \right)$$
$$\le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \le f_0(x^\star)$$

hence, the two inequalities hold with equality:
- $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$
- $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1, \ldots, m$ (known as complementary slackness):
$$\lambda_i^\star > 0 \implies f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \implies \lambda_i^\star = 0$$

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called the KKT conditions (for a problem with differentiable $f_i$, $h_i$):

1. primal feasibility: $f_i(x) \le 0$, $i = 1, \ldots, m$, $h_i(x) = 0$, $i = 1, \ldots, p$
2. dual feasibility: $\lambda \succeq 0$
3. complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \ldots, m$
4. first-order condition (gradient of the Lagrangian with respect to $x$ vanishes):
$$\nabla_x L(x, \lambda, \nu) = \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0$$

from the complementary slackness slide: if strong duality holds and $x$, $\lambda$, $\nu$ are optimal, then they must satisfy the KKT conditions

KKT conditions for convex problem

if $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal:
- from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
- from the 4th condition (and convexity): $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$

hence, $f_0(\tilde{x}) = g(\tilde{\lambda}, \tilde{\nu})$

if Slater's condition is satisfied: $x$ is optimal if and only if there exist $\lambda$, $\nu$ that satisfy the KKT conditions
- recall that Slater implies strong duality, and the dual optimum is attained
- generalizes the optimality condition $\nabla f_0(x) = 0$ for an unconstrained problem

example: water-filling (assume $\alpha_i > 0$)

minimize    $-\sum_{i=1}^n \log(x_i + \alpha_i)$
subject to  $x \succeq 0$,  $\mathbf{1}^T x = 1$

$x$ is optimal iff $x \succeq 0$, $\mathbf{1}^T x = 1$, and there exist $\lambda \in \mathbf{R}^n$, $\nu \in \mathbf{R}$ such that
$$\lambda \succeq 0, \qquad \lambda_i x_i = 0, \qquad \frac{1}{x_i + \alpha_i} + \lambda_i = \nu$$

- if $\nu < 1/\alpha_i$: $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$
- if $\nu \ge 1/\alpha_i$: $\lambda_i = \nu - 1/\alpha_i$ and $x_i = 0$
- determine $\nu$ from $\mathbf{1}^T x = \sum_{i=1}^n \max\{0,\ 1/\nu - \alpha_i\} = 1$

interpretation:
- $n$ patches; the level of patch $i$ is at height $\alpha_i$
- flood the area with a unit amount of water; the resulting level is $1/\nu$

(figure: water poured over patches of height $\alpha_i$ up to the common level $1/\nu$)
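
Since $\sum_i \max\{0, 1/\nu - \alpha_i\}$ is monotone in the water level $w = 1/\nu$, the KKT system can be solved by simple bisection. A minimal sketch (added here; `water_filling` is an illustrative helper name):

```python
import numpy as np

def water_filling(alpha, tol=1e-12):
    """Solve the water-filling KKT system by bisection on the water level w = 1/nu."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = alpha.min(), alpha.max() + 1.0   # unit water, so the level is in this bracket
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.maximum(0.0, w - alpha).sum() < 1.0:
            lo = w        # not enough water used yet: raise the level
        else:
            hi = w
    w = 0.5 * (lo + hi)
    return np.maximum(0.0, w - alpha), 1.0 / w   # optimal x and the multiplier nu

alpha = np.array([0.8, 1.0, 1.2, 2.5])
x, nu = water_filling(alpha)
print("x =", np.round(x, 4), " sum =", round(x.sum(), 6), " nu =", round(nu, 4))
```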

Solving unconstrained minimization

minimize    $f(x)$

- $f$ convex, twice continuously differentiable (hence $\operatorname{dom} f$ open)
- we assume the optimal value $p^\star = \inf_x f(x)$ is attained (and finite)

unconstrained minimization methods:
- produce a sequence of points $x^{(k)} \in \operatorname{dom} f$, $k = 0, 1, \ldots$ with $f(x^{(k)}) \to p^\star$
- can be interpreted as iterative methods for solving the optimality condition $\nabla f(x^\star) = 0$

Initial point and sublevel set

algorithms in this chapter require a starting point $x^{(0)}$ such that
- $x^{(0)} \in \operatorname{dom} f$
- the sublevel set $S = \{x \mid f(x) \le f(x^{(0)})\}$ is closed

the 2nd condition is hard to verify, except when all sublevel sets are closed:
- equivalent to the condition that $\operatorname{epi} f$ is closed
- true if $\operatorname{dom} f = \mathbf{R}^n$
- true if $f(x) \to \infty$ as $x \to \operatorname{bd} \operatorname{dom} f$

examples of differentiable functions with closed sublevel sets:
$$f(x) = \log \left( \sum_{i=1}^m \exp(a_i^T x + b_i) \right), \qquad f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$$

Strong convexity and implications

$f$ is strongly convex on $S$ if there exists an $m > 0$ such that
$$\nabla^2 f(x) \succeq mI \quad \text{for all } x \in S$$

implications:
- for $x, y \in S$,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|x - y\|_2^2$$
- hence, $S$ is bounded
- $p^\star > -\infty$, and for $x \in S$,
$$f(x) - p^\star \le \frac{1}{2m} \|\nabla f(x)\|_2^2$$
useful as a stopping criterion (if you know $m$)

Descent methods

$$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)} \quad \text{with } f(x^{(k+1)}) < f(x^{(k)})$$

- other notations: $x^+ = x + t \Delta x$, $x := x + t \Delta x$
- $\Delta x$ is the step, or search direction; $t$ is the step size, or step length
- from convexity, $f(x^+) < f(x)$ implies $\nabla f(x)^T \Delta x < 0$ (i.e., $\Delta x$ is a descent direction)

General descent method.
given a starting point $x \in \operatorname{dom} f$.
repeat
    1. Determine a descent direction $\Delta x$.
    2. Line search. Choose a step size $t > 0$.
    3. Update. $x := x + t \Delta x$.
until stopping criterion is satisfied.

Line search types

exact line search: $t = \operatorname{argmin}_{t > 0} f(x + t \Delta x)$

backtracking line search (with parameters $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$):
- starting at $t = 1$, repeat $t := \beta t$ until
$$f(x + t \Delta x) < f(x) + \alpha t \nabla f(x)^T \Delta x$$
- graphical interpretation: backtrack until $t \le t_0$

(figure: $f(x + t \Delta x)$ versus $t$, together with the lines $f(x) + t \nabla f(x)^T \Delta x$ and $f(x) + \alpha t \nabla f(x)^T \Delta x$; $t_0$ marks where the latter crosses $f(x + t \Delta x)$)
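
A direct transcription of the backtracking rule into Python (added as a sketch; the demo function and parameter values are illustrative):

```python
import numpy as np

def backtracking(f, grad_f, x, dx, alpha=0.25, beta=0.5):
    """Shrink t by beta until the sufficient-decrease condition holds."""
    t = 1.0
    while f(x + t * dx) >= f(x) + alpha * t * grad_f(x) @ dx:
        t *= beta
    return t

# Demo on f(x) = log(sum(exp(x))) with the negative gradient as descent direction.
f = lambda x: np.log(np.sum(np.exp(x)))
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))
x = np.array([1.0, -2.0, 0.5])
dx = -grad_f(x)
print("step size t =", backtracking(f, grad_f, x, dx))
```

The loop always terminates when $\Delta x$ is a descent direction, since $\nabla f(x)^T \Delta x < 0$ makes the condition hold for small enough $t$.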

Gradient descent method

general descent method with $\Delta x = -\nabla f(x)$

given a starting point $x \in \operatorname{dom} f$.
repeat
    1. $\Delta x := -\nabla f(x)$.
    2. Line search. Choose step size $t$ via exact or backtracking line search.
    3. Update. $x := x + t \Delta x$.
until stopping criterion is satisfied.

- stopping criterion usually of the form $\|\nabla f(x)\|_2 \le \epsilon$
- convergence result: for strongly convex $f$,
$$f(x^{(k)}) - p^\star \le c^k \left( f(x^{(0)}) - p^\star \right)$$
where $c \in (0, 1)$ depends on $m$, $x^{(0)}$, and the line search type
- very simple, but often very slow; rarely used in practice
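
Putting the pieces together, here is a self-contained sketch of the gradient method with backtracking (an addition to the slides; names and parameter values are illustrative). The demo minimizes a strongly convex quadratic and compares against the closed-form optimum.

```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-8, alpha=0.25, beta=0.5, max_iter=10000):
    """Gradient method with backtracking; stops when ||grad f(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        dx = -g
        t = 1.0
        while f(x + t * dx) >= f(x) + alpha * t * g @ dx:   # backtracking line search
            t *= beta
        x = x + t * dx
    return x

# Demo: minimize f(x) = 0.5 x^T P x + q^T x; the optimum solves P x = -q.
P = np.array([[3.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ P @ x + q @ x
grad_f = lambda x: P @ x + q
x_star = gradient_descent(f, grad_f, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(P, -q), atol=1e-6))
```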

quadratic problem in R^2

$$f(x) = (1/2)(x_1^2 + \gamma x_2^2) \qquad (\gamma > 0)$$

with exact line search, starting at $x^{(0)} = (\gamma, 1)$:
$$x_1^{(k)} = \gamma \left( \frac{\gamma - 1}{\gamma + 1} \right)^k, \qquad x_2^{(k)} = \left( -\frac{\gamma - 1}{\gamma + 1} \right)^k$$

very slow if $\gamma \gg 1$ or $\gamma \ll 1$

(figure: for $\gamma = 10$, the iterates $x^{(0)}, x^{(1)}, \ldots$ zig-zag across the level curves of $f$)
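
The closed-form iterates can be reproduced numerically (an added check): for a quadratic $f(x) = \frac{1}{2} x^T H x$, the exact line search step is $t = g^T g / (g^T H g)$ with $g = \nabla f(x) = Hx$.

```python
import numpy as np

gamma = 10.0
H = np.diag([1.0, gamma])           # Hessian of f(x) = 0.5*(x1^2 + gamma*x2^2)

x = np.array([gamma, 1.0])          # x^(0) = (gamma, 1)
r = (gamma - 1) / (gamma + 1)
for k in range(10):
    closed_form = np.array([gamma * r**k, (-r) ** k])
    assert np.allclose(x, closed_form)
    g = H @ x                       # gradient
    t = (g @ g) / (g @ H @ g)       # exact line search step for a quadratic
    x = x - t * g
print("closed-form iterates confirmed for k = 0..9")
```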

nonquadratic example

$$f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$$

(figures: iterates $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ on the level curves of $f$, with backtracking line search and with exact line search)

a problem in R^100

$$f(x) = c^T x - \sum_{i=1}^{500} \log(b_i - a_i^T x)$$

(figure: $f(x^{(k)}) - p^\star$ versus $k$ on a semilog scale, decreasing from about $10^4$ to $10^{-4}$ within roughly 200 iterations, for exact and backtracking line search)

linear convergence, i.e., a straight line on a semilog plot