Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming

School of Mathematics T H E U N I V E R S I T Y O H F E D I N B U R G Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio Paris, January 2018 1

Outline Self-concordant Barriers Second-Order Cone Programming example cones example SOCP problems logarithmic barrier function IPM for SOCP Semidefinite Programming background (linear matrix inequalities) duality example SDP problems logarithmic barrier function IPM for SDP Final Comments Paris, January 2018 2

Self-concordant Barrier Def: Let C R n be an open nonempty convex set. Let f : C R be a 3 times continuously diff able convex function. A function f is called self-concordant if there exists a constant p > 0 such that 3 f(x)[h,h,h] 2p 1/2 ( 2 f(x)[h,h]) 3/2, x C, h : x+h C. (Wethensaythatf isp-self-concordant). Note that a self-concordant function is always well approximated by the quadratic model because the error of such an approximation can be bounded by the 3/2 power of 2 f(x)[h,h]. Paris, January 2018 3

Self-concordant Barrier Lemma The barrier function logx is self-concordant on R +. Proof: Consider f(x) = logx and compute f (x) = x 1, f (x) = x 2 and f (x) = 2x 3 and check that the self-concordance condition is satisfied for p = 1. Lemma The barrier function 1/x α, with α (0, ) is not self-concordant on R +. Lemma The barrier function e 1/x is not self-concordant on R +. Use self-concordant barriers in optimization Paris, January 2018 4

Second-Order Cone Programming (SOCP) Paris, January 2018 5

SOCP: Second-Order Cone Programming Generalization of QP. Deals with conic constraints. Solved with IPMs. Numerous applications: quadratically constrained quadratic programs, problems involving sums and maxima/minima of norms, SOC-representable functions and sets, matrix-fractional problems, problems with hiperbolic constraints, robust LP/QP, robust least-squares. Paris, January 2018 6

SOCP: Second-Order Cone Programming This lecture is based on three papers: M. Lobo, L. Vandenberghe, S. Boyd and H. Lebret, Applications of Second-Order Cone Programming, Linear Algebra and its Appls 284 (1998) pp. 193-228. L. Vandenberghe and S. Boyd, Semidefinite Programming, SIAM Review 38 (1996) pp. 49-95. E.D. Andersen, C. Roos and T. Terlaky, On Implementing a Primal-Dual IPM for Conic Optimization, Mathematical Programming 95 (2003) pp. 249-273. Paris, January 2018 7

Cones: Background Def. A set K R n is called a cone if for any x K and for any λ 0, λx K. Convex Cone: x 1 x 3 x 2 Example: K = {x R n : x 2 1 n j=2 x 2 j, x 1 0}. Paris, January 2018 8

Example: Three Cones R + : R + = {x R : x 0}. Quadratic Cone: K q = {x R n : x 2 1 n j=2 x 2 j, x 1 0}. Rotated Quadratic Cone: K r = {x R n : 2x 1 x 2 n j=3 x 2 j, x 1,x 2 0}. Paris, January 2018 9

Matrix Representation of Cones Each of the three most common cones has a matrix representation using orthogonal matrices T and/or Q. (Orthogonal matrix: Q T Q = I). Quadratic Cone K q. Define 1 1 Q = and write: 1... 1 K q = {x R n : x T Qx 0, x 1 0}. Example: x 2 1 x2 2 +x2 3 + +x2 n. Paris, January 2018 10

Matrix Representation of Cones (cont d) Rotated Quadratic Cone K r. Define 0 1 1 0 Q = 1... 1 and write: K r = {x R n : x T Qx 0, x 1,x 2 0}. Example: 2x 1 x 2 x 2 3 +x2 4 + +x2 n. Paris, January 2018 11

Matrix Representation of Cones (cont d) Consider a linear transformation T : R 2 R 2 : T 2 = 1 2 1 2 1 2 1 2. It corresponds to a rotation by π/4. Indeed, write: [ [ ] z u y] = T 2 v that is to get z = u+v 2, y = u v 2 2yz = u 2 v 2. Paris, January 2018 12

Matrix Representation of Cones (cont d) Now, define T = 1 2 1 2 1 2 1 2 1 and observe that the rotated quadratic cone satisfies... 1 Tx K r iff x K q. Paris, January 2018 13

Example: Conic constraint Consider a constraint: 1 2 x 2 +a T x b. Observe that g(x) = 1 2 xt x+a T x b is convexhence the constraint defines a convex set. The constraint may be reformulated as an intersection of an affine (linear) constraint and a quadratic one: a T x+z = b y = 1 x 2 2yz, y,z 0. Paris, January 2018 14

Example: Conic constraint (cont d) Now, substitute: to get z = u+v 2, y = u v 2 a T x+ u+v 2 = b u v = 2 x 2 +v 2 u 2. Paris, January 2018 15

Dual Cone Let K R n be a cone. Def. The set: K := {s R n : s T x 0, x K} is called the dual cone. Def. The set: K P := {s R n : s T x 0, x K} is called the polar cone (Fig below). K K P Paris, January 2018 16

Conic Optimization Consider an optimization problem: where K is a convex closed cone. min c T x s.t. Ax = b, x K, We assume that K = K 1 K 2 K k, thatis,conek isaproductofseveralindividualconeseachofwhich is one of the three cones defined earlier. Paris, January 2018 17

Primal and Dual SOCPs Consider a primal SOCP where K is a convex closed cone. min c T x s.t. Ax = b, x K, The associated dual SOCP max b T y s.t. A T y +s = c, s K. Weak Duality: If (x,y,s) is a primal-dual feasible solution, then c T x b T y = x T s 0. Paris, January 2018 18

IPM for Conic Optimization Conic Optimization problems can be solved in polynomial time with IPMs. Consider a quadratic cone K q = {(x,t) : x R n 1,t R, t 2 x 2, t 0}, and define the (convex) logarithmic barrier function for this cone f : R n R { f(x,t) = ln(t 2 x 2 ) if x < t + otherwise. Paris, January 2018 19

Logarithmic Barrier Fctn for Quadratic Cone Its derivatives are given by: and 2 f(x,t)= f(x,t) = 2 (t 2 x T x) 2 2 t 2 x T x[ x t ], [ (t 2 x T x)i+2xx T ] 2tx 2tx T t 2 +x T x. Theorem: f(x,t) is a self-concordant barrier on K q. Exercise: Prove it in case n = 2. Paris, January 2018 20

Examples of SOCP LP, QP use the cone R + (positive orthant). SDP uses the cone SR n n + (symmetric positive definite matrices). SOCP uses two quadratic cones K q and K r. Quadratically Constrained Quadratic Programming (QCQP) is a particular example of SOCP. Typical trick to replace a quadratic constraint as a conic one!!! Consider a constraint: 1 2 x 2 +a T x b. Rewrite it as: x 2 +v 2 u 2. Paris, January 2018 21

QCQP and SOCP LetP i R n n beasymmetricpositivedefinitematrixandq i R n. Define a quadratic function f i (x) = x T P i x + 2q T i x + r i and an associated ellipsoid E i = {x f i (x) 0}. The set of constraints f i (x) 0, i = 1,2,...,m defines an intersection of (convex) ellipsoids and of course defines a convex set. The optimization problem min f 0 (x) s.t. f i (x) 0, i = 1,2,...,m, is an example of quadratically constrained quadratic program(qcqp). QCQP can be reformulated as SOCP. QCQP can be also reformulated as SDP. Paris, January 2018 22

SOCP Example: Linear Regression The least squares solution of a linear system of equations Ax = b is the solution of the following optimization problem and it can be recast as: min x Ax b min t s.t. Ax b t. Paris, January 2018 23

Ellipsoids: Background Sphere with (0, 0) centre: x 2 1 +x2 2 1 Ellipsoid, centre at (0,0), radii a,b: x 2 1 a 2 + x2 2 b 2 1 Ellipsoid, centre at (p,q), radii a,b: (x 1 p) 2 a 2 + (x 2 q) 2 General ellipsoid: b 2 1 (x x 0 ) T H(x x 0 ) 1, where H is a positive definite matrix. Let H = LL T. Then we can rewrite the ellipsoid as L T (x x 0 ) 1. Paris, January 2018 24

Ellipsoids are everywhere Obélix Gérard Depardieu as Obélix Paris, January 2018 25

SOCP Example: Robust LP Consider an LP: min c T x s.t. a T i x b i, i = 1,2,...,m, and assume that the values of a i are uncertain. Suppose that a i E i, i = 1,2,...,m, where E i are given ellipsoids E i = {ā i +P i u : u 1}, where P i is a symmetric positive definite matrix. Paris, January 2018 26

SOCP Example: Robust LP (cont d) Observe that a T i x b i a i E i iff ā T i x+ P ix b i, because for any x R n max{a T x : a E} = ā T x+max{u T Px : u 1} = ā T x+ Px. Hence robust LP formulated as SOCP is: min c T x s.t. ā T i x+ P ix b i, i = 1,2,...,m. Paris, January 2018 27

SOCP Example: Robust QP Consider a QP with uncertain objective: min x max P E xt Px+2q T x+r subject to linear constraints. Uncertain symmetric positive definite matrix P belongs to the ellipsoid: m P E = {P 0 + P i u i : u 1}, i=1 where P i are symmetric positive semidefinite matrices. The definition of ellipsoid E implies that m max P E xt Px = x T P 0 x+ max (x T P i x)u i. u 1 Paris, January 2018 28 i=1

SOCP Example: Robust QP (cont d) From Cauchy-Schwartz inequality: m m (x T P i x)u i (x T P i x) 2 hence i=1 max u 1 i=1 m (x T P i x)u i i=1 We get a reformulation of robust QP: 1/2 m (x T P i x) 2 i=1 u 1/2. min x x T P 0 x+ ( mi=1 (x T P i x) 2) 1/2 +2q T x+r. Paris, January 2018 29

SOCP Example: Robust QP (cont d) This problem can be written as: min t+v +2q T x+r s.t. z t, x T P 0 x v, x T P i x z i, i = 1,...,m. SOCP reformulation: min t+v +2q T x+r s.t. z t, (2P 1/2 i x,z i 1) z i +1, z i 0, i = 1..m, (2P 1/2 0 x,v 1) v +1, v 0. Paris, January 2018 30

Semidefinite Programming (SDP) Paris, January 2018 31

SDP: Semidefinite Programming Generalization of LP. Deals with symmetric positive semidefinite matrices (Linear Matrix Inequalities, LMI). Solved with IPMs. Numerous applications: eigenvalue optimization problems, quasi-convex programs, convex quadratically constrained optimization, robust mathematical programming, matrix norm minimization, combinatorial optimization (provides good relaxations), control theory, statistics. Paris, January 2018 32

SDP: Semidefinite Programming This lecture is based on two survey papers: L. Vandenberghe and S. Boyd, Semidefinite Programming, SIAM Review 38 (1996) pp. 49-95. M.J. Todd, Semidefinite Optimization, Acta Numerica 10 (2001) pp. 515-560. Paris, January 2018 33

SDP: Background Def. A matrix H R n n is positive semidefinite if x T Hx 0 for any x 0. We write H 0. Def. A matrix H R n n is positive definite if x T Hx > 0 for any x 0. We write H 0. We denote with SR n n (SR n n + ) the set of symmetric and symmetric positive semidefinite matrices. Let U,V SR n n. We define the inner product between U and V as U V = trace(u T V), where trace(h) = n i=1 h ii. The associated norm is the Frobenius norm, written U F = (U U) 1/2 (or just U ). Paris, January 2018 34

Linear Matrix Inequalities Def. Linear Matrix Inequalities Let U,V SR n n. We write U V iff U V 0. We write U V iff U V 0. We write U V iff U V 0. We write U V iff U V 0. Paris, January 2018 35

Properties 1. If P R m n and Q R n m, then trace(pq) = trace(qp). 2. IfU,V SR n n,andq R n n isorthogonal(i.e. Q T Q = I), then U V = (Q T UQ) (Q T VQ). More generally, if P is nonsingular, then U V = (PUP T ) (P T VP 1 ). 3. Every U SR n n can be written as U= QΛQ T, where Q is orthogonal and Λ is diagonal. Then UQ = QΛ. In other words the columns of Q are the eigenvectors, and the diagonal entries of Λ the corresponding eigenvalues of U. 4. If U SR n n and U = QΛQ T, then trace(u) = trace(λ) = i λ i. Paris, January 2018 36

Properties (cont d) 5. For U SR n n, the following are equivalent: (i) U 0 (U 0) (ii) x T Ux 0, x R n (x T Ux>0, 0 x R n ). (iii) If U = QΛQ T, then Λ 0 (Λ 0). (iv) U = P T P for some matrix P (U = P T P for some square nonsingular matrix P). 6. Every U SR n n has a square root U 1/2 SR n n. Proof: From Property 5 (ii) we get U = QΛQ T. Take U 1/2 = QΛ 1/2 Q T, where Λ 1/2 is the diagonal matrix whose diagonal contains the (nonnegative) square roots of the eigenvalues of U, and verify that U 1/2 U 1/2 = U. Paris, January 2018 37

Properties (cont d) 7. Suppose U = [ A B T B C where A and C are symmetric and A 0. Then U 0 (U 0) iff C BA 1 B T 0 ( 0). The matrix C BA 1 B T is called the Schur complement of A in U. Proof: follows easily from the factorization: [ A B T B C ] = [ I 0 BA 1 I ], ][ ][ A 0 I A 1 B T 0 C BA 1 B T 0 I ]. 8. If U SR n n and x R n, then x T Ux = U xx T. Paris, January 2018 38

Primal-Dual Pair of SDPs Primal Dual min C X max b T y s.t. A i X =b i, i = 1..m s.t. mi=1 y i A i +S =C, X 0; S 0, where A i SR n n, b R m, C SR n n are given; and X,S SR n n, y R m are the variables. Paris, January 2018 39

Theorem: Weak Duality in SDP If X is feasible in the primal and (y,s) in the dual, then Proof: C X b T y = X S 0. m C X b T y = ( y i A i +S) X b T y = i=1 m (A i X)y i +S X b T y i=1 = S X = X S. Further, since X is positive semidefinite, it has a square root X 1/2 (Property 6), and so X S = trace(xs)=trace(x 1/2 X 1/2 S)=trace(X 1/2 SX 1/2 ) 0. We use Property 1 and the fact that S and X 1/2 are positive semidefinite,hencex 1/2 SX 1/2 ispositivesemidefiniteanditstrace is nonnegative. Paris, January 2018 40

SDP Example 1: Minimize the Max. Eigenvalue We wish to choose x R k to minimize the maximum eigenvalue of A(x)=A 0 +x 1 A 1 +...+x k A k, where A i R n n and A i = A T i. Observe that λ max (A(x)) t if and only if λ max (A(x) ti) 0 λ min (ti A(x)) 0. This holds iff ti A(x) 0. So we get the SDP in the dual form: max t s.t. ti A(x) 0, where the variable is y := (t,x). Paris, January 2018 41

SDP Example 2: Logarithmic Chebyshev Approx. Suppose we wish to solve Ax b approximately, where A = [a 1...a n ] T R n k and b R n. InChebyshevapproximationweminimizethel -normoftheresidual, i.e., we solve min max a T i i x b i. This can be cast as an LP, with x and an auxiliary variable t: min t s.t. t a T i x b i t, i = 1..n. In some applications b i has a dimension of a power of intensity, and it is typically expressed on a logarithmic scale. In such cases the more natural optimization problem is min max i (assuming a T i x > 0 and b i > 0). log(a T i x) log(b i) Paris, January 2018 42

Logarithmic Chebyshev Approximation (cont d) The logarithmic Chebyshev approximation problem can be cast as a semidefinite program. To see this, note that log(a T i x) log(b i) = logmax(a T i x/b i, b i /a T i x). Hence the problem can be rewritten as the following (nonlinear) program min t or, min t s.t. s.t. 1/t a T i x/b i t, i = 1..n. t at i x/b i 0 0 0 a T i x/b i 1 0 1 t which is a semidefinite program. 0, i = 1..n Paris, January 2018 43

Logarithmic Barrier Function Define the logarithmic barrier function for the cone SR n n + of positive definite matrices. f : SR+ n n R f(x) = Let us evaluate its derivatives. Let X 0,H SR n n. Then { lndetx if X 0 + otherwise. f(x +αh) = lndet[x(i +αx 1 H)] = lndetx ln(1+αtrace(x 1 H)+O(α 2 )) = f(x) αx 1 H +O(α 2 ), so that f (X) = X 1 and Df(X)[H] = X 1 H. Paris, January 2018 44

Logarithmic Barrier Function (cont d) Similarly f (X +αh) = [X(I +αx 1 H)] 1 so that f (X)[H] = X 1 HX 1 and D 2 f(x)[h,g] = X 1 HX 1 G. = [I αx 1 H +O(α 2 )]X 1 = f (X)+αX 1 HX 1 +O(α 2 ), Finally, f (X)[H,G] = X 1 HX 1 GX 1 X 1 GX 1 HX 1. Paris, January 2018 45

Logarithmic Barrier Function (cont d) Theorem: f(x) = lndetx is a convex barrier for SR n n +. Proof: Define φ(α) = f(x+αh). We know that f is convex if, for every X SR n n + and every H SR n n, φ(α) is convex in α. Consider a set of α such that X+αH 0. On this set φ (α) = D 2 f( X)[H,H] = X 1 H X 1 H, where X = X+αH. Since X 0, so is V = X 1/2 (Property 6), and φ (α) = V 2 HV 2 H = trace(v 2 HV 2 H) = trace((vhv)(vhv)) = VHV 2 F 0. So φ is convex. When X 0 approaches a singular matrix, its determinant approaches zero and f(x). Paris, January 2018 46

Simplified Notation Define A : SR n n R m AX = (A i X) m i=1 Rm. Note that, for any X SR n n and y R m, m m (AX) T y = (A i X)y i = ( y i A i ) X, i=1 i=1 so the adjoint of A is given by m A y = y i A i. i=1 A is a mapping from R m to SR n n. Paris, January 2018 47

Simplified Notation (cont d) With this notation the primal SDP becomes min C X s.t. AX = b, X 0, where X SR n n is the variable. The associated dual SDP writes max b T y s.t. A y + S = C S 0, where y R m and S SR n n are the variables. Paris, January 2018 48

Solving SDPs with IPMs Replace the primal SDP min C X s.t. AX = b, X 0, with the primal barrier SDP min C X +µf(x) s.t. AX = b, (with a barrier parameter µ 0). Formulate the Lagrangian L(X,y,S) = C X +µf(x) y T (AX b), with y R m, and write the first order conditions (FOC) for a stationary point of L: C +µf (X) A y = 0. Paris, January 2018 49

Solving SDPs with IPMs (cont d) Use f(x) = lndet(x) and f (X) = X 1. Therefore the FOC become: C µx 1 A y = 0. Denote S = µx 1, i.e., XS = µi. For a positive definite matrix X its inverse is also positive definite. The FOC now become: with X 0 and S 0. AX = b, A y +S = C, XS = µi, Paris, January 2018 50

Newton direction We derive the Newton direction for the system: AX = b, A y +S = C, µx 1 +S = 0. Recall that the variables in FOC are (X,y,S), where X,S SR n n + and y R m. Hence we look for a direction ( X, y, S), where X, S SR n n + and y R m. Paris, January 2018 51

Newton direction (cont d) The differentiation in the above system is a nontrivial operation. The direction is the solution of the system: [ ] [ ] A 0 0 X ξb 0 A I y = ξ C. µ(x 1 X 1 ) 0 I S ξ µ We introduce a useful notation P Q for n n matrices P and Q. This is an operator from SR n n to SR n n defined by (P Q)U = 1 2 (PUQT +QUP T ). Paris, January 2018 52

Logarithmic Barrier Function for the cone SR n n + of positive definite matrices, f : SR n n + R { lndetx if X 0 f(x) = + otherwise. LP: Replace x 0 with µ n j=1 lnx j. SDP: Replace X 0 with µ n j=1 lnλ j = µln( n j=1 λ j ). Nesterov and Nemirovskii, Interior Point Polynomial Algorithms in Convex Programming: Theory and Applications, SIAM, Philadelphia, 1994. Lemma Thebarrierfunctionf(X)isself-concordantonSR n n +. Paris, January 2018 53

Interior Point Methods: Logarithmic barrier functions for SDP and SOCP Self-concordant barriers polynomial complexity (predictable behaviour) Unified view of optimization from LP via QP to NLP, SDP, SOCP Efficiency good for SOCP problematic for SDP because solving the problem of size n involves linear algebra operations in dimension n 2 and this requires n 6 flops! Use IPMs in your research! Paris, January 2018 54