INTERIOR-POINT METHODS FOR SECOND-ORDER-CONE AND SEMIDEFINITE PROGRAMMING

ROBERT J. VANDERBEI

JOINT WORK WITH H. YURTTAN BENSON

REAL-WORLD EXAMPLES BY J.O. COLEMAN, NAVAL RESEARCH LAB
Outline

- Introduction and Review of Problem Classes: LP, NLP, SOCP, SDP
- Formulating SOCPs as Smooth Convex NLPs
- Formulating SDPs as Smooth Convex NLPs
- Applications and Computational Results
Introduction and Review of Problem Classes: LP, NLP, SOCP, SDP
Traditional Families of Optimization Problems

Linear Programming (LP):

  minimize    c^T x
  subject to  Ax = b,
              x ≥ 0.

Smooth Convex Nonlinear Programming (NLP):

  minimize    f(x)
  subject to  h_i(x) = 0,  i ∈ E,
              h_i(x) ≥ 0,  i ∈ I.

We assume that:
- the h_i's in the equality constraints are affine;
- the h_i's in the inequality constraints are concave;
- f is convex;
- all functions are twice continuously differentiable.
Two New Families of Optimization Problems

Second-Order Cone Programming (SOCP):

  minimize    f^T x
  subject to  ‖A_i x + b_i‖ ≤ c_i^T x + d_i,  i = 1, …, m.

Here, A_i is a k_i × n matrix, b_i is a k_i-vector, c_i is an n-vector, and d_i is a scalar.

Semidefinite Programming (SDP):

  minimize    f(X)
  subject to  h_i(X) = 0,  i ∈ E,
              X ⪰ 0.

Here, X is an n × n symmetric matrix, each h_i is affine, and X ⪰ 0 means that X is to be positive semidefinite.
Preview of Applications

Second-Order Cone Programming:
- FIR filter design
- Antenna array optimization
- Structural optimization
- Grasping problems

Semidefinite Programming:
- SDP relaxations of combinatorial optimization problems such as MAX-CUT have been shown to be better than LP relaxations
- Eigenvalue optimization: maximize the minimum eigenvalue

More later on some of these.
Interior-Point Algorithms

Interior-point methods were first developed in the mid-1980s for LP. Later they were extended to NLP, SOCP, and SDP.

The extension to NLP follows the LP case closely: the nonnegativity condition x ≥ 0 is treated the same way in both cases, and the nonnegative-orthant cone plays a fundamental role.

For SOCP, a different cone, the Lorentz cone, is introduced, and algorithms are derived using this cone in place of the nonnegative orthant. Similarly for SDP, a different cone, the cone of positive semidefinite matrices, plays the fundamental role in algorithm development.
Our Aim: A Large Enclosing Superclass

Recently, SOCP and SDP have been unified under the banner of Conic Programming, and software has appeared that solves problems from the union of the SOCP and SDP problem classes.

The aim of our work is to show that SOCP and SDP can be brought under the banner of NLP and solved with generic NLP software. This is a much more general setting.
Formulating SOCPs as Smooth Convex NLPs
SOCP as NLP

For SOCP, the constraint function

  h_i(x) = c_i^T x + d_i − ‖A_i x + b_i‖

is concave but not differentiable on {x : A_i x + b_i = 0}.

Nondifferentiability should not be a problem unless it happens at optimality...
An Example

  minimize    a x_1 + x_2        (−1 < a < 1)
  subject to  |x_1| ≤ x_2.

Clearly, the optimal solution is (x_1, x_2) = (0, 0).

Dual feasibility:

  [ a ]       [ d|x_1|/dx_1 ]
  [   ]  +  y [             ]  =  0.
  [ 1 ]       [     −1      ]

An interior-point method must pick the correct value for d|x_1|/dx_1 when x_1 = 0:

  d|x_1|/dx_1 |_{x_1 = 0} = −a.

Not possible a priori.
Smooth Alternative Formulations

Constraint formulation:

  φ(A_i x + b_i, c_i^T x + d_i) ≥ 0,   i = 1, …, m,    where φ(u, t) = t − ‖u‖.

Not differentiable at u = 0. Smooth alternatives:

  t − √(ε² + Σ_i u_i²) ≥ 0         concave, not equivalent
  t² − ‖u‖² ≥ 0,  t ≥ 0            nonconcave, equivalent
  t − ‖u‖²/t ≥ 0,  t > 0           concave, equivalent on the strict interior

The third suggestion is due to E.D. Andersen.
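The kink in φ at u = 0, and the smoothness of the third alternative there, can be checked numerically. A minimal sketch (numpy only; the function names are ours, not from any SOCP package):

```python
import numpy as np

def phi(u, t):
    # original formulation: t - ||u||, not differentiable at u = 0
    return t - np.linalg.norm(u)

def phi_ratio(u, t):
    # Andersen's alternative: t - ||u||^2 / t, smooth for t > 0
    return t - (u @ u) / t

# One-sided slopes along u_1 through u = 0 (with t = 1 fixed):
h = 1e-6
e1 = np.array([1.0, 0.0])
right = (phi(h * e1, 1.0) - phi(0 * e1, 1.0)) / h         # -> -1
left = (phi(-h * e1, 1.0) - phi(0 * e1, 1.0)) / (-h)      # -> +1  (kink!)

right_r = (phi_ratio(h * e1, 1.0) - phi_ratio(0 * e1, 1.0)) / h     # -> ~0
left_r = (phi_ratio(-h * e1, 1.0) - phi_ratio(0 * e1, 1.0)) / (-h)  # -> ~0
print(right, left, right_r, left_r)
```

The mismatched one-sided slopes (−1 vs. +1) are exactly the ambiguity an interior-point method cannot resolve a priori in the example above, while the ratio form has slope 0 from both sides.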
Formulating SDPs as Smooth Convex NLPs
Formulating SDPs as NLPs

How to express

  minimize    f(X)
  subject to  h_i(X) = 0,  i ∈ E,
              X ⪰ 0

as

  minimize    f(x)
  subject to  h_i(x) = 0,  i ∈ E,
              h_i(x) ≥ 0,  i ∈ I?
Characterizations of Semidefiniteness

The Definition — Semi-infinite LP:

  ξ^T X ξ ≥ 0  for all ξ ∈ R^n.

Involves an infinite number of constraints.

Smallest Eigenvalue — Nonsmooth Convex NLP:

  λ_min(X) ≥ 0.

The smallest eigenvalue as a function of the matrix X is concave, but it is not smooth, as the following example shows:

  X = [ x_1   y  ]
      [  y   x_2 ].

The eigenvalues are:

  λ_1(X) = ( (x_1 + x_2) − √((x_1 − x_2)² + 4y²) ) / 2,
  λ_2(X) = ( (x_1 + x_2) + √((x_1 − x_2)² + 4y²) ) / 2.
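The nonsmoothness is easy to see numerically: along x_1 = x_2 = 1 the formula gives λ_min = 1 − |y|, which has a kink at y = 0. A quick numpy check of this (a sketch, not part of the original slides):

```python
import numpy as np

def lam_min(x1, x2, y):
    # closed-form smallest eigenvalue of [[x1, y], [y, x2]]
    return ((x1 + x2) - np.sqrt((x1 - x2)**2 + 4 * y**2)) / 2

# The formula agrees with numpy's symmetric eigenvalue routine:
X = np.array([[1.0, 0.3], [0.3, 2.0]])
print(np.isclose(lam_min(1.0, 2.0, 0.3), np.linalg.eigvalsh(X)[0]))  # True

# Kink at y = 0 when x1 = x2: lam_min(1, 1, y) = 1 - |y|
h = 1e-6
right = (lam_min(1, 1, h) - lam_min(1, 1, 0)) / h      # -> -1
left = (lam_min(1, 1, -h) - lam_min(1, 1, 0)) / (-h)   # -> +1
```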
Characterizations of Semidefiniteness, Continued

Factorization — Smooth Convex NLP:

For j = 1, …, n, let

  d_j(X) = D_jj, where X = L D L^T,   if X ⪰ 0,
         = −∞,                        else.

Then X ⪰ 0 if and only if d_j(X) ≥ 0 for all j.

Theorem 1. The functions d_j are (1) concave, (2) continuous over the positive semidefinite matrices, and (3) infinitely differentiable over the positive definite matrices.
First and Second Derivatives — Explicit Formulae

Fix j and partition X as

  X = [ Z    y ]
      [ y^T  x ],

where Z is the leading (j−1) × (j−1) block, y is a (j−1)-vector, and x = X_jj. Assume that X ≻ 0. Then Z is nonsingular and

  d_j(X) = x − y^T Z^{-1} y.

Derivatives of d_j are easily obtained from derivatives of

  f(y, Z) = y^T Z^{-1} y.

And they are...
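The Schur-complement formula for d_j can be checked directly against the pivots of an LDL^T factorization. A small self-contained sketch (0-based indices; the helper names are ours):

```python
import numpy as np

def ldl_diag(X):
    # Pivots D_jj of X = L D L^T via the standard recursion (no pivoting).
    n = X.shape[0]
    A = X.astype(float).copy()
    d = np.empty(n)
    for j in range(n):
        d[j] = A[j, j]
        if j + 1 < n:
            col = A[j+1:, j] / d[j]
            # rank-one Schur-complement update of the trailing block
            A[j+1:, j+1:] -= np.outer(col, A[j+1:, j])
    return d

def d_schur(X, j):
    # d_j(X) = x - y^T Z^{-1} y, with Z the leading j x j block (0-based j)
    Z, y, x = X[:j, :j], X[:j, j], X[j, j]
    return x - y @ np.linalg.solve(Z, y) if j > 0 else x

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)          # positive definite test matrix
d = ldl_diag(X)
print([np.isclose(d[j], d_schur(X, j)) for j in range(4)])
```

The agreement reflects the identity d_j = det(X_{1..j}) / det(X_{1..j−1}): each LDL^T pivot is a Schur complement with respect to the leading block.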
Derivatives (Fletcher '85)

The derivatives of f(y, Z) are:

  ∂f/∂y_i = y^T Z^{-1} e_i + e_i^T Z^{-1} y,

  ∂²f/∂y_i ∂y_j = e_j^T Z^{-1} e_i + e_i^T Z^{-1} e_j,

  ∂f/∂z_ij = − y^T Z^{-1} e_i e_j^T Z^{-1} y,

  ∂²f/∂z_ij ∂z_kl = y^T Z^{-1} e_k e_l^T Z^{-1} e_i e_j^T Z^{-1} y + y^T Z^{-1} e_i e_j^T Z^{-1} e_k e_l^T Z^{-1} y,

  ∂²f/∂y_i ∂z_kl = − y^T Z^{-1} e_k e_l^T Z^{-1} e_i − e_i^T Z^{-1} e_k e_l^T Z^{-1} y.

From these it is easy to check that f is convex and hence that d_j is concave.
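The formula for ∂f/∂z_ij can be verified by finite differences; for symmetric Z it collapses to −(Z^{-1}y)_i (Z^{-1}y)_j. A sketch of that check (the test matrix is our own):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
Z = B @ B.T + n * np.eye(n)          # symmetric positive definite
y = rng.standard_normal(n)

f = lambda M: y @ np.linalg.solve(M, y)   # f(y, Z) = y^T Z^{-1} y

# analytic derivative: df/dz_ij = -(Z^{-1} y)_i (Z^{-1} y)_j for symmetric Z
w = np.linalg.solve(Z, y)
i, j, h = 1, 2, 1e-6
E = np.zeros((n, n)); E[i, j] = 1.0
fd = (f(Z + h * E) - f(Z - h * E)) / (2 * h)   # central difference in z_ij
print(fd, -w[i] * w[j])
```

The central difference perturbs the single entry z_ij, matching the formula's treatment of the entries of Z as independent variables.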
LOQO — A Specific Interior-Point Algorithm for NLP

The problem

  minimize    f(x)
  subject to  h_i(x) = 0,  i ∈ E,
              h_i(x) ≥ 0,  i ∈ I

is represented internally by LOQO as:

  minimize    f(x)
  subject to  h_i(x) − w_i = 0,     i ∈ E,
              h_i(x) − w_i = 0,     i ∈ I,
              x_j − g_j + t_j = 0,  j = 1, …, n,
              w_i + p_i = 0,        i ∈ E,
              w_i ≥ 0,  i ∈ E ∪ I,
              p_i ≥ 0,  i ∈ E,
              g_j ≥ 0,  j = 1, …, n,
              t_j ≥ 0,  j = 1, …, n.
Complementarity

Each nonnegative variable has a complementary dual variable:

  w_i ↔ y_i,   p_i ↔ q_i,   g_j ↔ z_j,   t_j ↔ s_j.

Using LOQO to Solve SDPs

LOQO preserves positivity of each nonnegative variable. It does not preserve positivity of the h_i's. In fact, without further precaution, h_i does indeed go negative. This is bad for SDP!
Step Shortening

Step lengths can be shortened to guarantee that the h_i's remain positive. But then the algorithm could jam. The next theorem shows that simple jamming does not occur.

To be precise, we consider a point (x̄, w̄, ȳ, p̄, q̄, ḡ, z̄, t̄, s̄) satisfying the following conditions:

(1) Nonnegativity: w̄ ≥ 0, ȳ ≥ 0, p̄ ≥ 0, q̄ ≥ 0, ḡ ≥ 0, z̄ ≥ 0, t̄ ≥ 0, and s̄ ≥ 0.
(2) Complementary variable positivity: w̄ + ȳ > 0, p̄ + q̄ > 0, ḡ + z̄ > 0, and t̄ + s̄ > 0.
(3) Free variable positivity: ḡ + t̄ > 0.
Nonjamming Theorem

We are interested in a point where some of the w_i's vanish. Let U = {i : w̄_i = 0} and let B denote the other indices. Write the constraint Jacobian A (and other matrices and vectors) in block form according to this partition:

  A = [ B ]
      [ U ].

The matrix A and the other quantities are functions of the current point. We use the same letter with a bar over it to denote the value of these objects at the point (x̄, …, s̄).

Theorem 2. If the point (x̄, …, s̄) satisfies conditions (1)–(3), and Ū has full row rank, then the step direction Δw for w has a continuous extension to this point, and Δw_U = µ Ȳ_U^{-1} e_U > 0 there.
An Example

We used LOQO to solve the following simple problem:

  minimize    Σ_{j=1}^{3} X_jj
  subject to  X_12 = 1,
              X_13 = 1.5,
              X_23 = 2,
              X ⪰ 0.

Without step shortening, LOQO fails. With step shortening, it solves the problem in fewer than 20 iterations.
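As a sanity check on this toy problem with a generic NLP solver (not LOQO: we sidestep the semidefiniteness constraint by parametrizing X = L L^T, so this sketch does not exercise step shortening):

```python
import numpy as np
from scipy.optimize import minimize

def unpack(v):
    # Build X = L L^T from the 6 lower-triangular entries of L,
    # so X is positive semidefinite by construction.
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = v
    return L @ L.T

obj = lambda v: np.trace(unpack(v))
cons = [
    {'type': 'eq', 'fun': lambda v: unpack(v)[0, 1] - 1.0},
    {'type': 'eq', 'fun': lambda v: unpack(v)[0, 2] - 1.5},
    {'type': 'eq', 'fun': lambda v: unpack(v)[1, 2] - 2.0},
]
res = minimize(obj, np.ones(6), constraints=cons, method='SLSQP')
X = unpack(res.x)
print(np.trace(X))
```

In this run the minimizer comes out (numerically) rank one, with trace 61/12 ≈ 5.083: the minimum-trace PSD completion of the given off-diagonal entries.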
Applications and Computational Results
Finite Impulse Response (FIR) Filter Design — Low-Pass Filter

  minimize    ρ
  subject to  ( (1/|L|) ∫_L |H(ν) − 1|² dν )^{1/2} ≤ ρ,
              ( (1/|H|) ∫_H |H(ν)|² dν )^{1/2} ≤ ρ,

where

  H(ν) = Σ_{k=−5}^{19} h(k) e^{−2πikν},

and the h(k) are the complex filter coefficients, i.e., the decision variables.
Specific Example

Ref: J.O. Coleman and D.P. Scholnik, U.S. Naval Research Laboratory.

  constraints: 1880
  variables:   1648

  solver           time (iterations)
  LOQO             17.7 (33)
  SEDUMI (Sturm)   46.1 (18)

MWSCAS99 paper available: engr.umbc.edu/~jeffc/pubs/abstracts/mwscas99socp.html
FIR Filter Design — Woofer, Midrange, Tweeter

  minimize    ρ
  subject to  ∫_0^1 (H_w(ν) + H_m(ν) + H_t(ν) − 1)² dν ≤ ε,
              ( (1/|W|) ∫_W H_w²(ν) dν )^{1/2} ≤ ρ,   W = [.2, .8],
              ( (1/|M|) ∫_M H_m²(ν) dν )^{1/2} ≤ ρ,   M = [.4, .6] ∪ [.9, .1],
              ( (1/|T|) ∫_T H_t²(ν) dν )^{1/2} ≤ ρ,   T = [.7, .3],

where intervals are taken modulo 1,

  H_i(ν) = h_i(0) + 2 Σ_{k=1}^{n−1} h_i(k) cos(2πkν),   i = w, m, t,

and the h_i(k) are the filter coefficients, i.e., the decision variables.
Specific Example: Pink Floyd's "Money"

  filter length: n = 14
  integral discretization: N = 1000
  constraints: 4
  variables:   43

  solver     time (secs)
  LOQO       79
  MINOS      164
  LANCELOT   3401
  SNOPT      35

Ref: J.O. Coleman, U.S. Naval Research Laboratory. CISS98 paper available: engr.umbc.edu/~jeffc/pubs/abstracts/ciss98.html
Wide-Band Antenna Array Weight Design — SOCP

  minimize    α
  subject to  ∫∫_S |A(µ, ν)|² dµ dν ≤ α,
              |A(µ, ν)| ≤ 10^{−25/20},  (µ, ν) ∈ S,
              |A(µ, ν)| ≤ 10^{−45/20},  (µ, ν) ∈ S_0,
              ∫_P |A(µ_m, ν) − β_m|² dν ≤ 10^{−50/10},  m = 1, …, M,
              Σ_k Σ_n |c_kn|² ≤ ρ,

where

  A(µ, ν) = Σ_k Σ_n c_kn e^{−2πi(kµ+nν)},

  c_kn = complex-valued design weight for array element k at frequency tap n,
  S    = subset of direction/frequency pairs representing sidelobes,
  S_0  = subset of the sidelobes spelling NRL,
  µ_m  = finite set of directions covering the pass band,
  ρ    = noise bound.
Specific Example

  15 antennae in a linear array
  21 taps on each array
  671 Chebychev constraints to spell NRL

  constraints: 6230
  variables:   624

  solver           time (iterations)
  LOQO             1049 (48)
  SEDUMI (Sturm)   573 (27)

Ref: D.P. Scholnik and J.O. Coleman, U.S. Naval Research Laboratory. RADAR2000 paper available: engr.umbc.edu/~jeffc/pubs/papers/radar2kds/radar2000.html
Max-Cut Problem — SDP

  minimize    C • X − e^T C diag(X)
  subject to  X_jj = 1/4,  j = 1, …, n,
              X ⪰ 0,

where C is a symmetric incidence matrix; i.e., C_ij = 1 if {i, j} is an arc and zero otherwise.

Specific example: C is a 15 × 15 arc incidence matrix for a randomly generated graph with 1.5 arcs/node on average. LOQO solves this problem in 27 iterations and 43.0 seconds.
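The objective's connection to cut values can be checked directly: for any ±1 cut vector x, the matrix X = x x^T / 4 is feasible, and the objective equals minus the number of cut edges. A small numpy check (the triangle graph is our own example):

```python
import numpy as np

# Triangle graph: C_ij = 1 for every pair {i, j}
C = np.ones((3, 3)) - np.eye(3)

def objective(X, C):
    # C . X - e^T C diag(X)
    n = C.shape[0]
    return np.sum(C * X) - np.ones(n) @ C @ np.diag(X)

# Any cut x in {-1,+1}^n gives a feasible X = x x^T / 4 (X_jj = 1/4, X >= 0);
# the objective value is minus the number of cut edges.
x = np.array([1.0, 1.0, -1.0])       # this cut severs edges {1,3} and {2,3}
X = np.outer(x, x) / 4
print(objective(X, C))                # -2.0
```

So minimizing the objective over all feasible X relaxes maximizing the cut, which is why the SDP value bounds MAX-CUT from above (after a sign flip).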
Max-Min Eigenvalue — SDP

  minimize    C • X
  subject to  Σ_{k=1}^{n} X_kk = 1,
              X ⪰ 0.

Specific example: C is a 10 × 10 random symmetric matrix. LOQO solves the problem in 27 iterations and 43.1 seconds.
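Reading the constraint as trace(X) = Σ_k X_kk = 1 (the standard dual of maximizing the minimum eigenvalue), the optimal value is λ_min(C), attained at X = v v^T for a unit eigenvector v. A quick numpy check of that reading:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 10))
C = (A + A.T) / 2                      # random symmetric matrix

# For any X >= 0 with trace 1, C . X >= lam_min(C); equality at X = v v^T.
lam, V = np.linalg.eigh(C)             # eigenvalues in ascending order
v = V[:, 0]                            # unit eigenvector for lam_min
X = np.outer(v, v)                     # feasible: trace 1, PSD, rank one
print(np.sum(C * X), lam[0])           # the two values agree
```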
Exploiting Sparsity

The computational results shown for SDP don't properly exploit sparsity. For SDP, one way to exploit sparsity is to solve the dual problem:

  maximize    Σ_k b_k y_k
  subject to  Σ_k A^(k)_ij y_k + Z_ij = C_ij,  i, j = 1, 2, …, n,
              Z ⪰ 0.

Typically, C and each of the A^(k) are sparse matrices. The equality constraints then imply that Z must have a sparsity pattern contained in the union of the sparsity patterns of these other matrices. Therefore, Z is often sparse too, and its sparsity pattern is known from the beginning. This sparsity is easy to exploit, although we haven't done that yet. Also, the derivative calculations require Z^{-1} too.