Introduction to Semidefinite Programs

Introduction to Semidefinite Programs
Masakazu Kojima, Tokyo Institute of Technology
Semidefinite Programming and Its Application
Institute for Mathematical Sciences, National University of Singapore, January 9-13, 2006

Main purpose: an introduction to semidefinite programs and a brief review of SDPs.

Contents:
Part I: Introduction to SDP and its basic theory (70 minutes)
Part II: Primal-dual interior-point methods (70 minutes)
Part III: Some applications
Appendix: Linear optimization problems over symmetric cones

References: not comprehensive, but helpful for further study of the subject.

Contents

Part I: Basic theory
1. LP versus SDP
2. Why is SDP interesting and important?
3. The equality standard form SDP
4. Some basic properties of positive semidefinite matrices and their inner product
5. General SDPs
6. Some examples
7. Duality

Part II: Primal-dual interior-point methods
1. Existing numerical methods for SDPs
2. Three approaches to primal-dual interior-point methods for SDPs
3. The central trajectory
4. Search directions
5. Various primal-dual interior-point methods
6. Exploiting sparsity
7. Software packages
8. SDPA sparse format
9. Numerical results

Part III: Some applications
1. Matrix approximation problems
2. A nonconvex quadratic optimization problem
3. The max-cut problem
4. Sums of squares of polynomials

Appendix: Linear optimization problems over symmetric cones
1. Linear optimization problems over cones
2. Symmetric cones
3. Euclidean Jordan algebra
4. The equality standard form SOCP (Second Order Cone Program)
5. Some applications of SOCPs

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 1


SDP is an extension of LP to the space of symmetric matrices.

LP:  minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                X11 >= 0, X12 >= 0, X22 >= 0.

SDP: minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                [ X11  X12 ]
                [ X12  X22 ] ⪰ O (positive semidefinite).

Both LP and SDP have a linear objective function in the real variables X11, X12, X22. Both LP and SDP have linear equality and inequality constraints in the real variables X11, X12, X22.

SDP is an extension of LP to the space of symmetric matrices.

LP:  minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                X11 >= 0, X12 >= 0, X22 >= 0.

SDP: minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                [ X11  X12 ]
                [ X12  X22 ] ⪰ O (positive semidefinite).

In addition to real-valued equality and inequality constraints, the SDP has a positive semidefinite constraint on [X11 X12; X12 X22], or equivalently X11 >= 0, X22 >= 0, X11 X22 - X12^2 >= 0, which couples X11, X12, X22 nonlinearly, while X11 >= 0, X12 >= 0, X22 >= 0 in LP are linear and separable. The feasible region of LP and the feasible region of SDP are both convex sets, but the former is polyhedral while the latter is non-polyhedral.
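The nonlinear coupling in the 2x2 positive semidefinite constraint is easy to probe numerically. Below is a minimal sketch (the feasible point is chosen by hand purely for illustration) of a point that passes the separable LP sign tests but fails the coupled determinant test, and hence is not positive semidefinite.

```python
import numpy as np

# Illustrative point satisfying 2*X11 + 3*X12 + X22 = 7 (chosen by hand).
X11, X12, X22 = 2.0, 1.0, 0.0
X = np.array([[X11, X12], [X12, X22]])

# LP-style sign constraints are linear and separable:
lp_feasible = (X11 >= 0) and (X12 >= 0) and (X22 >= 0)

# For a 2x2 matrix, X >= O is equivalent to
# X11 >= 0, X22 >= 0 and X11*X22 - X12**2 >= 0 (a nonlinear coupling).
det_test = X11 * X22 - X12**2
sdp_feasible = bool(np.all(np.linalg.eigvalsh(X) >= -1e-12))
```

Here `lp_feasible` holds while `sdp_feasible` does not (`det_test` is -1), showing that the semidefinite constraint genuinely couples the entries.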

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 4

Lots of applications to various problems:
- Systems and control theory: linear matrix inequalities [6]
- SDP relaxations of combinatorial and nonconvex problems: max-cut and max-clique problems [14], 0-1 integer linear programs [24], polynomial optimization problems [22, 35]
- Robust optimization [4]
- Quantum chemistry [51]
- Moment problems (applied probability) [5, 23]
- ...

Survey articles: Todd [39], Vandenberghe-Boyd [45]. Handbook of SDP: Wolkowicz-Saigal-Vandenberghe [46]. Web pages: Helmberg [15], Wolkowicz [47].

Theory:
- Self-concordance theory [33]
- Euclidean Jordan algebra [10, 36]
- Polynomial-time primal-dual interior-point methods [1, 17, 20, 27, 34]

SDP serves as a core convex optimization problem.

[Figure: Classification of optimization problems, continuous vs. discrete and convex vs. nonconvex. On the convex side, a hierarchy: Linear Program ⊂ Convex Quadratic Optimization Problem ⊂ Second Order Cone Program ⊂ SemiDefinite Program ⊂ Linear Optimization Problem over Symmetric Cone. On the nonconvex side: 0-1 integer LP & QOP, Polynomial Optimization Problem, POP over Symmetric Cone, Bilinear Matrix Inequality; arrows indicate relaxations of nonconvex problems into convex ones.]

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 8

(LP) minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.

Here
R: the set (linear space) of real numbers,
R^n: the linear space of n-dimensional vectors,
a_p ∈ R^n: data, n-dimensional vector (p = 0, 1,...,m),
b_p ∈ R: data, real number (p = 1,...,m),
x ∈ R^n: variable, n-dimensional vector,
a_p • x = Σ_{i=1}^n [a_p]_i x_i (the inner product of a_p and x).

(LP)  minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Here
S^n: the linear space of n×n symmetric matrices,
A_p ∈ S^n: data, n×n symmetric matrix (p = 0, 1,...,m),
b_p ∈ R: data, real number (p = 1,...,m),
X ∈ S^n: variable, n×n symmetric matrix,

              [ X_11 X_12 ... X_1n ]
X = (X_ij) =  [ X_21 X_22 ... X_2n ] ∈ S^n,  X_ij = X_ji ∈ R (1 <= i <= j <= n),
              [ ...            ... ]
              [ X_n1 X_n2 ... X_nn ]

X ⪰ O ⟺ X ∈ S^n is positive semidefinite,
A_p • X = Σ_{i=1}^n Σ_{j=1}^n [A_p]_ij X_ij (the inner product of A_p and X).

Notation: R^{k×l}: the linear space of k×l matrices.

(LP)  minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

m = 2, n = 2, b_1 = 7, b_2 = 9,

X = [ X_11 X_12 ],  A_0 = [  1  -1 ],  A_1 = [  2  1.5 ],  A_2 = [  2  0.5 ].
    [ X_21 X_22 ]         [ -1  -5 ]         [ 1.5  1  ]         [ 0.5  3  ]

Note that X_12 = X_21.

Example:
minimize X_11 - 2X_12 - 5X_22
subject to 2X_11 + 3X_12 + X_22 = 7, 2X_11 + X_12 + 3X_22 = 9,
           [ X_11 X_12 ]
           [ X_12 X_22 ] ⪰ O.
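The correspondence between the data matrices and the scalar constraints can be sanity-checked in a few lines of numpy. A_0 below is a reconstruction (signs inferred so that A_0 • X matches the stated objective X_11 - 2X_12 - 5X_22), and the test point X is arbitrary, not an optimal solution.

```python
import numpy as np

A0 = np.array([[1.0, -1.0], [-1.0, -5.0]])   # inferred: A0 • X = X11 - 2*X12 - 5*X22
A1 = np.array([[2.0, 1.5], [1.5, 1.0]])
A2 = np.array([[2.0, 0.5], [0.5, 3.0]])

def inner(A, X):
    """A • X = sum_ij A_ij X_ij = trace(A X)."""
    return float(np.trace(A @ X))

X = np.array([[3.0, 0.5], [0.5, 0.5]])       # arbitrary symmetric test point

obj = inner(A0, X)        # equals X11 - 2*X12 - 5*X22
lhs1 = inner(A1, X)       # equals 2*X11 + 3*X12 + X22
lhs2 = inner(A2, X)       # equals 2*X11 + X12 + 3*X22
```

Note how the off-diagonal entries of each A_p are half the coefficient of X_12, since X_12 appears twice in the inner product.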

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 12

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O : the semidefinite constraint.

Definition: X ⪰ O if u^T X u = Σ_{i=1}^n Σ_{j=1}^n X_ij u_i u_j >= 0 for all u ∈ R^n.
Definition: X ≻ O if u^T X u = Σ_{i=1}^n Σ_{j=1}^n X_ij u_i u_j > 0 for all u ≠ 0.

X ∈ S^n ⟹ all n eigenvalues of X are real.
X ⪰ O (≻ O) ⟺ all n eigenvalues are nonnegative (positive).
X ⪰ O (≻ O) ⟺ all the principal minors are nonnegative (positive).
X ⪰ O (≻ O) ⟹ all diagonal entries X_ii are nonnegative (positive).
X ⪰ O and X_ii = 0 ⟹ X_ij = 0 (for all j).
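The eigenvalue and principal-minor characterizations above can be checked numerically; the matrices here are illustrative test data only.

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0]])

# Eigenvalues of a symmetric matrix are real; eigvalsh returns them sorted.
eigvals = np.linalg.eigvalsh(X)
psd_by_eigs = bool(np.all(eigvals >= 0))

# For a 2x2 matrix the principal minors are X11, X22 and det(X).
minors = [X[0, 0], X[1, 1], float(np.linalg.det(X))]
psd_by_minors = all(m >= 0 for m in minors)

# X ⪰ O with X_ii = 0 forces the whole ith row and column to vanish.
Z = np.array([[0.0, 0.0], [0.0, 4.0]])       # PSD, Z11 = 0, hence Z12 = Z21 = 0
```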

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O : the semidefinite constraint.

X ⪰ O (≻ O) ⟺ there is an n×n (nonsingular) B with X = B B^T (factorization).
X ⪰ O ⟺ there is an n×n lower triangular L with X = L L^T (Cholesky factorization).
X ⪰ O ⟺ there are an n×n orthogonal P and an n×n diagonal D ⪰ O with X = P D P^T (orthogonal decomposition). Here each diagonal element λ_i = D_ii of D is an eigenvalue of X, and the ith column p_i of P is an eigenvector corresponding to λ_i.
X ⪰ O ⟺ there is a C ∈ S^n_+ with X = C^2. Take C = P D^{1/2} P^T; then
C^2 = (P D^{1/2} P^T)(P D^{1/2} P^T) = P D P^T = X.
We write C = √X.
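A quick numerical check of the three factorizations (Cholesky, orthogonal decomposition, and the matrix square root built from it) on a small positive definite example:

```python
import numpy as np

X = np.array([[4.0, 2.0], [2.0, 3.0]])       # positive definite example

# Cholesky factorization: X = L L^T with L lower triangular.
L = np.linalg.cholesky(X)

# Orthogonal (eigenvalue) decomposition: X = P D P^T.
lam, P = np.linalg.eigh(X)
D = np.diag(lam)

# Matrix square root: C = P D^{1/2} P^T, so that C @ C reproduces X.
C = P @ np.diag(np.sqrt(lam)) @ P.T
```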

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n : a linear space of dimension n(n+1)/2.
- X + Y ∈ S^n for X ∈ S^n and Y ∈ S^n.
- αX ∈ S^n for α ∈ R and X ∈ S^n.
- linear independence; a basis consisting of n(n+1)/2 matrices.

Example (n = 2; note that X_12 = X_21):

2 [ 1.1 0.5 ] + 0.5 [  2.4 -0.6 ] = [ 3.4 0.7 ],
  [ 0.5 2.4 ]       [ -0.6  1.2 ]   [ 0.7 5.4 ]

[ 1 0 ], [ 0 1 ], [ 0 0 ] : a basis of S^2.
[ 0 0 ]  [ 1 0 ]  [ 0 1 ]

For every A, X ∈ S^n, the inner product A • X is defined by
A • X = Σ_{i=1}^n Σ_{j=1}^n A_ij X_ij = Σ_{i=1}^n [AX]_ii = trace AX,
and u^T X u = trace(u^T X u) = trace(X u u^T) = X • (u u^T).
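The two trace identities at the end of the slide can be verified directly; the matrices and vector below are arbitrary test data.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, -1.0]])
X = np.array([[0.5, 0.3], [0.3, 2.0]])
u = np.array([1.0, -2.0])

# A • X = sum_ij A_ij X_ij = trace(AX)
dot_AX = float(np.sum(A * X))
tr_AX = float(np.trace(A @ X))

# u^T X u = trace(X u u^T) = X • (u u^T)
quad = float(u @ X @ u)
via_trace = float(np.trace(X @ np.outer(u, u)))
via_inner = float(np.sum(X * np.outer(u, u)))
```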

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O and the inner product X • Y.

(S^n_+)* ≝ { Y ∈ S^n : Y • X >= 0 for all X ∈ S^n_+ }.

S^n_+ ⊆ (S^n_+)*. (Proof) Let Y ∈ S^n_+. For X ⪰ O, write X = L L^T; then
Y • X = trace YX = trace Y L L^T = trace L^T Y L >= 0 (since S^n ∋ L^T Y L ⪰ O).

(S^n_+)* ⊆ S^n_+. (Proof) Suppose that Y ∉ S^n_+. Then there is u ∈ R^n with
0 > u^T Y u = Y • (u u^T), hence Y ∉ (S^n_+)*.

Therefore S^n_+ = (S^n_+)* (self-dual).

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O and the inner product X • Y.

X, Y ⪰ O and X • Y = 0 ⟹ XY = O.

(Proof) Apply the eigenvalue decomposition X = P D P^T. Then
0 = X • Y = trace XY = trace P D P^T Y = trace D P^T Y P = Σ_{i=1}^n D_ii (P^T Y P)_ii,
with D_ii >= 0 and (P^T Y P)_ii >= 0, so for each i either D_ii = 0 or (P^T Y P)_ii = 0; in the latter case the ith row of P^T Y P is 0 (a positive semidefinite matrix with a zero diagonal entry has a zero row). Therefore D P^T Y P = O, which implies XY = P D P^T Y P P^T = O.
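The implication X • Y = 0 ⟹ XY = O can be illustrated by building two PSD matrices with the same eigenvectors and complementary eigenvalue supports; this construction is hypothetical, chosen only to exercise the statement.

```python
import numpy as np

# An orthogonal matrix of eigenvectors shared by X and Y.
theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Complementary eigenvalue supports: lambda = (2, 0), nu = (0, 5).
X = P @ np.diag([2.0, 0.0]) @ P.T
Y = P @ np.diag([0.0, 5.0]) @ P.T

inner_XY = float(np.sum(X * Y))   # X • Y = trace(XY)
prod = X @ Y                      # vanishes, as the proof predicts
```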

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Common properties of R^n_+ ≝ {x ∈ R^n : x >= 0} and S^n_+ ≝ {X ∈ S^n : X ⪰ O}:

- R^n_+ is a cone: αx ∈ R^n_+ if α >= 0 and x ∈ R^n_+.
- R^n_+ is convex: λx + (1-λ)y ∈ R^n_+ if 0 <= λ <= 1 and x, y ∈ R^n_+.
- R^n_+ is self-dual: (R^n_+)* ≝ { y ∈ R^n : y • x >= 0 for all x ∈ R^n_+ } = R^n_+.
- x, y ∈ R^n_+ and x • y = 0 ⟹ x_i y_i = 0 (i = 1,...,n).

- S^n_+ is a cone: αX ∈ S^n_+ if α >= 0 and X ∈ S^n_+.
- S^n_+ is convex: λX + (1-λ)Y ∈ S^n_+ if 0 <= λ <= 1 and X, Y ∈ S^n_+.
- S^n_+ is self-dual: (S^n_+)* ≝ { Y ∈ S^n : Y • X >= 0 for all X ∈ S^n_+ } = S^n_+.
- X, Y ∈ S^n_+ and X • Y = 0 ⟹ XY = O.

R^n_+ and S^n_+ are special cases of the symmetric cones discussed in the Appendix.

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 19

Equality standard form:
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

With block-diagonal data and variable,

      [ A_p^1  O   ...  O   ]       [ X^1  O   ...  O  ]
A_p = [  O   A_p^2 ...  O   ],  X = [  O   X^2 ...  O  ] ∈ S^{n_1+...+n_t}  (p = 0, 1,...,m),
      [ ...             ... ]       [ ...          ... ]
      [  O    O   ... A_p^t ]       [  O   O   ... X^t ]

we obtain the equality standard form with multiple matrix variables:
(SDP) minimize Σ_{q=1}^t A_0^q • X^q subject to Σ_{q=1}^t A_p^q • X^q = b_p (p = 1,...,m), S^{n_q} ∋ X^q ⪰ O (q = 1,...,t).

If n_q = 1 (q = 1,...,t), this (SDP) is equivalent to the equality standard form of LP, where A_p^q ∈ R and X^q ∈ R.

Can we transform the above (SDP) (or the equality standard form of LP) into the equality standard form (SDP)?

Equality standard form:
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

An SDP from systems and control theory:

(SDP) minimize λ
      subject to [ XA + A^T X + C^T C   XB + C^T D ]
                 [ B^T X + D^T C        D^T D - I  ] ⪯ λI,   X ⪯ λI.

Here X ∈ S^n and λ ∈ R are variables, and A, B, C, D are given data matrices.

Can we transform this (SDP) into the equality standard form (SDP)? Yes in theory, but it is not practical at all. Instead, transform (SDP) into an LMI standard form (with equality constraints), which corresponds to the dual of an equality standard form with free variables.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

Any real-valued linear function in X ∈ S^n can be written as
A • X = Σ_{i=1}^n Σ_{j=1}^n A_ij X_ij for some A ∈ S^n,
where A_ij denotes the (i,j)th element of A ∈ S^n and X_ij the (i,j)th element of X ∈ S^n.

We can transform any SDP into the equality standard form, but such a transformation is neither trivial nor practical in many cases. Usually it is easier to reduce an SDP to an LMI standard form with equality constraints than to the equality standard form.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

Reduction to an LMI standard form with equality constraints: represent each symmetric variable X^q ∈ S^{n_q} as a linear combination of basis matrices E^q_ij (1 <= i <= j <= n_q),

X^q = Σ_{1 <= i <= j <= n_q} E^q_ij y^q_ij,

where y^q_ij denotes a free real variable and E^q_ij an n_q × n_q matrix with 1 in the (i,j)th and (j,i)th elements and 0 elsewhere. Then substitute this into the general SDP.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

An LMI standard form with equality constraints:
minimize a linear function in y_1,...,y_l
subject to
  linear equalities in y_1,...,y_l,
  linear (matrix) inequalities in y_1,...,y_l,
  y_1,...,y_l ∈ R (free real variables).

Taking the dual, we obtain an equality standard form with free variables. We can apply existing software packages such as CSDP, PENNON, SDPA, SDPT3 and SeDuMi to this primal-dual pair.

Example:

minimize  w + [ 2 1 ] • X
              [ 1 3 ]
subject to [   X     (2,1)^T ]
           [ (2,1)      w    ] ⪰ O.

Represent X = y_1 [ 1 0 ] + y_2 [ 0 1 ] + y_3 [ 0 0 ].
                  [ 0 0 ]       [ 1 0 ]       [ 0 1 ]

Substituting, the problem becomes

minimize  w + 2y_1 + 2y_2 + 3y_3
subject to
    [ 1 0 0 ]       [ 0 1 0 ]       [ 0 0 0 ]     [ 0 0 0 ]   [ 0 0 2 ]
y_1 [ 0 0 0 ] + y_2 [ 1 0 0 ] + y_3 [ 0 1 0 ] + w [ 0 0 0 ] + [ 0 0 1 ] ⪰ O.
    [ 0 0 0 ]       [ 0 0 0 ]       [ 0 0 0 ]     [ 0 0 1 ]   [ 2 1 0 ]

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 26

Eigenvalues of a symmetric matrix A:
the maximum eigenvalue = min { λ : λI ⪰ A } = min { λ : λI - A ⪰ O },
the minimum eigenvalue = max { λ : A - λI ⪰ O }.

We can formulate many engineering problems involving eigenvalues of symmetric matrices via SDPs.

A linear matrix inequality (LMI) A(·) ⪯ O, where A(·) is a linear mapping in matrix and/or vector variables, can be handled via

maximize λ subject to A(·) + λI ⪯ O.

For example,
A(X) = [ XA + A^T X + C^T C   XB + C^T D ]
       [ B^T X + D^T C        D^T D - I  ] ⪯ O.

For LMIs, see [6] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, SIAM, Philadelphia, 1994.

The Schur complement. Let A ∈ S^k be positive definite, X ∈ R^{k×l}, Y ∈ S^l. Then

Y - X^T A^{-1} X ⪰ O  (quadratic in X)  ⟺  [ A   X ]
                                           [ X^T Y ] ⪰ O  (linear in X).

Proof:

[ I  -A^{-1}X ]^T [ A   X ] [ I  -A^{-1}X ]   [ A         O         ]
[ O      I    ]   [ X^T Y ] [ O      I    ] = [ O  Y - X^T A^{-1} X ],

and [ I  -A^{-1}X ; O  I ] is nonsingular. Hence

[ A  X ; X^T  Y ] ⪰ O ⟺ [ A  O ; O  Y - X^T A^{-1} X ] ⪰ O ⟺ Y - X^T A^{-1} X ⪰ O,

where the last equivalence uses that A is positive definite.
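The equivalence can be tested numerically on random data: shifting Y across the borderline value X^T A^{-1} X flips both conditions together. The data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 3, 2
M = rng.standard_normal((k, k))
A = M @ M.T + k * np.eye(k)            # positive definite
X = rng.standard_normal((k, l))
Y0 = X.T @ np.linalg.inv(A) @ X        # borderline: Schur complement = O

results = []
for shift in (0.5, -0.5):              # Y - X^T A^{-1} X = shift * I
    Y = Y0 + shift * np.eye(l)
    block = np.block([[A, X], [X.T, Y]])
    schur = Y - X.T @ np.linalg.inv(A) @ X
    schur_psd = bool(np.all(np.linalg.eigvalsh(schur) >= -1e-9))
    block_psd = bool(np.all(np.linalg.eigvalsh(block) >= -1e-9))
    results.append((schur_psd, block_psd))
```

With the positive shift both tests pass; with the negative shift both fail, as the congruence argument predicts.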

The Schur complement. Let A ∈ S^k be positive definite, X ∈ R^{k×l}, Y ∈ S^l. Then
Y - X^T A^{-1} X ⪰ O (quadratic in X) ⟺ [ A  X ; X^T  Y ] ⪰ O (linear in X).

When A = I:  Y - X^T X ⪰ O ⟺ [ I  X ; X^T  Y ] ⪰ O.

When A = I, X = x ∈ R^k and Y = y ∈ R:  y - x^T x >= 0 ⟺ [ I  x ; x^T  y ] ⪰ O.

When A = Iy, X = x ∈ R^k and Y = y ∈ R:
y^2 - x^T x >= 0 and y >= 0 (an SOCP constraint)
⟺ (y - x^T x / y >= 0 if y > 0)
⟺ [ Iy  x ; x^T  y ] ⪰ O.

A quasi-convex optimization problem:
minimize (Lx - c)^T (Lx - c) / d^T x subject to Ax <= b.
Here L ∈ R^{k×n}, c ∈ R^k, d ∈ R^n, A ∈ R^{l×n}, b ∈ R^l, and d^T x > 0 for every feasible x ∈ R^n.

Equivalently: minimize ζ subject to ζ >= (Lx - c)^T (Lx - c) / d^T x, Ax <= b.

By the Schur complement,
ζ >= (Lx - c)^T (Lx - c) / (d^T x) ⟺ [ (d^T x) I   Lx - c ]
                                      [ (Lx - c)^T    ζ    ] ⪰ O.

SDP: minimize ζ subject to [ (d^T x) I  Lx - c ; (Lx - c)^T  ζ ] ⪰ O, Ax <= b.
(This problem can also be formulated as an SOCP.)

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 31

Equality standard form SDP:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Lagrangian function:
L(X, y, S) = A_0 • X - Σ_{p=1}^m (A_p • X - b_p) y_p - S • X
for X ∈ S^n, y = (y_1,...,y_m)^T ∈ R^m, S^n ∋ S ⪰ O.

Lagrangian dual:
max_{y ∈ R^m, S ⪰ O} min_{X ∈ S^n} L(X, y, S)
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : ∇_X L(X, y, S) = O }   (L is linear in X)
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : A_0 - Σ_{p=1}^m A_p y_p - S = O }
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : S = A_0 - Σ_{p=1}^m A_p y_p }
= max_{y ∈ R^m, S ⪰ O} { Σ_{p=1}^m b_p y_p : S = A_0 - Σ_{p=1}^m A_p y_p }
  (since L(X, y, A_0 - Σ_{p=1}^m A_p y_p) = Σ_{p=1}^m b_p y_p)
= max { Σ_{p=1}^m b_p y_p : S = A_0 - Σ_{p=1}^m A_p y_p, y ∈ R^m, S ⪰ O }
(the dual of the equality standard form).

A primal-dual pair of LPs:
(P) min a_0 • x sub.to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m a_p y_p + s = a_0, R^n ∋ s >= 0.

Weak duality:
LP:  x • s = a_0 • x - Σ_{p=1}^m b_p y_p >= 0 for feasible x, y, s.
SDP: X • S = A_0 • X - Σ_{p=1}^m b_p y_p >= 0 for feasible X, y, S.

(Proof) Let (X, y, S) be a feasible solution of (P) and (D). Then
0 <= X • S = S • X = (A_0 - Σ_{p=1}^m A_p y_p) • X
           = A_0 • X - Σ_{p=1}^m (A_p • X) y_p = A_0 • X - Σ_{p=1}^m b_p y_p.

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.
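Weak duality can be checked on the 2×2 example from Part I, section 3. A_0 is the reconstruction with inferred signs used earlier, and the dual point y = (-2, -2) was found by hand; both are illustrative assumptions, not values given in the slides.

```python
import numpy as np

A0 = np.array([[1.0, -1.0], [-1.0, -5.0]])
A1 = np.array([[2.0, 1.5], [1.5, 1.0]])
A2 = np.array([[2.0, 0.5], [0.5, 3.0]])
b = np.array([7.0, 9.0])

# Feasible primal point: setting X12 = 0 in the equalities gives X11 = 3, X22 = 1.
X = np.array([[3.0, 0.0], [0.0, 1.0]])

# Feasible dual point: y = (-2, -2) makes S = A0 - y1*A1 - y2*A2 = [[9, 3], [3, 3]] PSD.
y = np.array([-2.0, -2.0])
S = A0 - y[0] * A1 - y[1] * A2

primal_obj = float(np.trace(A0 @ X))
dual_obj = float(b @ y)
gap = primal_obj - dual_obj            # equals X • S >= 0
XS = float(np.trace(X @ S))
```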

A primal-dual pair of LPs:
(P) min a_0 • x sub.to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m a_p y_p + s = a_0, R^n ∋ s >= 0.

Strong duality:
LP: if there exists a feasible (x, y, s) (x >= 0, s >= 0), then
x̄ • s̄ = a_0 • x̄ - Σ_{p=1}^m b_p ȳ_p = 0 at an optimal (x̄, ȳ, s̄).
SDP: if there exists an interior feasible (X, y, S) (X ≻ O, S ≻ O), then
X̄ • S̄ = A_0 • X̄ - Σ_{p=1}^m b_p ȳ_p = 0 at an optimal (X̄, ȳ, S̄).

For strong duality in SDP, an interior feasible (X, y, S) (X ≻ O, S ≻ O) is necessary! (An example follows.)

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

Example [45]: an interior feasible (X, y, S) (X ≻ O, S ≻ O) is necessary!

(P) min [ 0 0 0 ; 0 0 0 ; 0 0 1 ] • X
    sub.to [ 1 0 0 ; 0 0 0 ; 0 0 0 ] • X = 0, [ 0 1 0 ; 1 0 0 ; 0 0 2 ] • X = 2, X ⪰ O,

or (P) min X_33 sub.to X_11 = 0, X_12 + X_21 + 2X_33 = 2, X ⪰ O.
X a feasible solution ⟹ X_11 = X_12 = X_21 = 0 and X_33 = 1; the optimal value = 1.

(D) max 2y_2 sub.to [ 1 0 0 ; 0 0 0 ; 0 0 0 ] y_1 + [ 0 1 0 ; 1 0 0 ; 0 0 2 ] y_2 ⪯ [ 0 0 0 ; 0 0 0 ; 0 0 1 ],

or (D) max 2y_2 sub.to [ -y_1  -y_2  0 ; -y_2  0  0 ; 0  0  1-2y_2 ] ⪰ O.
(y_1, y_2) feasible ⟹ y_1 <= 0, y_2 = 0; the optimal value = 0.

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

The KKT optimality condition:
A_p • X = b_p (p = 1,...,m), Σ_{p=1}^m A_p y_p + S = A_0,
S^n ∋ X ⪰ O, S^n ∋ S ⪰ O, XS = O (complementarity).

O = XS = SX ⟹ X and S are commutative; hence there is an orthogonal P ∈ R^{n×n};
P^T X P = diag(λ_1,...,λ_n), P^T S P = diag(ν_1,...,ν_n),
O = P^T XS P = (P^T X P)(P^T S P) = diag(λ_1,...,λ_n) diag(ν_1,...,ν_n),
P^T (X + S) P = diag(λ_1,...,λ_n) + diag(ν_1,...,ν_n).

λ_i >= 0, ν_i >= 0, λ_i ν_i = 0 (i = 1,...,n) (complementarity);
X + S ≻ O ⟺ λ_i + ν_i > 0 (i = 1,...,n) (strict complementarity).

LP: x_i >= 0, s_i >= 0, x_i s_i = 0 (∀i) (complementarity); x_i + s_i > 0 (∀i) (strict complementarity). Here λ_i corresponds to x_i, and ν_i to s_i.

An equality standard form:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

An equality standard form with free variables:
(P) min A_0 • X + d_0^T z sub.to A_p • X + d_p^T z = b_p (p = 1,...,m), S^n ∋ X ⪰ O, z ∈ R^l (a free vector variable). Here d_p ∈ R^l (p = 0, 1,...,m).

Its dual, an LMI standard form with equality constraints:
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O, Σ_{p=1}^m d_p y_p = d_0.

An LMI standard form:
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 1


Some existing numerical methods for SDPs

IPMs (interior-point methods):
- Primal-dual scaling: CSDP (Borchers [7]), SDPA (Fujisawa-K-Nakata-Yamashita [49]), SDPT3 (Toh-Todd-Tutuncu [42]), SeDuMi (Sturm [37])
- Dual scaling: DSDP (Benson-Ye-Zhang [3])

Nonlinear programming approaches:
- Spectral bundle method (Helmberg-Rendl [17])
- Gradient-based log-barrier method (Burer-Monteiro [9])
- PENNON (Kocvara [19]): generalized augmented Lagrangian
- Saddle point mirror-prox algorithm (Lu-Nemirovski-Monteiro [26])

IPMs: medium-scale SDPs (e.g., n, m <= 5000) with high accuracy. Nonlinear programming approaches: large-scale SDPs (e.g., n, m >= 10,000) with low accuracy.

Parallel implementations:
- SDPA → SDPARA (Y-F-K [49]), SDPARA-C (N-Y-F-K [31])
- DSDP → PDSDP (Benson [2])
- CSDP → Borchers-Young [8]
- Spectral bundle method → Nayakkankuppam [32]

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 3

Three approaches to primal-dual interior-point methods for SDPs

(1) Self-concordance:
[33] Yu. E. Nesterov and A. Nemirovski, Interior Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, 1994.
[34] Yu. E. Nesterov and M. J. Todd, Primal-dual interior-point methods for self-scaled cones, SIAM Journal on Optimization, 8 (1998) 324-364.

(2) Linear optimization problems over symmetric cones:
[10] L. Faybusovich, Linear systems in Jordan algebra and primal-dual interior-point algorithms, Journal of Computational and Applied Mathematics, 86 (1997) 149-175.
[36] S. Schmieta and F. Alizadeh, Associative and Jordan algebras, and polynomial time interior-point algorithms for symmetric cones, Mathematics of Operations Research, 26 (2001) 543-564.
⟹ Appendix.

(3) Extensions of primal-dual interior-point algorithms for LPs to SDPs [1, 17, 20, 27] ⟹ our approach in this lecture.

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 5

SDP:
(P) min A_0 • X s.t. A_p • X = b_p (p = 1,...,m), X ∈ S^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ∈ S^n_+.

Trace the central trajectory to an optimal solution in the primal-dual space.

[Figure: the central trajectory (X(µ), y(µ), S(µ)) approaching an optimal solution (X*, y*, S*) as µ → 0; the LP picture is analogous, with (x(µ), y(µ), s(µ)) → (x*, y*, s*).]

How do we define the central trajectory? How do we numerically trace the central trajectory?

LP:
(P) min a_0 • x s.t. a_p • x = b_p (p = 1,...,m), x ∈ R^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m a_p y_p + s = a_0, s ∈ R^n_+.

SDP:
(P) min A_0 • X s.t. A_p • X = b_p (p = 1,...,m), X ∈ S^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ∈ S^n_+.

X ∈ the interior of S^n_+ ≝ {X ∈ S^n : X ≻ O} ⟺ X ⪰ O and det X > 0.
X ∈ the boundary of S^n_+ ⟺ X ⪰ O and det X = 0.
A logarithmic barrier that keeps X away from the boundary: -log det X.

A primal-dual pair of SDPs with logarithmic barrier terms, µ > 0:
P(µ) min A_0 • X - µ log det X s.t. A_p • X = b_p (p = 1,...,m), X ≻ O.
D(µ) max Σ_{p=1}^m b_p y_p + µ log det S s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ≻ O.

A primal-dual pair of LPs with logarithmic barrier terms, µ > 0:
P(µ) min a_0 • x - µ Σ_{i=1}^n log x_i s.t. a_p • x = b_p (p = 1,...,m), x > 0.
D(µ) max Σ_{p=1}^m b_p y_p + µ Σ_{i=1}^n log s_i s.t. Σ_{p=1}^m a_p y_p + s = a_0, s > 0.

A logarithmic barrier that keeps x away from the boundary: -Σ_{i=1}^n log x_i.
x ∈ the boundary of R^n_+ ⟺ x_i = 0 for some i.
x ∈ the interior of R^n_+ ≝ {x ∈ R^n : x > 0} ⟺ x_i > 0 (i = 1,...,n).

LP:
(P) min a_0 • x s.t. a_p • x = b_p (p = 1,...,m), x ∈ R^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m a_p y_p + s = a_0, s ∈ R^n_+.

A primal-dual pair of LPs with logarithmic barrier terms, $\mu > 0$:
P($\mu$): min $a_0^T x - \mu \sum_{i=1}^n \log x_i$ s.t. $a_p^T x = b_p$ (p = 1,...,m), $x > 0$.
D($\mu$): max $\sum_{p=1}^m b_p y_p + \mu \sum_{i=1}^n \log s_i$ s.t. $\sum_{p=1}^m a_p y_p + s = a_0$, $s > 0$.
For every $\mu > 0$, (P($\mu$), D($\mu$)) has a unique opt. sol. $(x(\mu), y(\mu), s(\mu))$, which converges to an optimal solution $(x^*, y^*, s^*)$ of (P, D) as $\mu \downarrow 0$.
$C = \{(x(\mu), y(\mu), s(\mu)) : \mu > 0\}$: the central trajectory.
$(x(\mu), y(\mu), s(\mu)) \in C$ iff
  $a_p^T x = b_p$ ($\forall p$), $x > 0$,
  $\sum_{p=1}^m a_p y_p + s = a_0$, $s > 0$,
  $x_i s_i = \mu$ ($\forall i$).
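As a small illustration (not from the lecture), the stationarity conditions of P($\mu$) can be solved in closed form for an LP over the unit simplex, and the products $x_i s_i = \mu$ then hold identically along the trajectory; a NumPy/SciPy sketch with made-up data c:

```python
import numpy as np
from scipy.optimize import brentq

# LP over the simplex: min c^T x  s.t.  sum(x) = 1, x >= 0.
# Stationarity of P(mu) gives x_i = mu / (c_i - y) and s_i = c_i - y,
# so x_i * s_i = mu holds identically along the central trajectory.
c = np.array([3.0, 1.0, 2.0])

def central_point(mu):
    # pick the multiplier y < min(c) with sum_i mu / (c_i - y) = 1
    y = brentq(lambda t: np.sum(mu / (c - t)) - 1.0,
               c.min() - 100.0, c.min() - 1e-12)
    x = mu / (c - y)
    s = c - y
    return x, y, s

for mu in (1.0, 1e-2, 1e-6):
    x, y, s = central_point(mu)
    assert np.allclose(x * s, mu)   # the defining property of C
# as mu -> 0, x(mu) approaches the optimal vertex (here e_2, since c_2 is smallest)
```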

A primal-dual pair of SDPs with logarithmic barrier terms, $\mu > 0$:
P($\mu$): min $A_0 \bullet X - \mu \log\det X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \succ O$.
D($\mu$): max $\sum_{p=1}^m b_p y_p + \mu \log\det S$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$.
For every $\mu > 0$, (P($\mu$), D($\mu$)) has a unique opt. sol. $(X(\mu), y(\mu), S(\mu))$, which converges to an optimal solution $(X^*, y^*, S^*)$ of (P, D) as $\mu \downarrow 0$.
$C = \{(X(\mu), y(\mu), S(\mu)) : \mu > 0\}$: the central trajectory.
$(X(\mu), y(\mu), S(\mu)) \in C$ iff
  $A_p \bullet X = b_p$ ($\forall p$), $X \succ O$,
  $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$,
  $XS = \mu I$.

Some properties of the central trajectory C.
Suppose $(X, y, S) \in C$, i.e. $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$, $X \succ O$, $S \succ O$, $XS = \mu I$ for some $\mu > 0$.
X and S commute; hence there is an orthogonal $P \in R^{n \times n}$ with $P^T X P = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$, $P^T S P = \mathrm{diag}(\nu_1,\dots,\nu_n)$.
$\mu I = P^T (\mu I) P = P^T X P \, P^T S P = \mathrm{diag}(\lambda_1,\dots,\lambda_n)\,\mathrm{diag}(\nu_1,\dots,\nu_n)$;
$\lambda_i > 0$, $\nu_i > 0$, $\lambda_i \nu_i = \mu$ (i = 1,...,n).
As $\mu \downarrow 0$, $(X, y, S) \to$ an optimal sol. and $\lambda_i \nu_i \to 0$ (i = 1,...,n).
In the LP case: if $(x, y, s)$ is on the central trajectory then $x_i > 0$, $s_i > 0$, $x_i s_i = \mu$ (i = 1,...,n); as $\mu \downarrow 0$, $(x, y, s) \to$ an optimal sol. and $x_i s_i \to 0$. Hence $\lambda_i$ plays the role of $x_i$ and $\nu_i$ the role of $s_i$.
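Since $XS = \mu I$ forces $S = \mu X^{-1}$ on the trajectory, the commutativity and the eigenvalue pairing are easy to verify numerically; an illustrative NumPy check (the data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 5, 0.7
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)       # a point with X positive definite
S = mu * np.linalg.inv(X)         # XS = mu*I forces S = mu * X^{-1}

assert np.allclose(X @ S, S @ X)  # X and S commute
lam = np.linalg.eigvalsh(X)       # eigenvalues of X (ascending) ...
nu = np.linalg.eigvalsh(S)[::-1]  # ... pair with those of S in reverse order
assert np.allclose(lam * nu, mu)  # lambda_i * nu_i = mu for every i
```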

Some properties of the central trajectory C.
Suppose $(X, y, S) \in C$, i.e. $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$, $X \succ O$, $S \succ O$, $XS = \mu I$ for some $\mu > 0$.
Factorize X as $X = X^{1/2} X^{1/2}$. Then
$\mu I = X^{-1/2} (\mu I) X^{1/2} = X^{-1/2} (XS) X^{1/2} = X^{1/2} S X^{1/2}$
(all eigenvalues of $X^{1/2} S X^{1/2} \in S^n_+$ are $\mu$).
This relation will be used to define some neighborhoods of C:
$N_2(\beta) = \{(X, y, S)$ feasible $: \|X^{1/2} S X^{1/2} - \mu I\|_F \le \beta\mu\}$, where $\mu = X \bullet S / n$.
$N_\infty(\beta) = \{(X, y, S)$ feasible $: \lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\beta)\mu\}$, where $\mu = X \bullet S / n$.

Part II (contd.): 4. Search directions

SDP case. $(X(\mu), y(\mu), S(\mu)) \in C$ iff $A_p \bullet X = b_p$ ($\forall p$), $X \succ O$, $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$, $XS = \mu I$ (a continuation toward an opt. sol. as $\mu \downarrow 0$).
One iteration:
Suppose $(X^k, y^k, S^k)$ is the current iterate satisfying $A_p \bullet X^k = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p^k + S^k = A_0$, $X^k \succ O$, $S^k \succ O$ (in theory but not in practice).
Choose $\beta \in [0, 1]$ and let $\hat\mu = \beta X^k \bullet S^k / n$, which determines the target point $(X(\hat\mu), y(\hat\mu), S(\hat\mu))$ on C satisfying
(*): $XS = \hat\mu I$, $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$.
Compute a search ("Newton") direction $(dX, dy, dS)$.
Choose step lengths $\alpha_p > 0$ and $\alpha_d > 0$ ($\alpha_p = \alpha_d$ in theory) such that
$X^{k+1} = X^k + \alpha_p dX \succ O$, $y^{k+1} = y^k + \alpha_d dy$, $S^{k+1} = S^k + \alpha_d dS \succ O$.

SDP case, continued. Compute a search ("Newton") direction $(dX, dy, dS)$: substitute $(X, y, S) = (X^k, y^k, S^k) + (dX, dy, dS)$ into (*) and linearize,
$(X^k + dX)(S^k + dS) = \hat\mu I$, linearized: $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$,
$A_p \bullet (X^k + dX) = b_p$ ($\forall p$), $\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$,
in $(dX, dy, dS)$. In general there is no solution with $dX \in S^n$, $dS \in S^n$; we need some modification.

Some difficulty in the linearized system in $(dX, dy, dS)$:
$X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$, $A_p \bullet (X^k + dX) = b_p$ ($\forall p$), $\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$.
No solution with $dX \in S^n$, $dS \in S^n$: the total # of equations > the total # of variables. In fact:
$X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$: $n^2$ eqs.; $dX \in S^n$: n(n+1)/2 vars.
$A_p \bullet (X^k + dX) = b_p$ ($\forall p$): m eqs.; dy: m vars.
$\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$: n(n+1)/2 eqs.; $dS \in S^n$: n(n+1)/2 vars.
Total: $3n^2/2 + n/2 + m$ eqs. > $n^2 + n + m$ vars.
More than 20 search directions and several families of search directions (Todd [38]).

Two ways to resolve this difficulty:
(i) Solve the linearized system with $d\tilde X \in R^{n \times n}$, $dS \in S^n$, then symmetrize: $dX = (d\tilde X + d\tilde X^T)/2$. The HKM direction [17, 20, 27]; practically efficient.
(ii) Scale and symmetrize $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ to reduce the $n^2$ eqs. to n(n+1)/2 eqs. Theoretically rich, including the HKM, NT and AHO directions; the MZ family [28].

(i) HKM direction [17, 20, 27], practically efficient. Solve the linearized system with $d\tilde X \in R^{n \times n}$, $dS \in S^n$, then symmetrize:
$B\,dy = r$ (the Schur complement equation),
$dS = D - \sum_{p=1}^m A_p dy_p$,
$d\tilde X = X^k ((X^k)^{-1} K - dS)(S^k)^{-1}$, $dX = (d\tilde X + d\tilde X^T)/2$,
where
$B_{pq} = (X^k A_p (S^k)^{-1}) \bullet A_q$ (p = 1,...,m, q = 1,...,m),
$r_p = b_p - A_p \bullet X^k - A_p \bullet (X^k ((X^k)^{-1} K - D)(S^k)^{-1})$ (p = 1,...,m),
$D = A_0 - \sum_{p=1}^m A_p y_p^k - S^k$, $K = \hat\mu I - X^k S^k$.
B is symmetric and positive definite, but fully dense in general.
Efficient computation of $B_{pq}$ with the use of the sparsity of the $A_p$'s is a key.
To solve $B\,dy = r$, use the Cholesky factorization (or the CG method).
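The formulas above can be exercised end-to-end on small random data; the following NumPy sketch (sizes and the deliberately infeasible iterate are made up) verifies that the computed $(d\tilde X, dy, dS)$ satisfies the three linearized equations exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def rand_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = [rand_sym(n) for _ in range(m)]
A0 = rand_sym(n)
X, S = rand_pd(n), rand_pd(n)          # current iterate with X, S PD
y = rng.standard_normal(m)
b = np.array([np.trace(Ap @ X) for Ap in A]) + 0.1  # allow primal infeasibility
mu_hat = 0.3 * np.trace(X @ S) / n     # target on the central trajectory

Sinv = np.linalg.inv(S)
D = A0 - sum(yp * Ap for yp, Ap in zip(y, A)) - S   # dual residual matrix
K = mu_hat * np.eye(n) - X @ S

B = np.array([[np.trace(X @ Ap @ Sinv @ Aq) for Aq in A] for Ap in A])
r = np.array([b[p] - np.trace(A[p] @ X)
              - np.trace(A[p] @ ((K - X @ D) @ Sinv)) for p in range(m)])
dy = np.linalg.solve(B, r)             # Cholesky would also work: B is sym. PD
dS = D - sum(dyp * Ap for dyp, Ap in zip(dy, A))
dXt = (K - X @ dS) @ Sinv              # unsymmetrized dX~
dX = (dXt + dXt.T) / 2                 # HKM symmetrization

# the three linearized equations hold exactly for (dX~, dy, dS):
assert np.allclose(X @ dS + dXt @ S, K)
assert np.allclose([np.trace(A[p] @ (X + dXt)) for p in range(m)], b)
assert np.allclose(sum((y + dy)[p] * A[p] for p in range(m)) + S + dS, A0)
```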

(ii) Scale and symmetrize $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ to reduce the $n^2$ eqs. to n(n+1)/2 eqs. Theoretically rich, including the HKM, NT and AHO directions; the MZ family [28].
For every nonsingular scaling matrix $P \in R^{n \times n}$ and $B \in R^{n \times n}$, let
$H_P(B) = (P B P^{-1} + (P B P^{-1})^T) / 2$.
Then replace $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ by $H_P(X^k S^k + dX\,S^k + X^k\,dS) = \hat\mu I$ in the linearized system.
$P = I$: AHO. $P = X^{-1/2}$: HKM, poly. complexity. $P = (X^{1/2}(X^{1/2} S X^{1/2})^{-1/2} X^{1/2})^{-1/2}$: NT, poly. complexity.
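The operator $H_P$ is easy to experiment with; a NumPy sketch (data made up) checking that $P = I$ gives plain symmetrization and that the scaled equation is exact on the central trajectory, where $XS = \mu I$:

```python
import numpy as np

def H(P, B):
    """H_P(B) = (P B P^{-1} + (P B P^{-1})^T) / 2."""
    PBPinv = P @ B @ np.linalg.inv(P)
    return (PBPinv + PBPinv.T) / 2

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
# P = I is plain symmetrization (the AHO choice):
assert np.allclose(H(np.eye(4), B), (B + B.T) / 2)

# whenever XS = mu*I, H_P(XS) = mu*I for ANY nonsingular P,
# so every member of the family agrees on the central trajectory:
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)
mu = 1.7
S = mu * np.linalg.inv(X)
P = rng.standard_normal((4, 4))        # generic, hence nonsingular
assert np.allclose(H(P, X @ S), mu * np.eye(4))
```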

Part II (contd.): 5. Various primal-dual interior-point methods

Typical primal-dual interior-point methods (in theory):
(a) Path-following. (b) Mizuno-Todd-Ye predictor-corrector. (c) Potential reduction: not stated here, see [20, 43]. (d) Homogeneous self-dual embedding: not stated here, see [26, 18].
Main issues to be discussed: polynomial-time convergence for (a), (b), (c) and (d); local convergence for (b).
Some additional issues to be taken into account: (e) search directions, NT [34], HKM [17, 20, 27]; (f) feasible starting points or infeasible starting points.
Primal-dual interior-point methods in practice: later.

(a) Path-following primal-dual interior-point methods.
Keep $\{(X^k, y^k, S^k)\}$ in a neighborhood N of the central trajectory.
Step length $\bar\alpha = \max\{\alpha : (X^k, y^k, S^k) + \alpha(dX, dy, dS) \in N\}$;
$(X^{k+1}, y^{k+1}, S^{k+1}) = (X^k, y^k, S^k) + \bar\alpha(dX, dy, dS)$.
Which neighborhood? Which target $\hat\mu$ to choose $(dX, dy, dS)$ towards $(X(\hat\mu), y(\hat\mu), S(\hat\mu))$?

Short-step algorithms [20, 27, 34]:
$N_2(\tau) = \{(X, y, S)$ feasible $: \|X^{1/2} S X^{1/2} - \mu I\|_F \le \tau\mu\}$, where $\mu = X \bullet S / n$;
$\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$. Here $\tau, \gamma \in (0, 1)$, e.g., $\tau = \gamma = 0.5$.
Short-step: $O(\sqrt{n}\,\log(X^0 \bullet S^0 / \epsilon))$ iterations to attain $X^k \bullet S^k \le \epsilon$.

Long-step (large-step) algorithms [28]:
$N_\infty(\tau) = \{(X, y, S)$ feasible $: \lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\tau)\mu\}$, where $\mu = X \bullet S / n$;
$\hat\mu = \sigma X^k \bullet S^k / n$. Here $\tau \in (0, 1)$, $\sigma \in (0, 1)$, e.g., $\tau = \sigma = 0.5$.
$O(n \log(X^0 \bullet S^0 / \epsilon))$ (NT) or $O(n^{1.5} \log(X^0 \bullet S^0 / \epsilon))$ (HKM) iterations.

Comparison.
Short-step: $N_2(\tau)$ with $\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$, $\tau, \gamma \in (0, 1)$, e.g., $\tau = \gamma = 0.5$. Note that $N_2(0)$ = the central trajectory while $N_2(1)$ is strictly contained in the feasible region.
Long-step: $N_\infty(\tau)$ with $\hat\mu = \sigma X^k \bullet S^k / n$, $\tau, \sigma \in (0, 1)$, e.g., $\tau = \sigma = 0.5$. Note that $N_\infty(0)$ = the central trajectory and $N_\infty(1)$ = the feasible region.
For large n, $\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$ (short-step) is much larger than $\hat\mu = \sigma X^k \bullet S^k / n$ (long-step), so the long-step method reduces $\mu$ far more aggressively per iteration.

Comparison of the two neighborhoods in detail. Let $\lambda_i$ (i = 1,...,n) be the eigenvalues of $X^{1/2} S X^{1/2}$ and $\mu = X \bullet S / n$.
Short-step: $(X, y, S) \in N_2(\tau)$ iff $\sqrt{\sum_{i=1}^n (\lambda_i - \mu)^2} \le \tau\mu$, which implies $-\tau\mu \le \lambda_i - \mu \le \tau\mu$ (i = 1,...,n).
Long-step: $(X, y, S) \in N_\infty(\tau)$ iff $\lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\tau)\mu$, i.e., $-\tau\mu \le \lambda_i - \mu$ (i = 1,...,n), with no upper bound on the $\lambda_i$.
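The two membership tests are straightforward to code; a NumPy sketch (illustrative; feasibility of (X, y, S) is not checked):

```python
import numpy as np

def sym_sqrt(X):
    """Symmetric square root of a symmetric PD matrix."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.sqrt(w)) @ V.T

def in_neighborhoods(X, S, tau):
    """Membership tests for N_2(tau) and N_inf(tau)."""
    n = X.shape[0]
    mu = np.trace(X @ S) / n
    W = sym_sqrt(X) @ S @ sym_sqrt(X)     # X^{1/2} S X^{1/2}, symmetric
    in_N2 = np.linalg.norm(W - mu * np.eye(n), 'fro') <= tau * mu
    in_Ninf = np.linalg.eigvalsh(W).min() >= (1 - tau) * mu
    return in_N2, in_Ninf

# a point exactly on the central trajectory (XS = mu*I) lies in both:
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
X = M @ M.T + 5 * np.eye(5)
S = 2.0 * np.linalg.inv(X)                # XS = 2I
assert in_neighborhoods(X, S, 0.5) == (True, True)
# note: membership in N_2(tau) always implies membership in N_inf(tau)
```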

(b) Mizuno-Todd-Ye predictor-corrector.
An outer neighborhood for a predictor step with $\hat\mu = 0$; an inner neighborhood for a corrector step with $\hat\mu = X \bullet S / n$.
(i) Not only polynomial-time but also superlinear convergence.
(ii) Some additional assumptions: strict complementarity and nondegeneracy of the unique optimal solution $(X^*, y^*, S^*)$: $X^* + S^* \succ O$.
(iii) Inner and outer neighborhoods which force $(X^k, y^k, S^k)$ to converge to $(X^*, y^*, S^*)$ tangentially to the central trajectory [21].
In the LP case [50]: (ii) & (iii) are unnecessary, and (i) becomes quadratic convergence.

A primal-dual pair of SDPs:
(P) min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \in S^n_+$.
(D) max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \in S^n_+$.
Primal-dual interior-point methods in practice:
(i) An infeasible starting point $(X^0, y^0, S^0)$ such that $X^0, S^0 \succ O$.
(ii) Iterates $(X^k, y^k, S^k)$ with $X^k, S^k \succ O$, $A_p \bullet X^k - b_p \to 0$, $\sum_{p=1}^m A_p y_p^k + S^k - A_0 \to O$.
(iii) Step size control: let $\gamma \in (0, 1)$, e.g., $\gamma = 0.80$ to $0.98$,
$\alpha_p = \gamma \max\{\alpha \in [0, 1] : X^k + \alpha dX \succeq O\}$,
$\alpha_d = \gamma \max\{\alpha \in [0, 1] : S^k + \alpha dS \succeq O\}$,
$X^{k+1} = X^k + \alpha_p dX$, $(y^{k+1}, S^{k+1}) = (y^k, S^k) + \alpha_d (dy, dS)$
(conservative compared to the LP case).
(iv) Mehrotra's predictor-corrector: a reasonable choice of the target point $(X(\beta\mu_k), y(\beta\mu_k), S(\beta\mu_k))$ on the central trajectory, where $\mu_k = X^k \bullet S^k / n$, and a 2nd-order correction in computing the search direction $(dX, dy, dS)$.
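The step size rule in (iii) reduces to an eigenvalue computation: $X^k + \alpha dX \succeq O$ iff $I + \alpha L^{-1} dX L^{-T} \succeq O$, where $X^k = L L^T$ is the Cholesky factorization. A NumPy sketch (illustrative):

```python
import numpy as np

def max_step(X, dX, gamma=0.9):
    """gamma * max{alpha in [0,1] : X + alpha*dX stays positive semidefinite}.
    Uses X + a*dX >= O  iff  I + a * L^{-1} dX L^{-T} >= O, with X = L L^T."""
    L = np.linalg.cholesky(X)
    Linv = np.linalg.inv(L)
    wmin = np.linalg.eigvalsh(Linv @ dX @ Linv.T).min()
    if wmin >= 0:
        return 1.0                     # the full step keeps X + dX PSD
    return min(1.0, gamma / (-wmin))

# example: X = I, dX = -2I blocks the step at alpha = 0.5; with gamma = 0.9
# the rule returns 0.45, and X + 0.45*dX = 0.1*I stays positive definite
assert abs(max_step(np.eye(3), -2.0 * np.eye(3)) - 0.45) < 1e-12
```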

Part II (contd.): 6. Exploiting sparsity

P: min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \in S^n_+$. D: max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \in S^n_+$.
Important features in practice: large-scale SDPs arise often.
n x n matrix variables $X, S \in S^n$, each of which involves n(n+1)/2 real variables; for example, n = 2000 gives n(n+1)/2 of about 2 million.
m linear equality constraints in P, or m data matrices $A_p \in S^n$.
The data matrices $A_p$ (p = 1,...,m) are sparse! Among the 21 benchmark problems with n >= 500 from SDPLIB [53]:
  ratio of nonzero elements in the A_p's:  10^{-2} to 10^{-4} | 10^{-4} to 10^{-6} | 10^{-6} to 10^{-8}
  number of problems:                      7                  | 11                 | 3
=> Exploit sparsity and structured sparsity. => Enormous computational power: parallel computation.

Computation of a search direction $(dX, dy, dS)$:
$B\,dy = r$ (the Schur complement equation), $dS = D - \sum_{p=1}^m A_p dy_p$,
$d\tilde X = X^k ((X^k)^{-1} K - dS)(S^k)^{-1}$, $dX = (d\tilde X + d\tilde X^T)/2$,
where $B_{pq} = (X^k A_p (S^k)^{-1}) \bullet A_q$, $r_p = b_p - A_p \bullet X^k - A_p \bullet (X^k ((X^k)^{-1} K - D)(S^k)^{-1})$,
$D = A_0 - \sum_{p=1}^m A_p y_p^k - S^k$, $K = \hat\mu I - X^k S^k$.
Major time consumption (seconds) in a single-CPU implementation:
  part                      control11   theta6   maxg51
  Elements of B                 463.2     78.3      1.5
  Cholesky fact. of B            31.7    209.8      3.0
  dX                              1.8      1.8     47.3
  Other dense matrix comp.        1.0      4.1     86.5
  Others                          7.2      5.13     1.8
  Total                         505.2    292.3    140.2

In the computation of $B_{pq} = (X A_p S^{-1}) \bullet A_q$ ($1 \le p \le q \le m$), exploiting the sparsity of the $A_p$'s is important, as shown in Fujisawa-Kojima-Nakata [11].
X and $S^{-1}$ are dense, but $S = A_0 - \sum_{p=1}^m A_p y_p$ inherits the sparsity from the $A_p$'s, and $X^{-1}$ can be taken to be sparse. More later.

Structured sparsity. The aggregate sparsity pattern $\hat A$: a symbolic n x n matrix with
$\hat A_{ij} = *$ if the (i, j)th element of $A_p$ is nonzero for some p = 0,...,m, and $\hat A_{ij} = 0$ otherwise, where * denotes a nonzero number.
Example (m = 1):
  A_0 = [ 1 2 0 0       A_1 = [ 0 0 0 1       A^ = [ * * 0 *
          2 1 0 0               0 3 1 0              * * * 0
          0 0 4 0               0 1 0 0              0 * * 0
          0 0 0 0 ]             1 0 0 2 ]            * 0 0 * ]
Next: three types of structured sparsity.
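The aggregate pattern of the example can be checked mechanically; a short NumPy sketch:

```python
import numpy as np

A0 = np.array([[1, 2, 0, 0],
               [2, 1, 0, 0],
               [0, 0, 4, 0],
               [0, 0, 0, 0]])
A1 = np.array([[0, 0, 0, 1],
               [0, 3, 1, 0],
               [0, 1, 0, 0],
               [1, 0, 0, 2]])

# a position is marked '*' iff some A_p is nonzero there
pattern = (A0 != 0) | (A1 != 0)
assert pattern.sum() == 10            # ten '*' positions, six structural zeros
assert (pattern == pattern.T).all()   # the pattern is symmetric
```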

Structured sparsity-1: $\hat A$ is block-diagonal, $\hat A = \mathrm{diag}(B_1, B_2, B_3)$ with each $B_i$ symmetric. Then X and S have the same diagonal block structure as $\hat A$.
Example: CH_3N, an SDP from quantum chemistry, Fukuda et al. [13], Zhao et al. [51]: m = 20,709, n = 12,802, number of blocks in $\hat A$ = 22, largest block size 3,211 x 3,211, average block size 583 x 583.

Structured sparsity-2: $\hat A$ has a sparse Cholesky factorization, e.g., a small bandwidth, or a small bandwidth plus a dense border of rows and columns.
S has the same sparsity pattern as $\hat A$. X is fully dense, but $X^{-1}$ has the same sparsity pattern as $\hat A$ => use $X^{-1}$ instead of X (the positive definite matrix completion used in SDPARA-C).

Structured sparsity-3: block-diagonal $\hat A$ + blockwise orthogonality. For most pairs (p, q), $1 \le p < q \le m$, $A_p$ and $A_q$ share no nonzero blocks; hence $A_p \bullet A_q = 0$ => the Schur complement matrix B used in the PDIPM becomes sparse.
  A_1 = diag(A_11, O, O), A_2 = diag(O, A_22, O), A_3 = diag(O, O, A_33).
An engineering application: Ben-Tal et al. [52]. A sparse SDP relaxation of polynomial optimization problems: Waki et al. [44].
Incorporated in SDPT3 and SeDuMi but not in SDPA.

Part II (contd.): 7. Software packages

Optimization Technology Center http://www.ece.northwestern.edu/otc/
NEOS Server for Optimization http://www-neos.mcs.anl.gov/
NEOS Solvers http://www-neos.mcs.anl.gov/neos/solvers/index.html
Semidefinite programming software:
  package   lang.    method
  csdp      C        p-d IPM
  DSDP      C        dual scaling IPM
  pensdp    MATLAB   augmented Lagrangian
  sdpa      C++      p-d IPM
  sdpa-c    C++      p-d IPM, p.d. matrix completion
  sdplr     C        NLP algorithm
  sdpt3     MATLAB   p-d IPM
  sedumi    MATLAB   p-d IPM, self-dual embedding
Online solver: submit your SDP problem through the Internet. Binary and/or source codes are available. SDPA sparse format for all packages; MATLAB interface.

Some remarks on software packages.
SDPs are more difficult to solve than LPs: degeneracy, no interior points in the primal or dual SDP, large-scale problems, SDPs having free variables.
More accuracy requires more CPU time; some software packages can solve SDPs faster with low accuracy.
Sparse structure of SDPs: some SDPs can be solved faster and/or more accurately by one package, other SDPs by another package. Try the software packages that fit your problem.
SDPA Online Solver (pre-open) http://sdpara.r.dendai.ac.jp/portal/
  SDPA on a single CPU; SDPARA on Opteron 32 CPUs (1.4GHz) or Xeon 16 CPUs (2.8GHz); SDPARA-C on Opteron 32 CPUs (1.4GHz) or Xeon 16 CPUs (2.8GHz).
Submit your problem and choose one of the packages.

Part II (contd.): 8. SDPA sparse format

SDPA sparse format. SDP (dual form):
min $10y_1 + 20y_2$ s.t. $A_1 y_1 + A_2 y_2 - A_0 \succeq O$,
where (each matrix consists of two 2x2 diagonal blocks)
  A_0 = diag(1, 2, 3, 4),
  A_1 = diag(5, 6, 0, 0),
  A_2 = [ 0 0 0 0; 0 7 0 0; 0 0 8 -1; 0 0 -1 9 ].
Example file (annotations on the right are not part of the file):
  2                mdim (= m)
  2                nblocks
  2 2              block structure
  10.0 20.0        objective coefficients
  0 1 1 1 1.0      nonzero coefficients of A_p (p = 0, 1, 2), one per line:
  0 1 2 2 2.0        p  block  row  column  value, where row <= column
  0 2 1 1 3.0
  0 2 2 2 4.0
  1 1 1 1 5.0
  1 1 2 2 6.0
  2 1 2 2 7.0
  2 2 1 1 8.0
  2 2 1 2 -1.0
  2 2 2 2 9.0
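A minimal reader for this format can be sketched in a few lines (illustrative only; the real format also allows comment lines starting with " or *, annotations such as "=mdim" after the numbers, and negative block sizes marking diagonal blocks, which this sketch handles only partially):

```python
import numpy as np

def parse_sdpa_sparse(text):
    """Minimal reader for the SDPA sparse format shown above.
    Returns (m, block_sizes, c, mats), where mats[p] is the list of
    dense blocks of A_p (p = 0 is the constant matrix A_0)."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(('"', '*'))]
    m = int(lines[0].split()[0])
    nblocks = int(lines[1].split()[0])
    sizes = [abs(int(t)) for t in lines[2].split()[:nblocks]]
    c = [float(t) for t in lines[3].replace(',', ' ').split()]
    mats = [[np.zeros((s, s)) for s in sizes] for _ in range(m + 1)]
    for ln in lines[4:]:
        p, blk, i, j, v = ln.split()[:5]
        p, blk, i, j, v = int(p), int(blk), int(i), int(j), float(v)
        mats[p][blk - 1][i - 1, j - 1] = v
        mats[p][blk - 1][j - 1, i - 1] = v   # entries give the upper triangle
    return m, sizes, c, mats

example = """\
2
2
2 2
10.0 20.0
0 1 1 1 1.0
0 1 2 2 2.0
0 2 1 1 3.0
0 2 2 2 4.0
1 1 1 1 5.0
1 1 2 2 6.0
2 1 2 2 7.0
2 2 1 1 8.0
2 2 1 2 -1.0
2 2 2 2 9.0
"""
m, sizes, c, mats = parse_sdpa_sparse(example)
assert m == 2 and sizes == [2, 2] and c == [10.0, 20.0]
assert np.allclose(mats[0][1], np.diag([3.0, 4.0]))
assert np.allclose(mats[2][1], [[8.0, -1.0], [-1.0, 9.0]])
```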

Part II (contd.): 9. Numerical results

A sparse SDP relaxation [44] of minimizing the generalized Rosenbrock function
$f(x) = \sum_{i=2}^n (100(x_i - x_{i-1}^2)^2 + (1 - x_i)^2)$, $x_1 \ge 0$.
SeDuMi, a single CPU, 2.4GHz Xeon; CPU time in seconds:
  n     eps_obj    sparse   dense
  10    2.5e-08    0.2      10.6
  15    6.5e-08    0.2      756.6
  400   2.5e-06    3.7      -
  800   5.5e-06    6.8      -
eps_obj = (the lower bound for the opt. value - the approx. opt. value) / max{1, |the lower bound for the opt. value|}.
When n = 800, the SDP relaxation problem has $A_p$: 4794 x 4794 (p = 1, 2,..., 7,988) and B: 7,988 x 7,988. Each $A_p$ consists of 799 diagonal blocks of 6x6 matrices. $A_p \bullet A_q = 0$ for most pairs (p, q) => a sparse Cholesky factorization of B.

P: min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ ($1 \le p \le m$), $S^n \ni X \succeq O$. D: max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S^n \ni S \succeq O$.
SDPs from quantum chemistry, Fukuda et al. [13], Zhao et al. [51]:
  atoms/molecules   m       n       #blocks   sizes of largest blocks
  O                 7230    5990    22        [1450, 1450, 450, ...]
  HF                15018   10146   22        [2520, 2520, 792, ...]
  CH_3N             20709   12802   22        [3211, 3211, 1014, ...]
Time (s) by number of processors:
                               16        64       128        256
  O      elements of B         10100.3   2720.4   1205.9     694.2
         Chol. fact. of B      218.2     87.3     68.2       106.2
         total                 14250.6   4453.3   3281.1     2951.6
  HF     elements of B         *         *        13076.1    6833.0
         Chol. fact. of B      *         *        520.2      671.0
         total                 *         *        26797.1    20780.7
  CH_3N  elements of B         *         *        34188.9    18003.3
         Chol. fact. of B      *         *        1008.9     1309.9
         total                 *         *        57034.8    45488.9

Large-size SDPs by SDPARA-C [31] (64 CPUs). Three types of test problems:
(a) SDP relaxations of randomly generated max cut problems on lattice graphs of size 10 x 1000, 10 x 2000 and 10 x 4000.
(b) SDP relaxations of randomly generated max clique problems on lattice graphs of size 10 x 500, 10 x 1000 and 10 x 2000.
(c) Randomly generated norm minimization problems: min $\|F_0 - \sum_{i=1}^{10} F_i y_i\|$ s.t. $y_i \in R$ (i = 1, 2,..., 10), where $F_i$ is 10 x 9990, 10 x 19990 or 10 x 39990 and $\|G\|$ = the square root of the max. eigenvalue of $G^T G$.
In all cases, the aggregate sparsity pattern consists of one block and is very sparse.

Large-size SDPs by SDPARA-C (64 CPUs):
  Problem                n       m       time (s)   memory (MB)
  (a) Cut(10 x 1000)     10000   10000   274.3      126
      Cut(10 x 2000)     20000   20000   1328.2     276
      Cut(10 x 4000)     40000   40000   7462.0     720
  (b) Clique(10 x 500)   5000    9491    639.5      119
      Clique(10 x 1000)  10000   18991   3033.2     259
      Clique(10 x 2000)  20000   37991   15329.0    669
  (c) Norm(10 x 9990)    10000   11      409.5      164
      Norm(10 x 19990)   20000   11      1800.9     304
      Norm(10 x 39990)   40000   11      7706.0     583

Part III: Some applications
1. Matrix approximation problems
2. A nonconvex quadratic optimization problem
3. The max-cut problem
4. Sum of squares of polynomials

Part III: 1. Matrix approximation problems

Let $F_p$ be a k x l matrix (p = 0, 1,..., m). We want to approximate the matrix $F_0$ by a linear combination of $F_p$ (p = 1,...,m), i.e.,
minimize $\{\|F(x)\| : x \in R^m\}$, where $F(x) = F_0 - \sum_{p=1}^m F_p x_p$ for $x = (x_1,\dots,x_m)^T$.
Which norm?
$\|A\|_\infty = \max\{|A_{ij}| : i = 1,\dots,k,\ j = 1,\dots,l\}$ (the max norm),
$\|A\|_F = (\sum_{i=1}^k \sum_{j=1}^l A_{ij}^2)^{1/2}$ (the Frobenius norm),
$\|A\|_{L_2} = \max_{\|u\|_2 = 1} \|Au\|$ = (the maximum eigenvalue of $A^T A$)$^{1/2}$ (the $L_2$ operator norm).

With the max norm:
minimize $\{\|F(x)\|_\infty : x \in R^m\}$
<=> min $\zeta$ s.t. $-\zeta \le F(x)_{ij} \le \zeta$ (i = 1,...,k, j = 1,...,l)
=> an LP (linear programming problem).
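The LP can be assembled directly; a SciPy sketch with made-up data (the variables are (x, zeta), and each entry of F(x) contributes two inequality rows):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
k, l, m = 3, 4, 2
F0 = rng.standard_normal((k, l))
F = [rng.standard_normal((k, l)) for _ in range(m)]

# variables (x_1, ..., x_m, zeta); minimize zeta
c = np.zeros(m + 1)
c[-1] = 1.0
A_ub, b_ub = [], []
for i in range(k):
    for j in range(l):
        row = [Fp[i, j] for Fp in F]
        # F0_ij - sum_p F_p,ij x_p <= zeta
        A_ub.append([-r for r in row] + [-1.0]); b_ub.append(-F0[i, j])
        # -(F0_ij - sum_p F_p,ij x_p) <= zeta
        A_ub.append(row + [-1.0]);               b_ub.append(F0[i, j])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * m + [(0, None)])
x, zeta = res.x[:m], res.x[-1]
resid = F0 - sum(xp * Fp for xp, Fp in zip(x, F))
assert res.success
assert abs(np.abs(resid).max() - zeta) < 1e-6   # zeta equals the max norm
```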

With the Frobenius norm:
minimize $\{\|F(x)\|_F : x \in R^m\}$ <=> $\min_{x \in R^m} \|F(x)\|_F^2 = \sum_{i=1}^k \sum_{j=1}^l F(x)_{ij}^2$
=> the least squares problem, a convex QP (quadratic programming problem).
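With the Frobenius norm the problem is an ordinary linear least squares problem in the vectorized matrices; a NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, m = 5, 4, 3
F0 = rng.standard_normal((k, l))
F = [rng.standard_normal((k, l)) for _ in range(m)]

M = np.column_stack([Fp.ravel() for Fp in F])   # vec(F_p) as columns
x, *_ = np.linalg.lstsq(M, F0.ravel(), rcond=None)

# optimality: the residual is orthogonal to the span of the vec(F_p)
r = F0.ravel() - M @ x
assert np.allclose(M.T @ r, 0)
```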

With the $L_2$ operator norm:
minimize $\{\|F(x)\|_{L_2} : x \in R^m\}$
<=> minimize the maximum eigenvalue of $F(x)^T F(x)$
<=> minimize $\lambda$ subject to $\lambda I - F(x)^T F(x) \succeq O$
<=> (via the Schur complement) minimize $\lambda$ subject to
$\begin{pmatrix} I & F(x) \\ F(x)^T & \lambda I \end{pmatrix} \succeq O$
=> an SDP.
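The Schur complement equivalence behind the last step, $\lambda I - F(x)^T F(x) \succeq O$ iff the block matrix is PSD, can be checked numerically for a fixed F(x); an illustrative NumPy sketch:

```python
import numpy as np

Fx = np.array([[1.0, 2.0],
               [0.0, 1.0]])
lam_max = np.linalg.eigvalsh(Fx.T @ Fx).max()   # largest eigenvalue of F^T F

def block(lam):
    k, l = Fx.shape
    return np.block([[np.eye(k), Fx], [Fx.T, lam * np.eye(l)]])

# Schur complement: the block matrix is PSD iff lam*I - F^T F >= O,
# i.e. iff lam >= lambda_max(F^T F)
assert np.linalg.eigvalsh(block(lam_max + 1e-9)).min() >= -1e-8
assert np.linalg.eigvalsh(block(lam_max - 0.1)).min() < 0
```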

Part III (contd.): 2. A nonconvex quadratic optimization problem

An SDP relaxation of QOP.
QOP: min $x^T Q_0 x + 2q_0^T x$ s.t. $x^T Q_p x + 2q_p^T x + \gamma_p \le 0$ (p = 1,...,m).
Write $\bar Q_p = \begin{pmatrix} \gamma_p & q_p^T \\ q_p & Q_p \end{pmatrix}$, $\gamma_0 = 0$.
QOP: min $\bar Q_0 \bullet \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix}$ s.t. $\bar Q_p \bullet \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix} \le 0$ (p = 1,...,m).
Relaxation: let $X_{ij} = x_i x_j$ (i, j = 1,...,n), i.e. $X = xx^T$; hence $\begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} = \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix} \succeq O$ (lifting).
SDP-P: min $\bar Q_0 \bullet \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix}$ s.t. $\bar Q_p \bullet \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \le 0$ (p = 1,...,m), $\begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq O$;
x and X are variables in SDP-P.
If x is a feasible solution of QOP, then $(x, X) = (x, xx^T)$ is a feasible solution of SDP-P with the same objective value => relaxation: opt. val. of QOP >= opt. val. of SDP-P.
If $Q_p$ (p = 0, 1,...,m) are positive semidefinite, then QOP = SDP-P.
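The lifting argument is easy to verify numerically: for any x, the lifted matrix is PSD of rank one and the lifted objective matches the QOP objective; a NumPy sketch with made-up data ($\gamma_0 = 0$ as above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
x = rng.standard_normal(n)
Q0 = rng.standard_normal((n, n)); Q0 = (Q0 + Q0.T) / 2
q0 = rng.standard_normal(n)

X = np.outer(x, x)                          # the lifting X = x x^T
Qbar0 = np.block([[np.zeros((1, 1)), q0[None, :]],
                  [q0[:, None], Q0]])       # gamma_0 = 0
lifted = np.vstack([np.hstack([[1.0], x]),
                    np.column_stack([x, X])])

# lifted objective  Qbar_0 . [[1, x^T], [x, X]]  equals the QOP objective:
obj_qop = x @ Q0 @ x + 2 * q0 @ x
assert np.isclose(np.sum(Qbar0 * lifted), obj_qop)

# the lifted matrix is (1, x)(1, x)^T: PSD of rank one
w = np.linalg.eigvalsh(lifted)
assert w.min() > -1e-10 and np.isclose(w.max(), 1 + x @ x)
```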

The Lagrangian dual of QOP.
QOP: min $x^T Q_0 x + 2q_0^T x$ s.t. $x^T Q_p x + 2q_p^T x + \gamma_p \le 0$ (p = 1,...,m).
Lagrangian function: $L(x, \lambda) = x^T Q_0 x + 2q_0^T x + \sum_{i=1}^m \lambda_i (x^T Q_i x + 2q_i^T x + \gamma_i)$, $\lambda = (\lambda_1,\dots,\lambda_m) \in R^m_+$.
Lagrangian dual: max $\min\{L(x, \lambda) : x \in R^n\}$ s.t. $\lambda \in R^m_+$.
L. dual: max $\zeta$ s.t. $L(x, \lambda) - \zeta \ge 0$ ($\forall x \in R^n$), $\lambda \in R^m_+$, or
L. dual: max $\zeta$ s.t.
$\begin{pmatrix} 1 \\ x \end{pmatrix}^T \begin{pmatrix} \sum_{i=1}^m \lambda_i \gamma_i - \zeta & q_0^T + \sum_{i=1}^m \lambda_i q_i^T \\ q_0 + \sum_{i=1}^m \lambda_i q_i & Q_0 + \sum_{i=1}^m \lambda_i Q_i \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix} \ge 0$ ($\forall x \in R^n$), $\lambda \in R^m_+$.
Here x is not a variable: the inequality must hold for all x.

SDP-D: max $\zeta$ s.t.
$\begin{pmatrix} \sum_{i=1}^m \lambda_i \gamma_i - \zeta & q_0^T + \sum_{i=1}^m \lambda_i q_i^T \\ q_0 + \sum_{i=1}^m \lambda_i q_i & Q_0 + \sum_{i=1}^m \lambda_i Q_i \end{pmatrix} \succeq O$, $\lambda \in R^m_+$.
SDP-D above is the dual SDP of SDP-P, the SDP relaxation of the QOP in the previous example.

Part III (contd.): 3. The max-cut problem, 4. Sum of squares of polynomials
[14] M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, 42 (1995) 1115-1145.