Introduction to Semidefinite Programs

Introduction to Semidefinite Programs
Masakazu Kojima, Tokyo Institute of Technology
Semidefinite Programming and Its Application
Institute for Mathematical Sciences, National University of Singapore, January 9-13, 2006

Main purpose: an introduction to semidefinite programs and a brief review of SDPs.

Contents:
Part I: Introduction to SDP and its basic theory (70 minutes)
Part II: Primal-dual interior-point methods (70 minutes)
Part III: Some applications
Appendix: Linear optimization problems over symmetric cones

References: not comprehensive, but helpful for further study of the subject.

Contents

Part I: Basic theory
1. LP versus SDP
2. Why is SDP interesting and important?
3. The equality standard form SDP
4. Some basic properties of positive semidefinite matrices and their inner product
5. General SDPs
6. Some examples
7. Duality

Part II: Primal-dual interior-point methods
1. Existing numerical methods for SDPs
2. Three approaches to primal-dual interior-point methods for SDPs
3. The central trajectory
4. Search directions
5. Various primal-dual interior-point methods
6. Exploiting sparsity
7. Software packages
8. SDPA sparse format
9. Numerical results

Part III: Some applications
1. Matrix approximation problems
2. A nonconvex quadratic optimization problem
3. The max-cut problem
4. Sums of squares of polynomials

Appendix: Linear optimization problems over symmetric cones
1. Linear optimization problems over cones
2. Symmetric cones
3. Euclidean Jordan algebra
4. The equality standard form SOCP (Second Order Cone Program)
5. Some applications of SOCPs

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 1


SDP is an extension of LP to the space of symmetric matrices.

LP:  minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                X11 >= 0, X12 >= 0, X22 >= 0.

SDP: minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                [ X11  X12 ]
                [ X12  X22 ] ⪰ O (positive semidefinite).

Both LP and SDP have a linear objective function in the real variables X11, X12, X22. Both LP and SDP have linear equality and inequality constraints in the real variables X11, X12, X22.

SDP is an extension of LP to the space of symmetric matrices.

LP:  minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                X11 >= 0, X12 >= 0, X22 >= 0.

SDP: minimize X11 - 2X12 - 5X22
     subject to 2X11 + 3X12 + X22 = 7, X11 + X12 >= 1,
                [ X11  X12 ]
                [ X12  X22 ] ⪰ O (positive semidefinite).

In addition to real-valued equality and inequality constraints, the SDP has a positive semidefinite constraint on [X11 X12; X12 X22], or equivalently X11 >= 0, X22 >= 0, X11 X22 - X12^2 >= 0, which couples X11, X12, X22 nonlinearly, while X11 >= 0, X12 >= 0, X22 >= 0 in LP are linear and separable. The feasible region of LP and the feasible region of SDP are both convex sets, but the former is polyhedral while the latter is non-polyhedral.
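The nonlinear coupling in the 2x2 positive semidefinite constraint is easy to probe numerically. Below is a minimal sketch (the feasible point is chosen by hand purely for illustration) of a point that passes the separable LP sign tests but fails the coupled determinant test, and hence is not positive semidefinite.

```python
import numpy as np

# Illustrative point satisfying 2*X11 + 3*X12 + X22 = 7 (chosen by hand).
X11, X12, X22 = 2.0, 1.0, 0.0
X = np.array([[X11, X12], [X12, X22]])

# LP-style sign constraints are linear and separable:
lp_feasible = (X11 >= 0) and (X12 >= 0) and (X22 >= 0)

# For a 2x2 matrix, X >= O is equivalent to
# X11 >= 0, X22 >= 0 and X11*X22 - X12**2 >= 0 (a nonlinear coupling).
det_test = X11 * X22 - X12**2
sdp_feasible = bool(np.all(np.linalg.eigvalsh(X) >= -1e-12))
```

Here `lp_feasible` holds while `sdp_feasible` does not (`det_test` is -1), showing that the semidefinite constraint genuinely couples the entries.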

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 4

Lots of applications to various problems:
- Systems and control theory: linear matrix inequalities [6]
- SDP relaxations of combinatorial and nonconvex problems: max-cut and max-clique problems [14], 0-1 integer linear programs [24], polynomial optimization problems [22, 35]
- Robust optimization [4]
- Quantum chemistry [51]
- Moment problems (applied probability) [5, 23]
- ...

Survey articles: Todd [39], Vandenberghe-Boyd [45]. Handbook of SDP: Wolkowicz-Saigal-Vandenberghe [46]. Web pages: Helmberg [15], Wolkowicz [47].

Theory:
- Self-concordance theory [33]
- Euclidean Jordan algebra [10, 36]
- Polynomial-time primal-dual interior-point methods [1, 17, 20, 27, 34]

SDP serves as a core convex optimization problem.

[Figure: Classification of optimization problems, continuous vs. discrete and convex vs. nonconvex. On the convex side, a hierarchy: Linear Program ⊂ Convex Quadratic Optimization Problem ⊂ Second Order Cone Program ⊂ SemiDefinite Program ⊂ Linear Optimization Problem over Symmetric Cone. On the nonconvex side: 0-1 integer LP & QOP, Polynomial Optimization Problem, POP over Symmetric Cone, Bilinear Matrix Inequality; arrows indicate relaxations of nonconvex problems into convex ones.]

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 8

(LP) minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.

Here
R: the set (linear space) of real numbers,
R^n: the linear space of n-dimensional vectors,
a_p ∈ R^n: data, n-dimensional vector (p = 0, 1,...,m),
b_p ∈ R: data, real number (p = 1,...,m),
x ∈ R^n: variable, n-dimensional vector,
a_p • x = Σ_{i=1}^n [a_p]_i x_i (the inner product of a_p and x).

(LP)  minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Here
S^n: the linear space of n×n symmetric matrices,
A_p ∈ S^n: data, n×n symmetric matrix (p = 0, 1,...,m),
b_p ∈ R: data, real number (p = 1,...,m),
X ∈ S^n: variable, n×n symmetric matrix,

              [ X_11 X_12 ... X_1n ]
X = (X_ij) =  [ X_21 X_22 ... X_2n ] ∈ S^n,  X_ij = X_ji ∈ R (1 <= i <= j <= n),
              [ ...            ... ]
              [ X_n1 X_n2 ... X_nn ]

X ⪰ O ⟺ X ∈ S^n is positive semidefinite,
A_p • X = Σ_{i=1}^n Σ_{j=1}^n [A_p]_ij X_ij (the inner product of A_p and X).

Notation: R^{k×l}: the linear space of k×l matrices.

(LP)  minimize a_0 • x subject to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

m = 2, n = 2, b_1 = 7, b_2 = 9,

X = [ X_11 X_12 ],  A_0 = [  1  -1 ],  A_1 = [  2  1.5 ],  A_2 = [  2  0.5 ].
    [ X_21 X_22 ]         [ -1  -5 ]         [ 1.5  1  ]         [ 0.5  3  ]

Note that X_12 = X_21.

Example:
minimize X_11 - 2X_12 - 5X_22
subject to 2X_11 + 3X_12 + X_22 = 7, 2X_11 + X_12 + 3X_22 = 9,
           [ X_11 X_12 ]
           [ X_12 X_22 ] ⪰ O.
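The correspondence between the data matrices and the scalar constraints can be sanity-checked in a few lines of numpy. A_0 below is a reconstruction (signs inferred so that A_0 • X matches the stated objective X_11 - 2X_12 - 5X_22), and the test point X is arbitrary, not an optimal solution.

```python
import numpy as np

A0 = np.array([[1.0, -1.0], [-1.0, -5.0]])   # inferred: A0 • X = X11 - 2*X12 - 5*X22
A1 = np.array([[2.0, 1.5], [1.5, 1.0]])
A2 = np.array([[2.0, 0.5], [0.5, 3.0]])

def inner(A, X):
    """A • X = sum_ij A_ij X_ij = trace(A X)."""
    return float(np.trace(A @ X))

X = np.array([[3.0, 0.5], [0.5, 0.5]])       # arbitrary symmetric test point

obj = inner(A0, X)        # equals X11 - 2*X12 - 5*X22
lhs1 = inner(A1, X)       # equals 2*X11 + 3*X12 + X22
lhs2 = inner(A2, X)       # equals 2*X11 + X12 + 3*X22
```

Note how the off-diagonal entries of each A_p are half the coefficient of X_12, since X_12 appears twice in the inner product.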

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 12

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O : the semidefinite constraint.

Definition: X ⪰ O if u^T X u = Σ_{i=1}^n Σ_{j=1}^n X_ij u_i u_j >= 0 for all u ∈ R^n.
Definition: X ≻ O if u^T X u = Σ_{i=1}^n Σ_{j=1}^n X_ij u_i u_j > 0 for all u ≠ 0.

X ∈ S^n ⟹ all n eigenvalues of X are real.
X ⪰ O (≻ O) ⟺ all n eigenvalues are nonnegative (positive).
X ⪰ O (≻ O) ⟺ all the principal minors are nonnegative (positive).
X ⪰ O (≻ O) ⟹ all diagonal entries X_ii are nonnegative (positive).
X ⪰ O and X_ii = 0 ⟹ X_ij = 0 (for all j).
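The eigenvalue and principal-minor characterizations above can be checked numerically; the matrices here are illustrative test data only.

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0]])

# Eigenvalues of a symmetric matrix are real; eigvalsh returns them sorted.
eigvals = np.linalg.eigvalsh(X)
psd_by_eigs = bool(np.all(eigvals >= 0))

# For a 2x2 matrix the principal minors are X11, X22 and det(X).
minors = [X[0, 0], X[1, 1], float(np.linalg.det(X))]
psd_by_minors = all(m >= 0 for m in minors)

# X ⪰ O with X_ii = 0 forces the whole ith row and column to vanish.
Z = np.array([[0.0, 0.0], [0.0, 4.0]])       # PSD, Z11 = 0, hence Z12 = Z21 = 0
```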

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O : the semidefinite constraint.

X ⪰ O (≻ O) ⟺ there is an n×n (nonsingular) B with X = B B^T (factorization).
X ⪰ O ⟺ there is an n×n lower triangular L with X = L L^T (Cholesky factorization).
X ⪰ O ⟺ there are an n×n orthogonal P and an n×n diagonal D ⪰ O with X = P D P^T (orthogonal decomposition). Here each diagonal element λ_i = D_ii of D is an eigenvalue of X, and the ith column p_i of P is an eigenvector corresponding to λ_i.
X ⪰ O ⟺ there is a C ∈ S^n_+ with X = C^2. Take C = P D^{1/2} P^T; then
C^2 = (P D^{1/2} P^T)(P D^{1/2} P^T) = P D P^T = X.
We write C = √X.
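A quick numerical check of the three factorizations (Cholesky, orthogonal decomposition, and the matrix square root built from it) on a small positive definite example:

```python
import numpy as np

X = np.array([[4.0, 2.0], [2.0, 3.0]])       # positive definite example

# Cholesky factorization: X = L L^T with L lower triangular.
L = np.linalg.cholesky(X)

# Orthogonal (eigenvalue) decomposition: X = P D P^T.
lam, P = np.linalg.eigh(X)
D = np.diag(lam)

# Matrix square root: C = P D^{1/2} P^T, so that C @ C reproduces X.
C = P @ np.diag(np.sqrt(lam)) @ P.T
```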

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n : a linear space of dimension n(n+1)/2.
- X + Y ∈ S^n for X ∈ S^n and Y ∈ S^n.
- αX ∈ S^n for α ∈ R and X ∈ S^n.
- linear independence; a basis consisting of n(n+1)/2 matrices.

Example (n = 2; note that X_12 = X_21):

2 [ 1.1 0.5 ] + 0.5 [  2.4 -0.6 ] = [ 3.4 0.7 ],
  [ 0.5 2.4 ]       [ -0.6  1.2 ]   [ 0.7 5.4 ]

[ 1 0 ], [ 0 1 ], [ 0 0 ] : a basis of S^2.
[ 0 0 ]  [ 1 0 ]  [ 0 1 ]

For every A, X ∈ S^n, the inner product A • X is defined by
A • X = Σ_{i=1}^n Σ_{j=1}^n A_ij X_ij = Σ_{i=1}^n [AX]_ii = trace AX,
and u^T X u = trace(u^T X u) = trace(X u u^T) = X • (u u^T).
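The two trace identities at the end of the slide can be verified directly; the matrices and vector below are arbitrary test data.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, -1.0]])
X = np.array([[0.5, 0.3], [0.3, 2.0]])
u = np.array([1.0, -2.0])

# A • X = sum_ij A_ij X_ij = trace(AX)
dot_AX = float(np.sum(A * X))
tr_AX = float(np.trace(A @ X))

# u^T X u = trace(X u u^T) = X • (u u^T)
quad = float(u @ X @ u)
via_trace = float(np.trace(X @ np.outer(u, u)))
via_inner = float(np.sum(X * np.outer(u, u)))
```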

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O and the inner product X • Y.

(S^n_+)* ≝ { Y ∈ S^n : Y • X >= 0 for all X ∈ S^n_+ }.

S^n_+ ⊆ (S^n_+)*. (Proof) Let Y ∈ S^n_+. For X ⪰ O, write X = L L^T; then
Y • X = trace YX = trace Y L L^T = trace L^T Y L >= 0 (since S^n ∋ L^T Y L ⪰ O).

(S^n_+)* ⊆ S^n_+. (Proof) Suppose that Y ∉ S^n_+. Then there is u ∈ R^n with
0 > u^T Y u = Y • (u u^T), hence Y ∉ (S^n_+)*.

Therefore S^n_+ = (S^n_+)* (self-dual).

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

S^n ∋ X ⪰ O and the inner product X • Y.

X, Y ⪰ O and X • Y = 0 ⟹ XY = O.

(Proof) Apply the eigenvalue decomposition X = P D P^T. Then
0 = X • Y = trace XY = trace P D P^T Y = trace D P^T Y P = Σ_{i=1}^n D_ii (P^T Y P)_ii,
with D_ii >= 0 and (P^T Y P)_ii >= 0, so for each i either D_ii = 0 or (P^T Y P)_ii = 0; in the latter case the ith row of P^T Y P is 0 (a positive semidefinite matrix with a zero diagonal entry has a zero row). Therefore D P^T Y P = O, which implies XY = P D P^T Y P P^T = O.
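The implication X • Y = 0 ⟹ XY = O can be illustrated by building two PSD matrices with the same eigenvectors and complementary eigenvalue supports; this construction is hypothetical, chosen only to exercise the statement.

```python
import numpy as np

# An orthogonal matrix of eigenvectors shared by X and Y.
theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Complementary eigenvalue supports: lambda = (2, 0), nu = (0, 5).
X = P @ np.diag([2.0, 0.0]) @ P.T
Y = P @ np.diag([0.0, 5.0]) @ P.T

inner_XY = float(np.sum(X * Y))   # X • Y = trace(XY)
prod = X @ Y                      # vanishes, as the proof predicts
```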

(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Common properties of R^n_+ ≝ {x ∈ R^n : x >= 0} and S^n_+ ≝ {X ∈ S^n : X ⪰ O}:

- R^n_+ is a cone: αx ∈ R^n_+ if α >= 0 and x ∈ R^n_+.
- R^n_+ is convex: λx + (1-λ)y ∈ R^n_+ if 0 <= λ <= 1 and x, y ∈ R^n_+.
- R^n_+ is self-dual: (R^n_+)* ≝ { y ∈ R^n : y • x >= 0 for all x ∈ R^n_+ } = R^n_+.
- x, y ∈ R^n_+ and x • y = 0 ⟹ x_i y_i = 0 (i = 1,...,n).

- S^n_+ is a cone: αX ∈ S^n_+ if α >= 0 and X ∈ S^n_+.
- S^n_+ is convex: λX + (1-λ)Y ∈ S^n_+ if 0 <= λ <= 1 and X, Y ∈ S^n_+.
- S^n_+ is self-dual: (S^n_+)* ≝ { Y ∈ S^n : Y • X >= 0 for all X ∈ S^n_+ } = S^n_+.
- X, Y ∈ S^n_+ and X • Y = 0 ⟹ XY = O.

R^n_+ and S^n_+ are special cases of the symmetric cones discussed in the Appendix.

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 19

Equality standard form:
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

With block-diagonal data and variable,

      [ A_p^1  O   ...  O   ]       [ X^1  O   ...  O  ]
A_p = [  O   A_p^2 ...  O   ],  X = [  O   X^2 ...  O  ] ∈ S^{n_1+...+n_t}  (p = 0, 1,...,m),
      [ ...             ... ]       [ ...          ... ]
      [  O    O   ... A_p^t ]       [  O   O   ... X^t ]

we obtain the equality standard form with multiple matrix variables:
(SDP) minimize Σ_{q=1}^t A_0^q • X^q subject to Σ_{q=1}^t A_p^q • X^q = b_p (p = 1,...,m), S^{n_q} ∋ X^q ⪰ O (q = 1,...,t).

If n_q = 1 (q = 1,...,t), this (SDP) is equivalent to the equality standard form of LP, where A_p^q ∈ R and X^q ∈ R.

Can we transform the above (SDP) (or the equality standard form of LP) into the equality standard form (SDP)?

Equality standard form:
(SDP) minimize A_0 • X subject to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

An SDP from systems and control theory:

(SDP) minimize λ
      subject to [ XA + A^T X + C^T C   XB + C^T D ]
                 [ B^T X + D^T C        D^T D - I  ] ⪯ λI,   X ⪯ λI.

Here X ∈ S^n and λ ∈ R are variables, and A, B, C, D are given data matrices.

Can we transform this (SDP) into the equality standard form (SDP)? Yes in theory, but it is not practical at all. Instead, transform (SDP) into an LMI standard form (with equality constraints), which corresponds to the dual of an equality standard form with free variables.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

Any real-valued linear function in X ∈ S^n can be written as
A • X = Σ_{i=1}^n Σ_{j=1}^n A_ij X_ij for some A ∈ S^n,
where A_ij denotes the (i,j)th element of A ∈ S^n and X_ij the (i,j)th element of X ∈ S^n.

We can transform any SDP into the equality standard form, but such a transformation is neither trivial nor practical in many cases. Usually it is easier to reduce an SDP to an LMI standard form with equality constraints than to the equality standard form.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

Reduction to an LMI standard form with equality constraints: represent each symmetric variable X^q ∈ S^{n_q} as a linear combination of basis matrices E^q_ij (1 <= i <= j <= n_q),

X^q = Σ_{1 <= i <= j <= n_q} E^q_ij y^q_ij,

where y^q_ij denotes a free real variable and E^q_ij an n_q × n_q matrix with 1 in the (i,j)th and (j,i)th elements and 0 elsewhere. Then substitute this into the general SDP.

A general SDP:
minimize a linear function in x_1,...,x_k and X^q (q = 1,...,t)
subject to
  linear equalities in x_1,...,x_k and X^q (q = 1,...,t),
  linear (matrix) inequalities in x_1,...,x_k and X^q (q = 1,...,t),
  x_1,...,x_k ∈ R (free real variables),
  X^q ⪰ O (q = 1,...,t) (positive semidefinite matrix variables).

Here a nonnegative x_i is regarded as a 1×1 psd matrix variable, and a matrix variable U ∈ R^{k×m} as a set of free variables U_ij.

An LMI standard form with equality constraints:
minimize a linear function in y_1,...,y_l
subject to
  linear equalities in y_1,...,y_l,
  linear (matrix) inequalities in y_1,...,y_l,
  y_1,...,y_l ∈ R (free real variables).

Taking the dual, we obtain an equality standard form with free variables. We can apply existing software packages such as CSDP, PENNON, SDPA, SDPT3 and SeDuMi to this primal-dual pair.

Example:

minimize  w + [ 2 1 ] • X
              [ 1 3 ]
subject to [   X     (2,1)^T ]
           [ (2,1)      w    ] ⪰ O.

Represent X = y_1 [ 1 0 ] + y_2 [ 0 1 ] + y_3 [ 0 0 ].
                  [ 0 0 ]       [ 1 0 ]       [ 0 1 ]

Substituting, the problem becomes

minimize  w + 2y_1 + 2y_2 + 3y_3
subject to
    [ 1 0 0 ]       [ 0 1 0 ]       [ 0 0 0 ]     [ 0 0 0 ]   [ 0 0 2 ]
y_1 [ 0 0 0 ] + y_2 [ 1 0 0 ] + y_3 [ 0 1 0 ] + w [ 0 0 0 ] + [ 0 0 1 ] ⪰ O.
    [ 0 0 0 ]       [ 0 0 0 ]       [ 0 0 0 ]     [ 0 0 1 ]   [ 2 1 0 ]

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 26

Eigenvalues of a symmetric matrix A:
the maximum eigenvalue = min { λ : λI ⪰ A } = min { λ : λI - A ⪰ O },
the minimum eigenvalue = max { λ : A - λI ⪰ O }.

We can formulate many engineering problems involving eigenvalues of symmetric matrices via SDPs.

A linear matrix inequality (LMI) A(·) ⪯ O, where A(·) is a linear mapping in matrix and/or vector variables, can be handled via

maximize λ subject to A(·) + λI ⪯ O.

For example,
A(X) = [ XA + A^T X + C^T C   XB + C^T D ]
       [ B^T X + D^T C        D^T D - I  ] ⪯ O.

For LMIs, see [6] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, SIAM, Philadelphia, 1994.

The Schur complement. Let A ∈ S^k be positive definite, X ∈ R^{k×l}, Y ∈ S^l. Then

Y - X^T A^{-1} X ⪰ O  (quadratic in X)  ⟺  [ A   X ]
                                           [ X^T Y ] ⪰ O  (linear in X).

Proof:

[ I  -A^{-1}X ]^T [ A   X ] [ I  -A^{-1}X ]   [ A         O         ]
[ O      I    ]   [ X^T Y ] [ O      I    ] = [ O  Y - X^T A^{-1} X ],

and [ I  -A^{-1}X ; O  I ] is nonsingular. Hence

[ A  X ; X^T  Y ] ⪰ O ⟺ [ A  O ; O  Y - X^T A^{-1} X ] ⪰ O ⟺ Y - X^T A^{-1} X ⪰ O,

where the last equivalence uses that A is positive definite.
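The equivalence can be tested numerically on random data: shifting Y across the borderline value X^T A^{-1} X flips both conditions together. The data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 3, 2
M = rng.standard_normal((k, k))
A = M @ M.T + k * np.eye(k)            # positive definite
X = rng.standard_normal((k, l))
Y0 = X.T @ np.linalg.inv(A) @ X        # borderline: Schur complement = O

results = []
for shift in (0.5, -0.5):              # Y - X^T A^{-1} X = shift * I
    Y = Y0 + shift * np.eye(l)
    block = np.block([[A, X], [X.T, Y]])
    schur = Y - X.T @ np.linalg.inv(A) @ X
    schur_psd = bool(np.all(np.linalg.eigvalsh(schur) >= -1e-9))
    block_psd = bool(np.all(np.linalg.eigvalsh(block) >= -1e-9))
    results.append((schur_psd, block_psd))
```

With the positive shift both tests pass; with the negative shift both fail, as the congruence argument predicts.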

The Schur complement. Let A ∈ S^k be positive definite, X ∈ R^{k×l}, Y ∈ S^l. Then
Y - X^T A^{-1} X ⪰ O (quadratic in X) ⟺ [ A  X ; X^T  Y ] ⪰ O (linear in X).

When A = I:  Y - X^T X ⪰ O ⟺ [ I  X ; X^T  Y ] ⪰ O.

When A = I, X = x ∈ R^k and Y = y ∈ R:  y - x^T x >= 0 ⟺ [ I  x ; x^T  y ] ⪰ O.

When A = Iy, X = x ∈ R^k and Y = y ∈ R:
y^2 - x^T x >= 0 and y >= 0 (an SOCP constraint)
⟺ (y - x^T x / y >= 0 if y > 0)
⟺ [ Iy  x ; x^T  y ] ⪰ O.

A quasi-convex optimization problem:
minimize (Lx - c)^T (Lx - c) / d^T x subject to Ax <= b.
Here L ∈ R^{k×n}, c ∈ R^k, d ∈ R^n, A ∈ R^{l×n}, b ∈ R^l, and d^T x > 0 for every feasible x ∈ R^n.

Equivalently: minimize ζ subject to ζ >= (Lx - c)^T (Lx - c) / d^T x, Ax <= b.

By the Schur complement,
ζ >= (Lx - c)^T (Lx - c) / (d^T x) ⟺ [ (d^T x) I   Lx - c ]
                                      [ (Lx - c)^T    ζ    ] ⪰ O.

SDP: minimize ζ subject to [ (d^T x) I  Lx - c ; (Lx - c)^T  ζ ] ⪰ O, Ax <= b.
(This problem can also be formulated as an SOCP.)

Part I: Basic theory 1. LP versus SDP 2. Why is SDP interesting and important? 3. The equality standard form 4. Some basic properties on positive semidefinite matrices and their inner product 5. General SDPs 6. Some examples 7. Duality 31

Equality standard form SDP:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

Lagrangian function:
L(X, y, S) = A_0 • X - Σ_{p=1}^m (A_p • X - b_p) y_p - S • X
for X ∈ S^n, y = (y_1,...,y_m)^T ∈ R^m, S^n ∋ S ⪰ O.

Lagrangian dual:
max_{y ∈ R^m, S ⪰ O} min_{X ∈ S^n} L(X, y, S)
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : ∇_X L(X, y, S) = O }   (L is linear in X)
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : A_0 - Σ_{p=1}^m A_p y_p - S = O }
= max_{y ∈ R^m, S ⪰ O} { L(X, y, S) : S = A_0 - Σ_{p=1}^m A_p y_p }
= max_{y ∈ R^m, S ⪰ O} { Σ_{p=1}^m b_p y_p : S = A_0 - Σ_{p=1}^m A_p y_p }
  (since L(X, y, A_0 - Σ_{p=1}^m A_p y_p) = Σ_{p=1}^m b_p y_p)
= max { Σ_{p=1}^m b_p y_p : S = A_0 - Σ_{p=1}^m A_p y_p, y ∈ R^m, S ⪰ O }
(the dual of the equality standard form).

A primal-dual pair of LPs:
(P) min a_0 • x sub.to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m a_p y_p + s = a_0, R^n ∋ s >= 0.

Weak duality:
LP:  x • s = a_0 • x - Σ_{p=1}^m b_p y_p >= 0 for feasible x, y, s.
SDP: X • S = A_0 • X - Σ_{p=1}^m b_p y_p >= 0 for feasible X, y, S.

(Proof) Let (X, y, S) be a feasible solution of (P) and (D). Then
0 <= X • S = S • X = (A_0 - Σ_{p=1}^m A_p y_p) • X
           = A_0 • X - Σ_{p=1}^m (A_p • X) y_p = A_0 • X - Σ_{p=1}^m b_p y_p.

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.
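Weak duality can be checked on the 2×2 example from Part I, section 3. A_0 is the reconstruction with inferred signs used earlier, and the dual point y = (-2, -2) was found by hand; both are illustrative assumptions, not values given in the slides.

```python
import numpy as np

A0 = np.array([[1.0, -1.0], [-1.0, -5.0]])
A1 = np.array([[2.0, 1.5], [1.5, 1.0]])
A2 = np.array([[2.0, 0.5], [0.5, 3.0]])
b = np.array([7.0, 9.0])

# Feasible primal point: setting X12 = 0 in the equalities gives X11 = 3, X22 = 1.
X = np.array([[3.0, 0.0], [0.0, 1.0]])

# Feasible dual point: y = (-2, -2) makes S = A0 - y1*A1 - y2*A2 = [[9, 3], [3, 3]] PSD.
y = np.array([-2.0, -2.0])
S = A0 - y[0] * A1 - y[1] * A2

primal_obj = float(np.trace(A0 @ X))
dual_obj = float(b @ y)
gap = primal_obj - dual_obj            # equals X • S >= 0
XS = float(np.trace(X @ S))
```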

A primal-dual pair of LPs:
(P) min a_0 • x sub.to a_p • x = b_p (p = 1,...,m), R^n ∋ x >= 0.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m a_p y_p + s = a_0, R^n ∋ s >= 0.

Strong duality:
LP: if there exists a feasible (x, y, s) (x >= 0, s >= 0), then
x̄ • s̄ = a_0 • x̄ - Σ_{p=1}^m b_p ȳ_p = 0 at an optimal (x̄, ȳ, s̄).
SDP: if there exists an interior feasible (X, y, S) (X ≻ O, S ≻ O), then
X̄ • S̄ = A_0 • X̄ - Σ_{p=1}^m b_p ȳ_p = 0 at an optimal (X̄, ȳ, S̄).

For strong duality in SDP, an interior feasible (X, y, S) (X ≻ O, S ≻ O) is necessary! (An example follows.)

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

Example [45]: an interior feasible (X, y, S) (X ≻ O, S ≻ O) is necessary!

(P) min [ 0 0 0 ; 0 0 0 ; 0 0 1 ] • X
    sub.to [ 1 0 0 ; 0 0 0 ; 0 0 0 ] • X = 0, [ 0 1 0 ; 1 0 0 ; 0 0 2 ] • X = 2, X ⪰ O,

or (P) min X_33 sub.to X_11 = 0, X_12 + X_21 + 2X_33 = 2, X ⪰ O.
X a feasible solution ⟹ X_11 = X_12 = X_21 = 0 and X_33 = 1; the optimal value = 1.

(D) max 2y_2 sub.to [ 1 0 0 ; 0 0 0 ; 0 0 0 ] y_1 + [ 0 1 0 ; 1 0 0 ; 0 0 2 ] y_2 ⪯ [ 0 0 0 ; 0 0 0 ; 0 0 1 ],

or (D) max 2y_2 sub.to [ -y_1  -y_2  0 ; -y_2  0  0 ; 0  0  1-2y_2 ] ⪰ O.
(y_1, y_2) feasible ⟹ y_1 <= 0, y_2 = 0; the optimal value = 0.

A primal-dual pair of SDPs:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

The KKT optimality condition:
A_p • X = b_p (p = 1,...,m), Σ_{p=1}^m A_p y_p + S = A_0,
S^n ∋ X ⪰ O, S^n ∋ S ⪰ O, XS = O (complementarity).

O = XS = SX ⟹ X and S are commutative; hence there is an orthogonal P ∈ R^{n×n};
P^T X P = diag(λ_1,...,λ_n), P^T S P = diag(ν_1,...,ν_n),
O = P^T XS P = (P^T X P)(P^T S P) = diag(λ_1,...,λ_n) diag(ν_1,...,ν_n),
P^T (X + S) P = diag(λ_1,...,λ_n) + diag(ν_1,...,ν_n).

λ_i >= 0, ν_i >= 0, λ_i ν_i = 0 (i = 1,...,n) (complementarity);
X + S ≻ O ⟺ λ_i + ν_i > 0 (i = 1,...,n) (strict complementarity).

LP: x_i >= 0, s_i >= 0, x_i s_i = 0 (∀i) (complementarity); x_i + s_i > 0 (∀i) (strict complementarity). Here λ_i corresponds to x_i, and ν_i to s_i.

An equality standard form:
(P) min A_0 • X sub.to A_p • X = b_p (p = 1,...,m), S^n ∋ X ⪰ O.

An equality standard form with free variables:
(P) min A_0 • X + d_0^T z sub.to A_p • X + d_p^T z = b_p (p = 1,...,m), S^n ∋ X ⪰ O, z ∈ R^l (a free vector variable). Here d_p ∈ R^l (p = 0, 1,...,m).

Its dual, an LMI standard form with equality constraints:
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O, Σ_{p=1}^m d_p y_p = d_0.

An LMI standard form:
(D) max Σ_{p=1}^m b_p y_p sub.to Σ_{p=1}^m A_p y_p + S = A_0, S^n ∋ S ⪰ O.

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 1


Some existing numerical methods for SDPs

IPMs (interior-point methods):
- Primal-dual scaling: CSDP (Borchers [7]), SDPA (Fujisawa-K-Nakata-Yamashita [49]), SDPT3 (Toh-Todd-Tutuncu [42]), SeDuMi (Sturm [37])
- Dual scaling: DSDP (Benson-Ye-Zhang [3])

Nonlinear programming approaches:
- Spectral bundle method (Helmberg-Rendl [17])
- Gradient-based log-barrier method (Burer-Monteiro [9])
- PENNON (Kocvara [19]): generalized augmented Lagrangian
- Saddle point mirror-prox algorithm (Lu-Nemirovski-Monteiro [26])

IPMs: medium-scale SDPs (e.g., n, m <= 5000) with high accuracy. Nonlinear programming approaches: large-scale SDPs (e.g., n, m >= 10,000) with low accuracy.

Parallel implementations:
- SDPA → SDPARA (Y-F-K [49]), SDPARA-C (N-Y-F-K [31])
- DSDP → PDSDP (Benson [2])
- CSDP → Borchers-Young [8]
- Spectral bundle method → Nayakkankuppam [32]

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 3

Three approaches to primal-dual interior-point methods for SDPs

(1) Self-concordance:
[33] Yu. E. Nesterov and A. Nemirovski, Interior Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, 1994.
[34] Yu. E. Nesterov and M. J. Todd, Primal-dual interior-point methods for self-scaled cones, SIAM Journal on Optimization, 8 (1998) 324-364.

(2) Linear optimization problems over symmetric cones:
[10] L. Faybusovich, Linear systems in Jordan algebra and primal-dual interior-point algorithms, Journal of Computational and Applied Mathematics, 86 (1997) 149-175.
[36] S. Schmieta and F. Alizadeh, Associative and Jordan algebras, and polynomial time interior-point algorithms for symmetric cones, Mathematics of Operations Research, 26 (2001) 543-564.
⟹ Appendix.

(3) Extensions of primal-dual interior-point algorithms for LPs to SDPs [1, 17, 20, 27] ⟹ our approach in this lecture.

Part II: Primal-dual interior-point methods 1. Existing numerical methods for SDPs 2. Three approaches to primal-dual interior-point methods for SDPs 3. The central trajectory 4. Search directions 5. Various primal-dual interior-point methods 6. Exploiting sparsity 7. Software packages 8. SDPA sparse format 9. Numerical results 5

SDP:
(P) min A_0 • X s.t. A_p • X = b_p (p = 1,...,m), X ∈ S^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ∈ S^n_+.

Trace the central trajectory to an optimal solution in the primal-dual space.

[Figure: the central trajectory (X(µ), y(µ), S(µ)) approaching an optimal solution (X*, y*, S*) as µ → 0; the LP picture is analogous, with (x(µ), y(µ), s(µ)) → (x*, y*, s*).]

How do we define the central trajectory? How do we numerically trace the central trajectory?

LP:
(P) min a_0 • x s.t. a_p • x = b_p (p = 1,...,m), x ∈ R^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m a_p y_p + s = a_0, s ∈ R^n_+.

SDP:
(P) min A_0 • X s.t. A_p • X = b_p (p = 1,...,m), X ∈ S^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ∈ S^n_+.

X ∈ the interior of S^n_+ ≝ {X ∈ S^n : X ≻ O} ⟺ X ⪰ O and det X > 0.
X ∈ the boundary of S^n_+ ⟺ X ⪰ O and det X = 0.
A logarithmic barrier that keeps X away from the boundary: -log det X.

A primal-dual pair of SDPs with logarithmic barrier terms, µ > 0:
P(µ) min A_0 • X - µ log det X s.t. A_p • X = b_p (p = 1,...,m), X ≻ O.
D(µ) max Σ_{p=1}^m b_p y_p + µ log det S s.t. Σ_{p=1}^m A_p y_p + S = A_0, S ≻ O.

A primal-dual pair of LPs with logarithmic barrier terms, µ > 0:
P(µ) min a_0 • x - µ Σ_{i=1}^n log x_i s.t. a_p • x = b_p (p = 1,...,m), x > 0.
D(µ) max Σ_{p=1}^m b_p y_p + µ Σ_{i=1}^n log s_i s.t. Σ_{p=1}^m a_p y_p + s = a_0, s > 0.

A logarithmic barrier that keeps x away from the boundary: -Σ_{i=1}^n log x_i.
x ∈ the boundary of R^n_+ ⟺ x_i = 0 for some i.
x ∈ the interior of R^n_+ ≝ {x ∈ R^n : x > 0} ⟺ x_i > 0 (i = 1,...,n).

LP:
(P) min a_0 • x s.t. a_p • x = b_p (p = 1,...,m), x ∈ R^n_+.
(D) max Σ_{p=1}^m b_p y_p s.t. Σ_{p=1}^m a_p y_p + s = a_0, s ∈ R^n_+.

A primal-dual pair of LPs with logarithmic barrier terms, $\mu > 0$:
P($\mu$): min $a_0^T x - \mu \sum_{i=1}^n \log x_i$ s.t. $a_p^T x = b_p$ (p = 1,...,m), $x > 0$.
D($\mu$): max $\sum_{p=1}^m b_p y_p + \mu \sum_{i=1}^n \log s_i$ s.t. $\sum_{p=1}^m a_p y_p + s = a_0$, $s > 0$.
For every $\mu > 0$, (P($\mu$), D($\mu$)) has a unique opt. sol. $(x(\mu), y(\mu), s(\mu))$, which converges to an optimal solution $(x^*, y^*, s^*)$ of (P, D) as $\mu \downarrow 0$.
$C = \{(x(\mu), y(\mu), s(\mu)) : \mu > 0\}$: the central trajectory.
$(x(\mu), y(\mu), s(\mu)) \in C$ iff
  $a_p^T x = b_p$ ($\forall p$), $x > 0$,
  $\sum_{p=1}^m a_p y_p + s = a_0$, $s > 0$,
  $x_i s_i = \mu$ ($\forall i$).
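As a small illustration (not from the lecture), the stationarity conditions of P($\mu$) can be solved in closed form for an LP over the unit simplex, and the products $x_i s_i = \mu$ then hold identically along the trajectory; a NumPy/SciPy sketch with made-up data c:

```python
import numpy as np
from scipy.optimize import brentq

# LP over the simplex: min c^T x  s.t.  sum(x) = 1, x >= 0.
# Stationarity of P(mu) gives x_i = mu / (c_i - y) and s_i = c_i - y,
# so x_i * s_i = mu holds identically along the central trajectory.
c = np.array([3.0, 1.0, 2.0])

def central_point(mu):
    # pick the multiplier y < min(c) with sum_i mu / (c_i - y) = 1
    y = brentq(lambda t: np.sum(mu / (c - t)) - 1.0,
               c.min() - 100.0, c.min() - 1e-12)
    x = mu / (c - y)
    s = c - y
    return x, y, s

for mu in (1.0, 1e-2, 1e-6):
    x, y, s = central_point(mu)
    assert np.allclose(x * s, mu)   # the defining property of C
# as mu -> 0, x(mu) approaches the optimal vertex (here e_2, since c_2 is smallest)
```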

A primal-dual pair of SDPs with logarithmic barrier terms, $\mu > 0$:
P($\mu$): min $A_0 \bullet X - \mu \log\det X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \succ O$.
D($\mu$): max $\sum_{p=1}^m b_p y_p + \mu \log\det S$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$.
For every $\mu > 0$, (P($\mu$), D($\mu$)) has a unique opt. sol. $(X(\mu), y(\mu), S(\mu))$, which converges to an optimal solution $(X^*, y^*, S^*)$ of (P, D) as $\mu \downarrow 0$.
$C = \{(X(\mu), y(\mu), S(\mu)) : \mu > 0\}$: the central trajectory.
$(X(\mu), y(\mu), S(\mu)) \in C$ iff
  $A_p \bullet X = b_p$ ($\forall p$), $X \succ O$,
  $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$,
  $XS = \mu I$.

Some properties of the central trajectory C.
Suppose $(X, y, S) \in C$, i.e. $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$, $X \succ O$, $S \succ O$, $XS = \mu I$ for some $\mu > 0$.
X and S commute; hence there is an orthogonal $P \in R^{n \times n}$ with $P^T X P = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$, $P^T S P = \mathrm{diag}(\nu_1,\dots,\nu_n)$.
$\mu I = P^T (\mu I) P = P^T X P \, P^T S P = \mathrm{diag}(\lambda_1,\dots,\lambda_n)\,\mathrm{diag}(\nu_1,\dots,\nu_n)$;
$\lambda_i > 0$, $\nu_i > 0$, $\lambda_i \nu_i = \mu$ (i = 1,...,n).
As $\mu \downarrow 0$, $(X, y, S) \to$ an optimal sol. and $\lambda_i \nu_i \to 0$ (i = 1,...,n).
In the LP case: if $(x, y, s)$ is on the central trajectory then $x_i > 0$, $s_i > 0$, $x_i s_i = \mu$ (i = 1,...,n); as $\mu \downarrow 0$, $(x, y, s) \to$ an optimal sol. and $x_i s_i \to 0$. Hence $\lambda_i$ plays the role of $x_i$ and $\nu_i$ the role of $s_i$.
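Since $XS = \mu I$ forces $S = \mu X^{-1}$ on the trajectory, the commutativity and the eigenvalue pairing are easy to verify numerically; an illustrative NumPy check (the data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 5, 0.7
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)       # a point with X positive definite
S = mu * np.linalg.inv(X)         # XS = mu*I forces S = mu * X^{-1}

assert np.allclose(X @ S, S @ X)  # X and S commute
lam = np.linalg.eigvalsh(X)       # eigenvalues of X (ascending) ...
nu = np.linalg.eigvalsh(S)[::-1]  # ... pair with those of S in reverse order
assert np.allclose(lam * nu, mu)  # lambda_i * nu_i = mu for every i
```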

Some properties of the central trajectory C.
Suppose $(X, y, S) \in C$, i.e. $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$, $X \succ O$, $S \succ O$, $XS = \mu I$ for some $\mu > 0$.
Factorize X as $X = X^{1/2} X^{1/2}$. Then
$\mu I = X^{-1/2} (\mu I) X^{1/2} = X^{-1/2} (XS) X^{1/2} = X^{1/2} S X^{1/2}$
(all eigenvalues of $X^{1/2} S X^{1/2} \in S^n_+$ are $\mu$).
This relation will be used to define some neighborhoods of C:
$N_2(\beta) = \{(X, y, S)$ feasible $: \|X^{1/2} S X^{1/2} - \mu I\|_F \le \beta\mu\}$, where $\mu = X \bullet S / n$.
$N_\infty(\beta) = \{(X, y, S)$ feasible $: \lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\beta)\mu\}$, where $\mu = X \bullet S / n$.

Part II (contd.): 4. Search directions

SDP case. $(X(\mu), y(\mu), S(\mu)) \in C$ iff $A_p \bullet X = b_p$ ($\forall p$), $X \succ O$, $\sum_{p=1}^m A_p y_p + S = A_0$, $S \succ O$, $XS = \mu I$ (a continuation toward an opt. sol. as $\mu \downarrow 0$).
One iteration:
Suppose $(X^k, y^k, S^k)$ is the current iterate satisfying $A_p \bullet X^k = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p^k + S^k = A_0$, $X^k \succ O$, $S^k \succ O$ (in theory but not in practice).
Choose $\beta \in [0, 1]$ and let $\hat\mu = \beta X^k \bullet S^k / n$, which determines the target point $(X(\hat\mu), y(\hat\mu), S(\hat\mu))$ on C satisfying
(*): $XS = \hat\mu I$, $A_p \bullet X = b_p$ ($\forall p$), $\sum_{p=1}^m A_p y_p + S = A_0$.
Compute a search ("Newton") direction $(dX, dy, dS)$.
Choose step lengths $\alpha_p > 0$ and $\alpha_d > 0$ ($\alpha_p = \alpha_d$ in theory) such that
$X^{k+1} = X^k + \alpha_p dX \succ O$, $y^{k+1} = y^k + \alpha_d dy$, $S^{k+1} = S^k + \alpha_d dS \succ O$.

SDP case, continued. Compute a search ("Newton") direction $(dX, dy, dS)$: substitute $(X, y, S) = (X^k, y^k, S^k) + (dX, dy, dS)$ into (*) and linearize,
$(X^k + dX)(S^k + dS) = \hat\mu I$, linearized: $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$,
$A_p \bullet (X^k + dX) = b_p$ ($\forall p$), $\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$,
in $(dX, dy, dS)$. In general there is no solution with $dX \in S^n$, $dS \in S^n$; we need some modification.

Some difficulty in the linearized system in $(dX, dy, dS)$:
$X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$, $A_p \bullet (X^k + dX) = b_p$ ($\forall p$), $\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$.
No solution with $dX \in S^n$, $dS \in S^n$: the total # of equations > the total # of variables. In fact:
$X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$: $n^2$ eqs.; $dX \in S^n$: n(n+1)/2 vars.
$A_p \bullet (X^k + dX) = b_p$ ($\forall p$): m eqs.; dy: m vars.
$\sum_{p=1}^m A_p (y_p^k + dy_p) + (S^k + dS) = A_0$: n(n+1)/2 eqs.; $dS \in S^n$: n(n+1)/2 vars.
Total: $3n^2/2 + n/2 + m$ eqs. > $n^2 + n + m$ vars.
More than 20 search directions and several families of search directions (Todd [38]).

Two ways to resolve this difficulty:
(i) Solve the linearized system with $d\tilde X \in R^{n \times n}$, $dS \in S^n$, then symmetrize: $dX = (d\tilde X + d\tilde X^T)/2$. The HKM direction [17, 20, 27]; practically efficient.
(ii) Scale and symmetrize $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ to reduce the $n^2$ eqs. to n(n+1)/2 eqs. Theoretically rich, including the HKM, NT and AHO directions; the MZ family [28].

(i) HKM direction [17, 20, 27], practically efficient. Solve the linearized system with $d\tilde X \in R^{n \times n}$, $dS \in S^n$, then symmetrize:
$B\,dy = r$ (the Schur complement equation),
$dS = D - \sum_{p=1}^m A_p dy_p$,
$d\tilde X = X^k ((X^k)^{-1} K - dS)(S^k)^{-1}$, $dX = (d\tilde X + d\tilde X^T)/2$,
where
$B_{pq} = (X^k A_p (S^k)^{-1}) \bullet A_q$ (p = 1,...,m, q = 1,...,m),
$r_p = b_p - A_p \bullet X^k - A_p \bullet (X^k ((X^k)^{-1} K - D)(S^k)^{-1})$ (p = 1,...,m),
$D = A_0 - \sum_{p=1}^m A_p y_p^k - S^k$, $K = \hat\mu I - X^k S^k$.
B is symmetric and positive definite, but fully dense in general.
Efficient computation of $B_{pq}$ with the use of the sparsity of the $A_p$'s is a key.
To solve $B\,dy = r$, use the Cholesky factorization (or the CG method).
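The formulas above can be exercised end-to-end on small random data; the following NumPy sketch (sizes and the deliberately infeasible iterate are made up) verifies that the computed $(d\tilde X, dy, dS)$ satisfies the three linearized equations exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def rand_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = [rand_sym(n) for _ in range(m)]
A0 = rand_sym(n)
X, S = rand_pd(n), rand_pd(n)          # current iterate with X, S PD
y = rng.standard_normal(m)
b = np.array([np.trace(Ap @ X) for Ap in A]) + 0.1  # allow primal infeasibility
mu_hat = 0.3 * np.trace(X @ S) / n     # target on the central trajectory

Sinv = np.linalg.inv(S)
D = A0 - sum(yp * Ap for yp, Ap in zip(y, A)) - S   # dual residual matrix
K = mu_hat * np.eye(n) - X @ S

B = np.array([[np.trace(X @ Ap @ Sinv @ Aq) for Aq in A] for Ap in A])
r = np.array([b[p] - np.trace(A[p] @ X)
              - np.trace(A[p] @ ((K - X @ D) @ Sinv)) for p in range(m)])
dy = np.linalg.solve(B, r)             # Cholesky would also work: B is sym. PD
dS = D - sum(dyp * Ap for dyp, Ap in zip(dy, A))
dXt = (K - X @ dS) @ Sinv              # unsymmetrized dX~
dX = (dXt + dXt.T) / 2                 # HKM symmetrization

# the three linearized equations hold exactly for (dX~, dy, dS):
assert np.allclose(X @ dS + dXt @ S, K)
assert np.allclose([np.trace(A[p] @ (X + dXt)) for p in range(m)], b)
assert np.allclose(sum((y + dy)[p] * A[p] for p in range(m)) + S + dS, A0)
```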

(ii) Scale and symmetrize $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ to reduce the $n^2$ eqs. to n(n+1)/2 eqs. Theoretically rich, including the HKM, NT and AHO directions; the MZ family [28].
For every nonsingular scaling matrix $P \in R^{n \times n}$ and $B \in R^{n \times n}$, let
$H_P(B) = (P B P^{-1} + (P B P^{-1})^T) / 2$.
Then replace $X^k S^k + dX\,S^k + X^k\,dS = \hat\mu I$ by $H_P(X^k S^k + dX\,S^k + X^k\,dS) = \hat\mu I$ in the linearized system.
$P = I$: AHO. $P = X^{-1/2}$: HKM, poly. complexity. $P = (X^{1/2}(X^{1/2} S X^{1/2})^{-1/2} X^{1/2})^{-1/2}$: NT, poly. complexity.
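The operator $H_P$ is easy to experiment with; a NumPy sketch (data made up) checking that $P = I$ gives plain symmetrization and that the scaled equation is exact on the central trajectory, where $XS = \mu I$:

```python
import numpy as np

def H(P, B):
    """H_P(B) = (P B P^{-1} + (P B P^{-1})^T) / 2."""
    PBPinv = P @ B @ np.linalg.inv(P)
    return (PBPinv + PBPinv.T) / 2

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
# P = I is plain symmetrization (the AHO choice):
assert np.allclose(H(np.eye(4), B), (B + B.T) / 2)

# whenever XS = mu*I, H_P(XS) = mu*I for ANY nonsingular P,
# so every member of the family agrees on the central trajectory:
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)
mu = 1.7
S = mu * np.linalg.inv(X)
P = rng.standard_normal((4, 4))        # generic, hence nonsingular
assert np.allclose(H(P, X @ S), mu * np.eye(4))
```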

Part II (contd.): 5. Various primal-dual interior-point methods

Typical primal-dual interior-point methods (in theory):
(a) Path-following. (b) Mizuno-Todd-Ye predictor-corrector. (c) Potential reduction: not stated here, see [20, 43]. (d) Homogeneous self-dual embedding: not stated here, see [26, 18].
Main issues to be discussed: polynomial-time convergence for (a), (b), (c) and (d); local convergence for (b).
Some additional issues to be taken into account: (e) search directions, NT [34], HKM [17, 20, 27]; (f) feasible starting points or infeasible starting points.
Primal-dual interior-point methods in practice: later.

(a) Path-following primal-dual interior-point methods.
Keep $\{(X^k, y^k, S^k)\}$ in a neighborhood N of the central trajectory.
Step length $\bar\alpha = \max\{\alpha : (X^k, y^k, S^k) + \alpha(dX, dy, dS) \in N\}$;
$(X^{k+1}, y^{k+1}, S^{k+1}) = (X^k, y^k, S^k) + \bar\alpha(dX, dy, dS)$.
Which neighborhood? Which target $\hat\mu$ to choose $(dX, dy, dS)$ towards $(X(\hat\mu), y(\hat\mu), S(\hat\mu))$?

Short-step algorithms [20, 27, 34]:
$N_2(\tau) = \{(X, y, S)$ feasible $: \|X^{1/2} S X^{1/2} - \mu I\|_F \le \tau\mu\}$, where $\mu = X \bullet S / n$;
$\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$. Here $\tau, \gamma \in (0, 1)$, e.g., $\tau = \gamma = 0.5$.
Short-step: $O(\sqrt{n}\,\log(X^0 \bullet S^0 / \epsilon))$ iterations to attain $X^k \bullet S^k \le \epsilon$.

Long-step (large-step) algorithms [28]:
$N_\infty(\tau) = \{(X, y, S)$ feasible $: \lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\tau)\mu\}$, where $\mu = X \bullet S / n$;
$\hat\mu = \sigma X^k \bullet S^k / n$. Here $\tau \in (0, 1)$, $\sigma \in (0, 1)$, e.g., $\tau = \sigma = 0.5$.
$O(n \log(X^0 \bullet S^0 / \epsilon))$ (NT) or $O(n^{1.5} \log(X^0 \bullet S^0 / \epsilon))$ (HKM) iterations.

Comparison.
Short-step: $N_2(\tau)$ with $\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$, $\tau, \gamma \in (0, 1)$, e.g., $\tau = \gamma = 0.5$. Note that $N_2(0)$ = the central trajectory while $N_2(1)$ is strictly contained in the feasible region.
Long-step: $N_\infty(\tau)$ with $\hat\mu = \sigma X^k \bullet S^k / n$, $\tau, \sigma \in (0, 1)$, e.g., $\tau = \sigma = 0.5$. Note that $N_\infty(0)$ = the central trajectory and $N_\infty(1)$ = the feasible region.
For large n, $\hat\mu = (1 - \gamma/\sqrt{n})\, X^k \bullet S^k / n$ (short-step) is much larger than $\hat\mu = \sigma X^k \bullet S^k / n$ (long-step), so the long-step method reduces $\mu$ far more aggressively per iteration.

Comparison of the two neighborhoods in detail. Let $\lambda_i$ (i = 1,...,n) be the eigenvalues of $X^{1/2} S X^{1/2}$ and $\mu = X \bullet S / n$.
Short-step: $(X, y, S) \in N_2(\tau)$ iff $\sqrt{\sum_{i=1}^n (\lambda_i - \mu)^2} \le \tau\mu$, which implies $-\tau\mu \le \lambda_i - \mu \le \tau\mu$ (i = 1,...,n).
Long-step: $(X, y, S) \in N_\infty(\tau)$ iff $\lambda_{\min}(X^{1/2} S X^{1/2}) \ge (1-\tau)\mu$, i.e., $-\tau\mu \le \lambda_i - \mu$ (i = 1,...,n), with no upper bound on the $\lambda_i$.
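The two membership tests are straightforward to code; a NumPy sketch (illustrative; feasibility of (X, y, S) is not checked):

```python
import numpy as np

def sym_sqrt(X):
    """Symmetric square root of a symmetric PD matrix."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.sqrt(w)) @ V.T

def in_neighborhoods(X, S, tau):
    """Membership tests for N_2(tau) and N_inf(tau)."""
    n = X.shape[0]
    mu = np.trace(X @ S) / n
    W = sym_sqrt(X) @ S @ sym_sqrt(X)     # X^{1/2} S X^{1/2}, symmetric
    in_N2 = np.linalg.norm(W - mu * np.eye(n), 'fro') <= tau * mu
    in_Ninf = np.linalg.eigvalsh(W).min() >= (1 - tau) * mu
    return in_N2, in_Ninf

# a point exactly on the central trajectory (XS = mu*I) lies in both:
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
X = M @ M.T + 5 * np.eye(5)
S = 2.0 * np.linalg.inv(X)                # XS = 2I
assert in_neighborhoods(X, S, 0.5) == (True, True)
# note: membership in N_2(tau) always implies membership in N_inf(tau)
```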

(b) Mizuno-Todd-Ye predictor-corrector.
An outer neighborhood for a predictor step with $\hat\mu = 0$; an inner neighborhood for a corrector step with $\hat\mu = X \bullet S / n$.
(i) Not only polynomial-time but also superlinear convergence.
(ii) Some additional assumptions: strict complementarity and nondegeneracy of the unique optimal solution $(X^*, y^*, S^*)$: $X^* + S^* \succ O$.
(iii) Inner and outer neighborhoods which force $(X^k, y^k, S^k)$ to converge to $(X^*, y^*, S^*)$ tangentially to the central trajectory [21].
In the LP case [50]: (ii) & (iii) are unnecessary, and (i) becomes quadratic convergence.

A primal-dual pair of SDPs:
(P) min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \in S^n_+$.
(D) max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \in S^n_+$.
Primal-dual interior-point methods in practice:
(i) An infeasible starting point $(X^0, y^0, S^0)$ such that $X^0, S^0 \succ O$.
(ii) Iterates $(X^k, y^k, S^k)$ with $X^k, S^k \succ O$, $A_p \bullet X^k - b_p \to 0$, $\sum_{p=1}^m A_p y_p^k + S^k - A_0 \to O$.
(iii) Step size control: let $\gamma \in (0, 1)$, e.g., $\gamma = 0.80$ to $0.98$,
$\alpha_p = \gamma \max\{\alpha \in [0, 1] : X^k + \alpha dX \succeq O\}$,
$\alpha_d = \gamma \max\{\alpha \in [0, 1] : S^k + \alpha dS \succeq O\}$,
$X^{k+1} = X^k + \alpha_p dX$, $(y^{k+1}, S^{k+1}) = (y^k, S^k) + \alpha_d (dy, dS)$
(conservative compared to the LP case).
(iv) Mehrotra's predictor-corrector: a reasonable choice of the target point $(X(\beta\mu_k), y(\beta\mu_k), S(\beta\mu_k))$ on the central trajectory, where $\mu_k = X^k \bullet S^k / n$, and a 2nd-order correction in computing the search direction $(dX, dy, dS)$.
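The step size rule in (iii) reduces to an eigenvalue computation: $X^k + \alpha dX \succeq O$ iff $I + \alpha L^{-1} dX L^{-T} \succeq O$, where $X^k = L L^T$ is the Cholesky factorization. A NumPy sketch (illustrative):

```python
import numpy as np

def max_step(X, dX, gamma=0.9):
    """gamma * max{alpha in [0,1] : X + alpha*dX stays positive semidefinite}.
    Uses X + a*dX >= O  iff  I + a * L^{-1} dX L^{-T} >= O, with X = L L^T."""
    L = np.linalg.cholesky(X)
    Linv = np.linalg.inv(L)
    wmin = np.linalg.eigvalsh(Linv @ dX @ Linv.T).min()
    if wmin >= 0:
        return 1.0                     # the full step keeps X + dX PSD
    return min(1.0, gamma / (-wmin))

# example: X = I, dX = -2I blocks the step at alpha = 0.5; with gamma = 0.9
# the rule returns 0.45, and X + 0.45*dX = 0.1*I stays positive definite
assert abs(max_step(np.eye(3), -2.0 * np.eye(3)) - 0.45) < 1e-12
```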

Part II (contd.): 6. Exploiting sparsity

P: min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ (p = 1,...,m), $X \in S^n_+$. D: max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S \in S^n_+$.
Important features in practice: large-scale SDPs arise often.
n x n matrix variables $X, S \in S^n$, each of which involves n(n+1)/2 real variables; for example, n = 2000 gives n(n+1)/2 of about 2 million.
m linear equality constraints in P, or m data matrices $A_p \in S^n$.
The data matrices $A_p$ (p = 1,...,m) are sparse! Among the 21 benchmark problems with n >= 500 from SDPLIB [53]:
  ratio of nonzero elements in the A_p's:  10^{-2} to 10^{-4} | 10^{-4} to 10^{-6} | 10^{-6} to 10^{-8}
  number of problems:                      7                  | 11                 | 3
=> Exploit sparsity and structured sparsity. => Enormous computational power: parallel computation.

Computation of a search direction $(dX, dy, dS)$:
$B\,dy = r$ (the Schur complement equation), $dS = D - \sum_{p=1}^m A_p dy_p$,
$d\tilde X = X^k ((X^k)^{-1} K - dS)(S^k)^{-1}$, $dX = (d\tilde X + d\tilde X^T)/2$,
where $B_{pq} = (X^k A_p (S^k)^{-1}) \bullet A_q$, $r_p = b_p - A_p \bullet X^k - A_p \bullet (X^k ((X^k)^{-1} K - D)(S^k)^{-1})$,
$D = A_0 - \sum_{p=1}^m A_p y_p^k - S^k$, $K = \hat\mu I - X^k S^k$.
Major time consumption (seconds) in a single-CPU implementation:
  part                      control11   theta6   maxg51
  Elements of B                 463.2     78.3      1.5
  Cholesky fact. of B            31.7    209.8      3.0
  dX                              1.8      1.8     47.3
  Other dense matrix comp.        1.0      4.1     86.5
  Others                          7.2      5.13     1.8
  Total                         505.2    292.3    140.2

In the computation of $B_{pq} = (X A_p S^{-1}) \bullet A_q$ ($1 \le p \le q \le m$), exploiting the sparsity of the $A_p$'s is important, as shown in Fujisawa-Kojima-Nakata [11].
X and $S^{-1}$ are dense, but $S = A_0 - \sum_{p=1}^m A_p y_p$ inherits the sparsity from the $A_p$'s, and $X^{-1}$ can be taken to be sparse. More later.

Structured sparsity. The aggregate sparsity pattern $\hat A$: a symbolic n x n matrix with
$\hat A_{ij} = *$ if the (i, j)th element of $A_p$ is nonzero for some p = 0,...,m, and $\hat A_{ij} = 0$ otherwise, where * denotes a nonzero number.
Example (m = 1):
  A_0 = [ 1 2 0 0       A_1 = [ 0 0 0 1       A^ = [ * * 0 *
          2 1 0 0               0 3 1 0              * * * 0
          0 0 4 0               0 1 0 0              0 * * 0
          0 0 0 0 ]             1 0 0 2 ]            * 0 0 * ]
Next: three types of structured sparsity.
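The aggregate pattern of the example can be checked mechanically; a short NumPy sketch:

```python
import numpy as np

A0 = np.array([[1, 2, 0, 0],
               [2, 1, 0, 0],
               [0, 0, 4, 0],
               [0, 0, 0, 0]])
A1 = np.array([[0, 0, 0, 1],
               [0, 3, 1, 0],
               [0, 1, 0, 0],
               [1, 0, 0, 2]])

# a position is marked '*' iff some A_p is nonzero there
pattern = (A0 != 0) | (A1 != 0)
assert pattern.sum() == 10            # ten '*' positions, six structural zeros
assert (pattern == pattern.T).all()   # the pattern is symmetric
```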

Structured sparsity-1: $\hat A$ is block-diagonal, $\hat A = \mathrm{diag}(B_1, B_2, B_3)$ with each $B_i$ symmetric. Then X and S have the same diagonal block structure as $\hat A$.
Example: CH_3N, an SDP from quantum chemistry, Fukuda et al. [13], Zhao et al. [51]: m = 20,709, n = 12,802, number of blocks in $\hat A$ = 22, largest block size 3,211 x 3,211, average block size 583 x 583.

Structured sparsity-2: $\hat A$ has a sparse Cholesky factorization, e.g., a small bandwidth, or a small bandwidth plus a dense border of rows and columns.
S has the same sparsity pattern as $\hat A$. X is fully dense, but $X^{-1}$ has the same sparsity pattern as $\hat A$ => use $X^{-1}$ instead of X (the positive definite matrix completion used in SDPARA-C).

Structured sparsity-3: block-diagonal $\hat A$ + blockwise orthogonality. For most pairs (p, q), $1 \le p < q \le m$, $A_p$ and $A_q$ share no nonzero blocks; hence $A_p \bullet A_q = 0$ => the Schur complement matrix B used in the PDIPM becomes sparse.
  A_1 = diag(A_11, O, O), A_2 = diag(O, A_22, O), A_3 = diag(O, O, A_33).
An engineering application: Ben-Tal et al. [52]. A sparse SDP relaxation of polynomial optimization problems: Waki et al. [44].
Incorporated in SDPT3 and SeDuMi but not in SDPA.

Part II (contd.): 7. Software packages

Optimization Technology Center http://www.ece.northwestern.edu/otc/
NEOS Server for Optimization http://www-neos.mcs.anl.gov/
NEOS Solvers http://www-neos.mcs.anl.gov/neos/solvers/index.html
Semidefinite programming software:
  package   lang.    method
  csdp      C        p-d IPM
  DSDP      C        dual scaling IPM
  pensdp    MATLAB   augmented Lagrangian
  sdpa      C++      p-d IPM
  sdpa-c    C++      p-d IPM, p.d. matrix completion
  sdplr     C        NLP algorithm
  sdpt3     MATLAB   p-d IPM
  sedumi    MATLAB   p-d IPM, self-dual embedding
Online solver: submit your SDP problem through the Internet. Binary and/or source codes are available. SDPA sparse format for all packages; MATLAB interface.

Some remarks on software packages.
SDPs are more difficult to solve than LPs: degeneracy, no interior points in the primal or dual SDP, large-scale problems, SDPs having free variables.
More accuracy requires more CPU time; some software packages can solve SDPs faster with low accuracy.
Sparse structure of SDPs: some SDPs can be solved faster and/or more accurately by one package, other SDPs by another package. Try the software packages that fit your problem.
SDPA Online Solver (pre-open) http://sdpara.r.dendai.ac.jp/portal/
  SDPA on a single CPU; SDPARA on Opteron 32 CPUs (1.4GHz) or Xeon 16 CPUs (2.8GHz); SDPARA-C on Opteron 32 CPUs (1.4GHz) or Xeon 16 CPUs (2.8GHz).
Submit your problem and choose one of the packages.

Part II (contd.): 8. SDPA sparse format

SDPA sparse format. SDP (dual form):
min $10y_1 + 20y_2$ s.t. $A_1 y_1 + A_2 y_2 - A_0 \succeq O$,
where (each matrix consists of two 2x2 diagonal blocks)
  A_0 = diag(1, 2, 3, 4),
  A_1 = diag(5, 6, 0, 0),
  A_2 = [ 0 0 0 0; 0 7 0 0; 0 0 8 -1; 0 0 -1 9 ].
Example file (annotations on the right are not part of the file):
  2                mdim (= m)
  2                nblocks
  2 2              block structure
  10.0 20.0        objective coefficients
  0 1 1 1 1.0      nonzero coefficients of A_p (p = 0, 1, 2), one per line:
  0 1 2 2 2.0        p  block  row  column  value, where row <= column
  0 2 1 1 3.0
  0 2 2 2 4.0
  1 1 1 1 5.0
  1 1 2 2 6.0
  2 1 2 2 7.0
  2 2 1 1 8.0
  2 2 1 2 -1.0
  2 2 2 2 9.0
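A minimal reader for this format can be sketched in a few lines (illustrative only; the real format also allows comment lines starting with " or *, annotations such as "=mdim" after the numbers, and negative block sizes marking diagonal blocks, which this sketch handles only partially):

```python
import numpy as np

def parse_sdpa_sparse(text):
    """Minimal reader for the SDPA sparse format shown above.
    Returns (m, block_sizes, c, mats), where mats[p] is the list of
    dense blocks of A_p (p = 0 is the constant matrix A_0)."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(('"', '*'))]
    m = int(lines[0].split()[0])
    nblocks = int(lines[1].split()[0])
    sizes = [abs(int(t)) for t in lines[2].split()[:nblocks]]
    c = [float(t) for t in lines[3].replace(',', ' ').split()]
    mats = [[np.zeros((s, s)) for s in sizes] for _ in range(m + 1)]
    for ln in lines[4:]:
        p, blk, i, j, v = ln.split()[:5]
        p, blk, i, j, v = int(p), int(blk), int(i), int(j), float(v)
        mats[p][blk - 1][i - 1, j - 1] = v
        mats[p][blk - 1][j - 1, i - 1] = v   # entries give the upper triangle
    return m, sizes, c, mats

example = """\
2
2
2 2
10.0 20.0
0 1 1 1 1.0
0 1 2 2 2.0
0 2 1 1 3.0
0 2 2 2 4.0
1 1 1 1 5.0
1 1 2 2 6.0
2 1 2 2 7.0
2 2 1 1 8.0
2 2 1 2 -1.0
2 2 2 2 9.0
"""
m, sizes, c, mats = parse_sdpa_sparse(example)
assert m == 2 and sizes == [2, 2] and c == [10.0, 20.0]
assert np.allclose(mats[0][1], np.diag([3.0, 4.0]))
assert np.allclose(mats[2][1], [[8.0, -1.0], [-1.0, 9.0]])
```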

Part II (contd.): 9. Numerical results

A sparse SDP relaxation [44] of minimizing the generalized Rosenbrock function
$f(x) = \sum_{i=2}^n (100(x_i - x_{i-1}^2)^2 + (1 - x_i)^2)$, $x_1 \ge 0$.
SeDuMi, a single CPU, 2.4GHz Xeon; CPU time in seconds:
  n     eps_obj    sparse   dense
  10    2.5e-08    0.2      10.6
  15    6.5e-08    0.2      756.6
  400   2.5e-06    3.7      -
  800   5.5e-06    6.8      -
eps_obj = (the lower bound for the opt. value - the approx. opt. value) / max{1, |the lower bound for the opt. value|}.
When n = 800, the SDP relaxation problem has $A_p$: 4794 x 4794 (p = 1, 2,..., 7,988) and B: 7,988 x 7,988. Each $A_p$ consists of 799 diagonal blocks of 6x6 matrices. $A_p \bullet A_q = 0$ for most pairs (p, q) => a sparse Cholesky factorization of B.

P: min $A_0 \bullet X$ s.t. $A_p \bullet X = b_p$ ($1 \le p \le m$), $S^n \ni X \succeq O$. D: max $\sum_{p=1}^m b_p y_p$ s.t. $\sum_{p=1}^m A_p y_p + S = A_0$, $S^n \ni S \succeq O$.
SDPs from quantum chemistry, Fukuda et al. [13], Zhao et al. [51]:
  atoms/molecules   m       n       #blocks   sizes of largest blocks
  O                 7230    5990    22        [1450, 1450, 450, ...]
  HF                15018   10146   22        [2520, 2520, 792, ...]
  CH_3N             20709   12802   22        [3211, 3211, 1014, ...]
Time (s) by number of processors:
                               16        64       128        256
  O      elements of B         10100.3   2720.4   1205.9     694.2
         Chol. fact. of B      218.2     87.3     68.2       106.2
         total                 14250.6   4453.3   3281.1     2951.6
  HF     elements of B         *         *        13076.1    6833.0
         Chol. fact. of B      *         *        520.2      671.0
         total                 *         *        26797.1    20780.7
  CH_3N  elements of B         *         *        34188.9    18003.3
         Chol. fact. of B      *         *        1008.9     1309.9
         total                 *         *        57034.8    45488.9

Large-size SDPs by SDPARA-C [31] (64 CPUs). Three types of test problems:
(a) SDP relaxations of randomly generated max cut problems on lattice graphs of size 10 x 1000, 10 x 2000 and 10 x 4000.
(b) SDP relaxations of randomly generated max clique problems on lattice graphs of size 10 x 500, 10 x 1000 and 10 x 2000.
(c) Randomly generated norm minimization problems: min $\|F_0 - \sum_{i=1}^{10} F_i y_i\|$ s.t. $y_i \in R$ (i = 1, 2,..., 10), where $F_i$ is 10 x 9990, 10 x 19990 or 10 x 39990 and $\|G\|$ = the square root of the max. eigenvalue of $G^T G$.
In all cases, the aggregate sparsity pattern consists of one block and is very sparse.

Large-size SDPs by SDPARA-C (64 CPUs):
  Problem                n       m       time (s)   memory (MB)
  (a) Cut(10 x 1000)     10000   10000   274.3      126
      Cut(10 x 2000)     20000   20000   1328.2     276
      Cut(10 x 4000)     40000   40000   7462.0     720
  (b) Clique(10 x 500)   5000    9491    639.5      119
      Clique(10 x 1000)  10000   18991   3033.2     259
      Clique(10 x 2000)  20000   37991   15329.0    669
  (c) Norm(10 x 9990)    10000   11      409.5      164
      Norm(10 x 19990)   20000   11      1800.9     304
      Norm(10 x 39990)   40000   11      7706.0     583

Part III: Some applications
1. Matrix approximation problems
2. A nonconvex quadratic optimization problem
3. The max-cut problem
4. Sum of squares of polynomials

Part III: 1. Matrix approximation problems

Let $F_p$ be a k x l matrix (p = 0, 1,..., m). We want to approximate the matrix $F_0$ by a linear combination of $F_p$ (p = 1,...,m), i.e.,
minimize $\{\|F(x)\| : x \in R^m\}$, where $F(x) = F_0 - \sum_{p=1}^m F_p x_p$ for $x = (x_1,\dots,x_m)^T$.
Which norm?
$\|A\|_\infty = \max\{|A_{ij}| : i = 1,\dots,k,\ j = 1,\dots,l\}$ (the max norm),
$\|A\|_F = (\sum_{i=1}^k \sum_{j=1}^l A_{ij}^2)^{1/2}$ (the Frobenius norm),
$\|A\|_{L_2} = \max_{\|u\|_2 = 1} \|Au\|$ = (the maximum eigenvalue of $A^T A$)$^{1/2}$ (the $L_2$ operator norm).

With the max norm:
minimize $\{\|F(x)\|_\infty : x \in R^m\}$
<=> min $\zeta$ s.t. $-\zeta \le F(x)_{ij} \le \zeta$ (i = 1,...,k, j = 1,...,l)
=> an LP (linear programming problem).
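The LP can be assembled directly; a SciPy sketch with made-up data (the variables are (x, zeta), and each entry of F(x) contributes two inequality rows):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
k, l, m = 3, 4, 2
F0 = rng.standard_normal((k, l))
F = [rng.standard_normal((k, l)) for _ in range(m)]

# variables (x_1, ..., x_m, zeta); minimize zeta
c = np.zeros(m + 1)
c[-1] = 1.0
A_ub, b_ub = [], []
for i in range(k):
    for j in range(l):
        row = [Fp[i, j] for Fp in F]
        # F0_ij - sum_p F_p,ij x_p <= zeta
        A_ub.append([-r for r in row] + [-1.0]); b_ub.append(-F0[i, j])
        # -(F0_ij - sum_p F_p,ij x_p) <= zeta
        A_ub.append(row + [-1.0]);               b_ub.append(F0[i, j])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * m + [(0, None)])
x, zeta = res.x[:m], res.x[-1]
resid = F0 - sum(xp * Fp for xp, Fp in zip(x, F))
assert res.success
assert abs(np.abs(resid).max() - zeta) < 1e-6   # zeta equals the max norm
```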

With the Frobenius norm:
minimize $\{\|F(x)\|_F : x \in R^m\}$ <=> $\min_{x \in R^m} \|F(x)\|_F^2 = \sum_{i=1}^k \sum_{j=1}^l F(x)_{ij}^2$
=> the least squares problem, a convex QP (quadratic programming problem).
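With the Frobenius norm the problem is an ordinary linear least squares problem in the vectorized matrices; a NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, m = 5, 4, 3
F0 = rng.standard_normal((k, l))
F = [rng.standard_normal((k, l)) for _ in range(m)]

M = np.column_stack([Fp.ravel() for Fp in F])   # vec(F_p) as columns
x, *_ = np.linalg.lstsq(M, F0.ravel(), rcond=None)

# optimality: the residual is orthogonal to the span of the vec(F_p)
r = F0.ravel() - M @ x
assert np.allclose(M.T @ r, 0)
```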

With the $L_2$ operator norm:
minimize $\{\|F(x)\|_{L_2} : x \in R^m\}$
<=> minimize the maximum eigenvalue of $F(x)^T F(x)$
<=> minimize $\lambda$ subject to $\lambda I - F(x)^T F(x) \succeq O$
<=> (via the Schur complement) minimize $\lambda$ subject to
$\begin{pmatrix} I & F(x) \\ F(x)^T & \lambda I \end{pmatrix} \succeq O$
=> an SDP.
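The Schur complement equivalence behind the last step, $\lambda I - F(x)^T F(x) \succeq O$ iff the block matrix is PSD, can be checked numerically for a fixed F(x); an illustrative NumPy sketch:

```python
import numpy as np

Fx = np.array([[1.0, 2.0],
               [0.0, 1.0]])
lam_max = np.linalg.eigvalsh(Fx.T @ Fx).max()   # largest eigenvalue of F^T F

def block(lam):
    k, l = Fx.shape
    return np.block([[np.eye(k), Fx], [Fx.T, lam * np.eye(l)]])

# Schur complement: the block matrix is PSD iff lam*I - F^T F >= O,
# i.e. iff lam >= lambda_max(F^T F)
assert np.linalg.eigvalsh(block(lam_max + 1e-9)).min() >= -1e-8
assert np.linalg.eigvalsh(block(lam_max - 0.1)).min() < 0
```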

Part III (contd.): 2. A nonconvex quadratic optimization problem

An SDP relaxation of QOP.
QOP: min $x^T Q_0 x + 2q_0^T x$ s.t. $x^T Q_p x + 2q_p^T x + \gamma_p \le 0$ (p = 1,...,m).
Write $\bar Q_p = \begin{pmatrix} \gamma_p & q_p^T \\ q_p & Q_p \end{pmatrix}$, $\gamma_0 = 0$.
QOP: min $\bar Q_0 \bullet \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix}$ s.t. $\bar Q_p \bullet \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix} \le 0$ (p = 1,...,m).
Relaxation: let $X_{ij} = x_i x_j$ (i, j = 1,...,n), i.e. $X = xx^T$; hence $\begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} = \begin{pmatrix} 1 & x^T \\ x & xx^T \end{pmatrix} \succeq O$ (lifting).
SDP-P: min $\bar Q_0 \bullet \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix}$ s.t. $\bar Q_p \bullet \begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \le 0$ (p = 1,...,m), $\begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq O$;
x and X are variables in SDP-P.
If x is a feasible solution of QOP, then $(x, X) = (x, xx^T)$ is a feasible solution of SDP-P with the same objective value => relaxation: opt. val. of QOP >= opt. val. of SDP-P.
If $Q_p$ (p = 0, 1,...,m) are positive semidefinite, then QOP = SDP-P.
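The lifting argument is easy to verify numerically: for any x, the lifted matrix is PSD of rank one and the lifted objective matches the QOP objective; a NumPy sketch with made-up data ($\gamma_0 = 0$ as above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
x = rng.standard_normal(n)
Q0 = rng.standard_normal((n, n)); Q0 = (Q0 + Q0.T) / 2
q0 = rng.standard_normal(n)

X = np.outer(x, x)                          # the lifting X = x x^T
Qbar0 = np.block([[np.zeros((1, 1)), q0[None, :]],
                  [q0[:, None], Q0]])       # gamma_0 = 0
lifted = np.vstack([np.hstack([[1.0], x]),
                    np.column_stack([x, X])])

# lifted objective  Qbar_0 . [[1, x^T], [x, X]]  equals the QOP objective:
obj_qop = x @ Q0 @ x + 2 * q0 @ x
assert np.isclose(np.sum(Qbar0 * lifted), obj_qop)

# the lifted matrix is (1, x)(1, x)^T: PSD of rank one
w = np.linalg.eigvalsh(lifted)
assert w.min() > -1e-10 and np.isclose(w.max(), 1 + x @ x)
```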

The Lagrangian dual of QOP.
QOP: min $x^T Q_0 x + 2q_0^T x$ s.t. $x^T Q_p x + 2q_p^T x + \gamma_p \le 0$ (p = 1,...,m).
Lagrangian function: $L(x, \lambda) = x^T Q_0 x + 2q_0^T x + \sum_{i=1}^m \lambda_i (x^T Q_i x + 2q_i^T x + \gamma_i)$, $\lambda = (\lambda_1,\dots,\lambda_m) \in R^m_+$.
Lagrangian dual: max $\min\{L(x, \lambda) : x \in R^n\}$ s.t. $\lambda \in R^m_+$.
L. dual: max $\zeta$ s.t. $L(x, \lambda) - \zeta \ge 0$ ($\forall x \in R^n$), $\lambda \in R^m_+$, or
L. dual: max $\zeta$ s.t.
$\begin{pmatrix} 1 \\ x \end{pmatrix}^T \begin{pmatrix} \sum_{i=1}^m \lambda_i \gamma_i - \zeta & q_0^T + \sum_{i=1}^m \lambda_i q_i^T \\ q_0 + \sum_{i=1}^m \lambda_i q_i & Q_0 + \sum_{i=1}^m \lambda_i Q_i \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix} \ge 0$ ($\forall x \in R^n$), $\lambda \in R^m_+$.
Here x is not a variable: the inequality must hold for all x.

SDP-D: max $\zeta$ s.t.
$\begin{pmatrix} \sum_{i=1}^m \lambda_i \gamma_i - \zeta & q_0^T + \sum_{i=1}^m \lambda_i q_i^T \\ q_0 + \sum_{i=1}^m \lambda_i q_i & Q_0 + \sum_{i=1}^m \lambda_i Q_i \end{pmatrix} \succeq O$, $\lambda \in R^m_+$.
SDP-D above is the dual SDP of SDP-P, the SDP relaxation of the QOP in the previous example.

Part III (contd.): 3. The max-cut problem, 4. Sum of squares of polynomials
[14] M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, 42 (1995) 1115-1145.