
Infeasible Primal-Dual (Path-Following) Interior-Point Methods for Semidefinite Programming*

Yin Zhang, Dept. of CAAM, Rice University

Outline
(1) Introduction
(2) Formulation & a complexity theorem
(3) Computation and numerical results
(4) Comments

* On extending primal-dual interior-point algorithms from LP to SDP, SIOPT (1998).

Primal SDP:

    min  C • X
    s.t. A_i • X = b_i,  i = 1:m,
         X ⪰ 0,

where C • X = tr(C^T X), the A_i and C are symmetric, and X ⪰ 0 means X is symmetric positive semidefinite. Diagonal C, A_i give back LP.

SDP constraints are nonlinear; for a 2 × 2 matrix,

    X ⪰ 0  ⟺  { x_11 ≥ 0,  x_11 x_22 ≥ x_12² }    (see the check below).

Dual SDP:

    max  b^T y
    s.t. Σ_{i=1}^m y_i A_i + Z = C,
         Z ⪰ 0.
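
A quick numerical check (my illustration in Python/numpy, not part of the talk) that the 2 × 2 characterization above matches the spectral definition of positive semidefiniteness:

```python
import numpy as np

def psd_via_eigs(X):
    # Spectral test: X is PSD iff all eigenvalues are nonnegative.
    return bool(np.all(np.linalg.eigvalsh(X) >= 0))

def psd_via_inequalities(X):
    # The 2x2 characterization from the slide (with x22 >= 0 added,
    # which matters only on the boundary case x11 = 0).
    return X[0, 0] >= 0 and X[1, 1] >= 0 and X[0, 0] * X[1, 1] >= X[0, 1] ** 2

tests = [np.array([[2.0, 1.0], [1.0, 3.0]]),   # PSD: eigenvalues > 0
         np.array([[1.0, 2.0], [2.0, 1.0]])]   # indefinite: det = -3
for X in tests:
    print(psd_via_eigs(X), psd_via_inequalities(X))   # True True / False False
```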

Strong Duality: C • X − b^T y = X • Z = 0 at optimality if both primal and dual are strictly feasible (Slater constraint qualification).

Note: For X, Z ⪰ 0,  X • Z = 0  ⟺  XZ = 0.

Proof: (⟸) is trivial. (⟹):

    X • Z = tr(XZ) = 0
    ⟹ tr(X Z^{1/2} Z^{1/2}) = 0
    ⟹ tr(Z^{1/2} X Z^{1/2}) = 0
    ⟹ Σ_i λ_i(Z^{1/2} X Z^{1/2}) = 0
    ⟹ λ_i(Z^{1/2} X Z^{1/2}) = 0, for all i   (the eigenvalues are nonnegative since Z^{1/2} X Z^{1/2} ⪰ 0)
    ⟹ Z^{1/2} X Z^{1/2} = 0                    (symmetry)
    ⟹ (Z^{1/2} X^{1/2})(Z^{1/2} X^{1/2})^T = 0
    ⟹ Z^{1/2} X^{1/2} = 0
    ⟹ ZX = 0.
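
A small numerical illustration (mine, with assumed rank-1 data) of the fact just proved: building X and Z from orthogonal vectors makes the scalar product X • Z vanish, and then the entire matrix product XZ vanishes too:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0]); u /= np.linalg.norm(u)
v = np.array([-2.0, 1.0, 1.0]); v /= np.linalg.norm(v)   # u and v are orthogonal

X = np.outer(u, u)   # rank-1 PSD matrix with range span{u}
Z = np.outer(v, v)   # rank-1 PSD matrix with range span{v}

print(np.trace(X @ Z))        # X . Z = (u.v)^2 = 0
print(np.linalg.norm(X @ Z))  # 0: the whole product vanishes, not just its trace
```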

SDP Applications:

- Eigenvalue optimization
- Control theory: linear matrix inequalities in system and control theory (book by Boyd et al., SIAM, 1994)
- Combinatorial optimization; example: the Max-Cut relaxation
- Non-convex optimization; example: QCQP relaxations

Definitions:

- Interior point: (X, y, Z) with X, Z ≻ 0
- Infeasible algorithm: allows infeasible iterates
- Primal-dual algorithm: a perturbed and damped Newton (or composite Newton) method applied to a form of the optimality condition system

Standard Optimality Conditions: find (X, y, Z) with X, Z ⪰ 0 such that

    A_i • X − b_i = 0,  i = 1:m        [m equations]
    Σ_{i=1}^m y_i A_i + Z − C = 0      [n(n+1)/2 equations]
    XZ = 0                             [n² equations]

This optimality system is not square: the complementarity condition needs symmetrization.

Weighted symmetrization (YZ 95):

    H_P(M) = (P M P^{−1} + P^{−T} M^T P^T)/2

    XZ = τI  ⟺  H_P(XZ) = τI
    complementarity:   H_P(XZ) = 0
    the central path:  H_P(XZ) = µI
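
A direct transcription of H_P into numpy (my sketch; the random test data is an assumption): H_P(M) is just the symmetric part of the similarity-scaled matrix P M P^{−1}, and with P = I it reduces to (XZ + ZX)/2:

```python
import numpy as np

def H(P, M):
    # Weighted symmetrization H_P(M) = (P M P^{-1} + P^{-T} M^T P^T) / 2,
    # i.e. the symmetric part of P M P^{-1}.
    PM = P @ M @ np.linalg.inv(P)
    return (PM + PM.T) / 2

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)); X = X @ X.T + n * np.eye(n)   # X > 0
Z = rng.standard_normal((n, n)); Z = Z @ Z.T + n * np.eye(n)   # Z > 0

# With P = I this is the AHO symmetrization (XZ + ZX)/2:
print(np.allclose(H(np.eye(n), X @ Z), (X @ Z + Z @ X) / 2))   # True
```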

H_P unifies a family of primal-dual formulations. The 3 most promising members:

AHO (Alizadeh/Haeberly/Overton 94): P = I

    XZ + ZX = 0

KSH (Kojima et al. 94, Monteiro 95): P_k = [Z_k]^{1/2}

    Z_k X Z + Z X Z_k = 0

NT (Nesterov/Todd 95): W = X^{1/2}(X^{1/2} Z X^{1/2})^{−1/2} X^{1/2},  P_k = [W_k]^{−1/2}
(a numerical check of the scaling matrix W follows this slide)

A sequence of equivalent systems for varying P_k:

    Formulation    AHO   KSH   NT
    Real Newton?   Yes   No    No

AHO: difficult to analyze its complexity.
KSH: complexity results were obtained for feasible or short-step methods.

In favor of: infeasible long-step methods, which are implementable and most efficient in practice.
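
The NT scaling matrix W can be formed with two symmetric eigendecompositions; this sketch (my construction, with assumed test data) verifies its defining property W Z W = X:

```python
import numpy as np

def sym_power(S, p):
    # S^p for a symmetric positive definite S, via eigendecomposition.
    w, Q = np.linalg.eigh(S)
    return (Q * w**p) @ Q.T

rng = np.random.default_rng(2)
n = 5
X = rng.standard_normal((n, n)); X = X @ X.T + n * np.eye(n)
Z = rng.standard_normal((n, n)); Z = Z @ Z.T + n * np.eye(n)

Xh = sym_power(X, 0.5)
W = Xh @ sym_power(Xh @ Z @ Xh, -0.5) @ Xh   # W = X^{1/2}(X^{1/2} Z X^{1/2})^{-1/2} X^{1/2}
print(np.allclose(W @ Z @ W, X))             # True: W maps Z to X by congruence
```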

Central path: {(X, Z) ≻ 0 : XZ = µI, µ > 0}

On the central path: λ_i(XZ) = µ for all i.

[Figure: the central path as the diagonal of the (λ_1(XZ), λ_2(XZ)) plane]

Short-step methods use a narrow neighborhood of the central path; long-step methods use a wide neighborhood.

A narrow neighborhood: for α ∈ (0, 1),

    N_α = {(X, Z) ≻ 0 : ‖XZ − µI‖_F ≤ αµ},

where µ = tr(XZ)/n is the normalized duality gap.

A wide neighborhood: for γ ∈ (0, 1),

    N_γ = {(X, Z) ≻ 0 : λ_min(XZ) ≥ γµ},

where µ = tr(XZ)/n.

A neighborhood with feasibility priority:

    N̄_γ = {(X, Z) ∈ N_γ : r/r_0 ≤ µ/µ_0},

where r measures infeasibility (so infeasibility must decrease at least as fast as the duality gap).
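
The two neighborhood tests are easy to state in code. In this sketch (mine, with assumed test data) the eigenvalues of XZ are computed from the similar symmetric matrix X^{1/2} Z X^{1/2}, which also shows they are real and positive for X, Z ≻ 0:

```python
import numpy as np

def xz_eigs(X, Z):
    # XZ is similar to the symmetric PSD matrix X^{1/2} Z X^{1/2}.
    w, Q = np.linalg.eigh(X)
    Xh = (Q * np.sqrt(w)) @ Q.T
    return np.linalg.eigvalsh(Xh @ Z @ Xh)

def in_narrow(X, Z, alpha):
    mu = np.trace(X @ Z) / X.shape[0]        # normalized duality gap
    # np.linalg.norm on a matrix defaults to the Frobenius norm.
    return np.linalg.norm(X @ Z - mu * np.eye(X.shape[0])) <= alpha * mu

def in_wide(X, Z, gamma):
    mu = np.trace(X @ Z) / X.shape[0]
    return xz_eigs(X, Z).min() >= gamma * mu

X = np.eye(4)
Z = np.diag([1.0, 1.0, 1.0, 4.0])            # XZ has eigenvalues (1, 1, 1, 4)
print(in_narrow(X, Z, 0.5), in_wide(X, Z, 0.5))   # False True: N_gamma is wider
```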

Column-stacking operator vec: R^{n×n} → R^{n²} (see the demo below):

    vec [ 1  3 ]  =  (1 2 3 4)^T
        [ 2  4 ]

SDP: find v = (X, y, Z), X, Z ⪰ 0, such that

             [ A^T y + vec Z − vec C  ]
    F_k(v) = [ A(vec X) − b           ] = 0,
             [ vec(Z_k X Z + Z X Z_k) ]

where A^T = [vec A_1 ... vec A_m].

Infeasible long-step algorithm (YZ 95):

    Choose v_0 = (X_0, y_0, Z_0) with X_0, Z_0 ≻ 0.
    For k = 0, 1, 2, ..., until µ_k ≤ 2^{−L} do
        (1) Solve F'_k(v_k) Δv = −F_k(v_k) + σµ_k p_k
        (2) Choose α such that v_{k+1} = v_k + α Δv ∈ N̄_γ
    End

where σ ∈ (0, 1) and the centering term p_k is derived from

    Z_k X Z + Z X Z_k − 2σµ_k Z_k = 0.
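
In numpy, the column-stacking vec is a Fortran-order flatten; this sketch (my illustration) also checks the standard identity vec(A X B) = (B^T ⊗ A) vec(X), which underlies the Kronecker-product formulas two slides ahead:

```python
import numpy as np

def vec(M):
    # Stack the columns of M into one long vector.
    return M.flatten(order="F")

print(vec(np.array([[1, 3], [2, 4]])))   # [1 2 3 4]

rng = np.random.default_rng(3)
A, Xm, B = (rng.standard_normal((3, 3)) for _ in range(3))
print(np.allclose(vec(A @ Xm @ B), np.kron(B.T, A) @ vec(Xm)))   # True
```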

The key to the convergence analysis is to estimate the second-order term.

Key Lemma (YZ 95): Under the conditions that

    (1) a solution (X*, y*, Z*) exists,
    (2) X_0 = Z_0 = ρI for ρ ≥ max{ λ_1(X*), λ_1(Z*), tr(X* + Z*)/n },   (*)
    (3) (X, Z) ∈ N̄_γ,

we have

    ‖Z^{1/2} (ΔX ΔZ) Z^{−1/2}‖_F ≤ (19n/γ) (λ_1(XZ)/λ_n(XZ))^{1/2} tr(XZ).

Polynomial Complexity (YZ 95):

Theorem: Under conditions (1)-(3), the algorithm terminates (µ_k ≤ 2^{−L}) in at most O(n^{2.5} L) iterations. Global convergence holds without (*) in (2).

Computation: How difficult is it to solve an SDP?

Distinctions from LP:
(1) Special code for special problems, e.g. LP itself
(2) The cost of the linear system varies with the method
(3) Parallelism is more important than sparsity

Linear system F'_k(v_k) Δv = −F_k(v_k) + σµ_k p_k:

    [ 0   A^T  I ] [ vec ΔX ]     [ vec R_d ]
    [ A   0    0 ] [ Δy     ]  =  [ r_p     ]
    [ E   0    F ] [ vec ΔZ ]     [ vec R_c ]

E and F are no longer diagonal (as they are in LP).

Solution procedure (by block elimination; see the check below):
(1) Form M = A(E^{−1}F)A^T.
(2) Solve M Δy = r_p + A[E^{−1}(F vec R_d − vec R_c)].
(3) ΔZ = R_d − Σ_{i=1}^m Δy_i A_i.
(4) ΔX = σµ Z^{−1} − X − H(X ΔZ Z^{−1}).

Most of the computation is in steps (1)-(2). M is m × m, with m ∈ [0, n(n+1)/2].
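
A self-contained check (my construction, with random placeholder blocks E, F, A rather than true SDP data) that the elimination steps (1)-(2) reproduce the Δy component of the full block system:

```python
import numpy as np

rng = np.random.default_rng(4)
N, m = 9, 3                                       # N plays the role of n^2
A = rng.standard_normal((m, N))
E = rng.standard_normal((N, N)) + 5 * np.eye(N)   # generic invertible block
F = rng.standard_normal((N, N))
rd, rp, rc = rng.standard_normal(N), rng.standard_normal(m), rng.standard_normal(N)

# Assemble and solve the full 3x3 block system directly.
K = np.block([[np.zeros((N, N)), A.T,              np.eye(N)],
              [A,                np.zeros((m, m)), np.zeros((m, N))],
              [E,                np.zeros((N, m)), F]])
dy_full = np.linalg.solve(K, np.concatenate([rd, rp, rc]))[N:N + m]

# Schur-complement procedure: steps (1)-(2) from the slide.
Einv = np.linalg.inv(E)
M = A @ Einv @ F @ A.T
dy = np.linalg.solve(M, rp + A @ (Einv @ (F @ rd - rc)))
print(np.allclose(dy, dy_full))                   # True
```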

Consider forming M = A(E^{−1}F)A^T.

Notation: Kronecker product U ⊗ V = [u_ij V].

KSH: E = Z ⊗ Z,  F = (ZX ⊗ I + I ⊗ ZX)/2
     M ⪰ 0,  M_ij = tr(Z^{−1} A_i X A_j)   (verified numerically below)

AHO: E = Z ⊗ I + I ⊗ Z,  F = X ⊗ I + I ⊗ X
     M asymmetric (LU factorization)
     Z P_i + P_i Z = A_i, a Lyapunov equation for P_i
     M_ij = tr(P_i (X A_j + A_j X))

NT:  W = X^{1/2}(X^{1/2} Z X^{1/2})^{−1/2} X^{1/2}
     E = I ⊗ I,  F = W ⊗ W
     M ⪰ 0,  M_ij = tr(A_i W A_j W)
     2 Cholesky factorizations + an SVD for W (TTT 96)

work(AHO) > work(NT) > work(KSH)
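
A numerical confirmation (my sketch, with assumed random data) of the KSH row: with E = Z ⊗ Z and F = (ZX ⊗ I + I ⊗ ZX)/2, the Schur complement M = A(E^{−1}F)A^T indeed has entries tr(Z^{−1} A_i X A_j):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3

def sym():
    S = rng.standard_normal((n, n))
    return (S + S.T) / 2

def spd():
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

As = [sym() for _ in range(m)]
X, Z = spd(), spd()
A = np.stack([Ai.flatten(order="F") for Ai in As])   # rows are vec(A_i)^T

E = np.kron(Z, Z)
F = (np.kron(Z @ X, np.eye(n)) + np.kron(np.eye(n), Z @ X)) / 2
M_kron = A @ np.linalg.solve(E, F @ A.T)             # M = A (E^{-1} F) A^T

Zinv = np.linalg.inv(Z)
M_trace = np.array([[np.trace(Zinv @ Ai @ X @ Aj) for Aj in As] for Ai in As])
print(np.allclose(M_kron, M_trace))                  # True
```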

Leading cost per iteration for dense problems
(Form: O(m(m + n)n²), Solve: O(m³); * marks the dominant cost)

    m       O(1)      O(n)      O(n²)
    Form    O(n³)*    O(n⁴)*    O(n⁶)
    Solve   O(1)      O(n³)     O(n⁶)*

Forming M is the most expensive part.

Bad news: O(n⁴) per iteration is the most likely case.
Good news: rich parallelism.
- Coarse grain: the entries M_ij can be computed independently
- Fine grain: rich in matrix multiplications
Expect (almost) linear speed-up.

Preliminary Implementation:

Predictor-Corrector Algorithm (an extension of Mehrotra's framework to SDP):

    Choose v_0 = (X_0, y_0, Z_0) with X_0, Z_0 ≻ 0.
    For k = 0, 1, 2, ..., until µ_k ≤ 2^{−L} do
        (1) Solve F'_k(v_k) Δv_p = −F_k(v_k). Choose σ_k.
        (2) Solve F'_k(v_k) Δv_c = −F_k(v_k + Δv_p) + σ_k µ_k p_k.
        (3) Let Δv = Δv_p + Δv_c.
        (4) Choose α such that v_{k+1} = v_k + α Δv stays interior.
    End

Implementation notes:
- implemented in MATLAB for general problems, dense or sparse
- only the KSH formulation at this moment
- backtracking for α using Cholesky factorizations (see the sketch below)
- no stabilizing measures yet at the end
- initial point X_0 = Z_0 = nI
- tested on randomly generated problems
- run on an SGI workstation (MIPS/R8000)
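
The Cholesky-based backtracking for α is a standard device; here is my minimal sketch (the shrink factor and tolerances are assumptions, not values from the talk): shrink α until the trial point stays positive definite, using a Cholesky factorization as the test.

```python
import numpy as np

def backtrack_alpha(X, dX, alpha=1.0, shrink=0.8, min_alpha=1e-10):
    # Reduce alpha until X + alpha*dX is positive definite;
    # np.linalg.cholesky succeeds exactly on positive definite matrices.
    while alpha > min_alpha:
        try:
            np.linalg.cholesky(X + alpha * dX)
            return alpha
        except np.linalg.LinAlgError:
            alpha *= shrink
    raise RuntimeError("no interior step length found")

X = np.eye(3)
dX = -np.diag([0.5, 2.0, 4.0])    # a full step (alpha = 1) exits the PSD cone
print(backtrack_alpha(X, dX))     # 0.8^7 ~= 0.2097, the first alpha below 1/4
```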

Max-Cut: n = 50 (general code)

    Iter. No   P_inf      D_inf      Gap        Cpu
    ----------------------------------------------------
    iter  0:   4.29e+01   1.84e+02   6.38e+03   0.21
    iter  1:   2.13e+01   1.16e-14   1.85e+03   0.30
    iter  2:   1.06e-14   1.05e-14   2.73e+02   0.32
    iter  3:   5.45e-15   1.05e-14   1.53e+02   0.30
    iter  4:   2.49e-15   1.34e-14   2.05e+01   0.32
    iter  5:   4.01e-15   1.33e-14   3.95e+00   0.31
    iter  6:   6.14e-14   1.23e-14   3.53e-01   0.32
    iter  7:   7.97e-13   1.23e-14   3.39e-02   0.31
    iter  8:   2.83e-11   1.18e-14   7.43e-03   0.30
    iter  9:   6.16e-10   1.41e-14   2.59e-04   0.30
    iter 10:   6.25e-09   1.27e-14   7.96e-06   Conv
    ----------------------------------------------------
    [m n] = [50 50],  mflops = 2.39e+01
    CPU (seconds): form 1.86, factor 0.04, total 3.11

CPU/iter can be cut to 0.05 by exploiting the Max-Cut structure diag(X) = e, i.e. A_i = e_i e_i^T.

Max-Cut: n = 200 (general code)

    Iter. No   P_inf      D_inf      Gap        Cpu
    ----------------------------------------------------
    iter  0:   1.86e+02   1.20e+03   1.97e+05   3.72
    iter  1:   9.22e+01   9.40e-14   7.48e+04   12.56
    iter  2:   7.58e-14   1.04e-13   3.51e+03   12.66
    iter  3:   1.25e-14   9.10e-14   9.08e+02   13.00
    iter  4:   8.53e-15   1.03e-13   5.94e+02   13.04
    iter  5:   6.21e-15   1.14e-13   1.09e+02   13.18
    iter  6:   8.23e-15   9.29e-14   8.52e+00   13.00
    iter  7:   1.70e-12   1.07e-13   9.40e-01   12.66
    iter  8:   3.84e-11   8.79e-14   9.82e-02   12.89
    iter  9:   3.23e-08   9.79e-14   1.66e-02   13.24
    iter 10:   1.12e-07   1.01e-13   7.80e-03   12.63
    iter 11:   2.09e-08   8.87e-14   8.22e-04   Conv
    ----------------------------------------------------
    [m n] = [200 200],  mflops = 1.61e+03
    CPU (seconds): form 110.03, factor 0.73, total 133.08

Note the deteriorating primal feasibility in the last few iterations.

General Problem: m = 50, n = 200

    Iter. No   P_inf      D_inf      Gap        Cpu
    ----------------------------------------------------
    iter  0:   3.34e+03   2.40e+02   5.65e+02   4.78
    iter  1:   6.49e-12   8.05e+01   1.48e+03   255.71
    iter  2:   2.65e-11   8.43e+00   5.31e+02   255.67
    iter  3:   2.01e-11   1.05e+00   1.30e+02   256.10
    iter  4:   2.38e-11   1.55e-01   2.57e+01   255.99
    iter  5:   2.55e-11   1.97e-02   3.97e+00   256.34
    iter  6:   2.77e-11   2.36e-03   6.18e-01   256.23
    iter  7:   1.89e-10   2.36e-04   5.46e-01   256.16
    iter  8:   4.40e-10   6.62e-05   2.21e-01   255.99
    iter  9:   4.26e-09   1.85e-05   2.61e-02   256.16
    iter 10:   7.18e-09   1.85e-06   2.45e-03   255.85
    iter 11:   1.24e-07   1.85e-07   4.55e-04   Conv
    ----------------------------------------------------
    [m n] = [50 200],  mflops = 3.11e+03
    CPU (seconds): form 2528.25, factor 0.04, total 2566

Note: almost all of the time is spent forming M.

Final remarks:

Superlinear convergence: proved under tangential convergence to the central path, which is expensive to enforce.
- Kojima et al. (95)
- Potra and Sheng (95)
- Luo, Sturm and S. Zhang (96)

The trouble: off-diag(XZ) → 0 more slowly than diag(XZ) → 0 unless XZ − µI → 0. Is it worth the effort?

Parallel implementation: a necessity for solving large-scale problems.