Projection methods to solve SDP

Franz Rendl
Alpen-Adria-Universität Klagenfurt, Austria
http://www.math.uni-klu.ac.at
Oberwolfach Seminar, May 2010

Overview

- Augmented Primal-Dual Method
- Boundary Point Method

Semidefinite Programs

$$\max\{\langle C, X\rangle : A(X) = b,\ X \succeq 0\} \;=\; \min\{b^T y : A^T(y) - C = Z \succeq 0\}$$

Some notation and assumptions:
- $X, Z$ are symmetric $n \times n$ matrices.
- The linear equations $A(X) = b$ read $\langle A_i, X\rangle = b_i$ for given symmetric matrices $A_i$, $i = 1, \dots, m$.
- The adjoint map $A^T$ is given by $A^T(y) = \sum_i y_i A_i$.
- We assume that both the primal and the dual problem have strictly feasible points ($X, Z \succ 0$), so that strong duality holds and the optima are attained.
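
The maps $A$ and $A^T$ are the only problem-specific operations the methods below need. As a minimal sketch (ours, not from the slides), one convenient toy representation stacks the constraint matrices into an $m \times n^2$ array:

```python
import numpy as np

# Hypothetical representation for illustration: stack the symmetric matrices
# A_1, ..., A_m row-wise into an m x n^2 array Amat, so that
# A(X) = Amat @ vec(X) and A^T(y) = mat(Amat.T @ y).

def A_op(Amat, X):
    """Constraint map: A(X)_i = <A_i, X>."""
    return Amat @ X.ravel()

def A_adj(Amat, y, n):
    """Adjoint map: A^T(y) = sum_i y_i A_i."""
    return (Amat.T @ y).reshape(n, n)
```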

Optimality conditions

Under these assumptions, $(X, y, Z)$ is optimal if and only if the following conditions hold:

$A(X) = b$, $X \succeq 0$ (primal feasibility),
$A^T(y) - Z = C$, $Z \succeq 0$ (dual feasibility),
$\langle X, Z\rangle = 0$ (complementarity).

The last condition is equivalent to $\langle C, X\rangle = b^T y$. It could also be replaced by the matrix equation $ZX = 0$.
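
The three conditions translate directly into residuals one can monitor; a small sketch using the representation above (the function name is ours):

```python
def kkt_residuals(Amat, b, C, X, y, Z, n):
    """Residual norms of the three optimality conditions."""
    primal = np.linalg.norm(A_op(Amat, X) - b)        # A(X) = b
    dual = np.linalg.norm(A_adj(Amat, y, n) - Z - C)  # A^T(y) - Z = C
    gap = abs(np.sum(X * Z))                          # <X, Z> = 0
    return primal, dual, gap
```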

Other solution approaches

Spectral bundle method, see Helmberg, Rendl: SIOPT (2000): works on the dual problem as an eigenvalue optimization problem.
Low-rank factorization, see Burer, Monteiro: Math Prog (2003): express $X = LL^T$ and work with L. Leads to nonlinear optimization techniques.
Iterative solvers for the augmented system, see Toh: SIOPT (2004): use iterative methods to solve the Newton system.
Iterative solvers and modified barrier approach, see Kocvara, Stingl: Math Prog (2007): strong computational results using the package PENSDP.
And many other methods: sorry for not mentioning them all.

Other solution approaches (continued)

- Spectral bundle method
- Low-rank factorization
- Iterative solvers for the augmented system, Toh (2004)
- Iterative solvers and modified barrier approach, Kocvara, Stingl (2007)

Methods based on projection:
- boundary point approach (Povh, R., Wiegele: Computing 2006)
- regularization methods (Malick, Povh, R., Wiegele, 2009)
- augmented primal-dual approach (Jarre, R.: SIOPT 2008)

Comparing IP and projection methods

constraint         IP    BPM      APD
A(X) = b           yes   ***      yes
X ⪰ 0              yes   yes      ***
A^T(y) - C = Z     yes   ***      yes
Z ⪰ 0              yes   yes      ***
⟨Z, X⟩ = 0         ***   ZX = 0   yes

IP: interior-point approach
BPM: boundary point method
APD: augmented primal-dual method
***: once this condition is satisfied, the method stops.

Augmented Primal-Dual Method

(This is joint work with Florian Jarre.)

$F_P := \{X : A(X) = b\}$ ... primal linear space,
$F_D := \{(y, Z) : Z = C + A^T(y)\}$ ... dual linear space,
$OPT := \{(X, y, Z) : \langle C, X\rangle = b^T y\}$ ... optimality hyperplane.

From linear algebra,
$$\Pi_{F_P}(X) = X - A^T\big((AA^T)^{-1}(A(X) - b)\big),$$
$$\Pi_{F_D}(Z) = C + A^T\big((AA^T)^{-1}A(Z - C)\big)$$
are the projections of $(X, Z)$ onto $F_P$ and $F_D$.
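
Both formulas amount to one backsolve with a prefactored $AA^T$; a sketch (assuming $A$ has full row rank so that $AA^T$ is positive definite, and reusing A_op/A_adj from above):

```python
from scipy.linalg import cho_factor, cho_solve

def make_affine_projections(Amat, b, C, n):
    """Return the projections onto F_P and F_D; AA^T is factored once."""
    factor = cho_factor(Amat @ Amat.T)
    def proj_FP(X):
        # Pi_FP(X) = X - A^T((AA^T)^{-1}(A(X) - b))
        return X - A_adj(Amat, cho_solve(factor, A_op(Amat, X) - b), n)
    def proj_FD(Z):
        # Pi_FD(Z) = C + A^T((AA^T)^{-1} A(Z - C))
        return C + A_adj(Amat, cho_solve(factor, A_op(Amat, Z - C)), n)
    return proj_FP, proj_FD
```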

Augmented Primal-Dual Method (2)

Note that both projections essentially need one solve with the matrix $AA^T$ (which needs to be factored only once). Projection onto OPT is trivial. Let $K = F_P \cap F_D \cap OPT$. Given $(X, y, Z)$, the projection $\Pi_K(X, y, Z)$ onto K requires two solves.

This suggests the following iteration:

Start: select $(X, y, Z) \in K$.
Iteration: while not optimal
    $X^+ = \Pi_{SDP}(X)$, $Z^+ = \Pi_{SDP}(Z)$
    $(X, y, Z) \leftarrow \Pi_K(X^+, y, Z^+)$

The projection $\Pi_{SDP}(X)$ of X onto the SDP cone can be computed through an eigenvalue decomposition of X.
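
The cone projection is the classical one: diagonalize and keep the nonnegative part of the spectrum. A short sketch:

```python
def proj_psd(M):
    """Project a symmetric matrix onto the psd cone by clipping
    negative eigenvalues to zero."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.maximum(w, 0.0)) @ Q.T
```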

Augmented Primal-Dual Method (3)

This approach converges, but possibly very slowly. The computational effort per iteration is two solves (order m) and two factorizations (order n).

An improvement: consider
$$\varphi(X, Z) := \mathrm{dist}(X, SDP)^2 + \mathrm{dist}(Z, SDP)^2.$$
Here $\mathrm{dist}(X, SDP)$ denotes the distance of the matrix X from the cone of semidefinite matrices. The (convex) function $\varphi$ is differentiable with Lipschitz-continuous gradient:
$$\nabla\varphi(X, Z) = (X, Z) - \Pi_K(\Pi_{SDP}(X, Z)).$$
We solve the SDP by minimizing $\varphi$ over K.

Augmented Primal-Dual Method (4)

A practical implementation is currently under investigation.

- The function $\varphi$ could be modified to $\varphi(X, Z) + \|XZ\|_F^2$.
- Apply some sort of conjugate gradient approach (Polak-Ribière) to minimize this function.
- Computational work: the projection onto K is done by solving a system with matrix $AA^T$; evaluating $\varphi$ involves spectral decompositions of X and Z.

This approach is feasible if n is not too large (n ≈ 1000) and if linear systems with $AA^T$ can be solved.

Augmented Primal-Dual Method (5)

Recall: $(X, y, Z)$ is optimal once $X, Z \succeq 0$.

A typical run: n = 400, m = 10000.

iter   secs    ⟨C, X⟩      λ_min(X)   λ_min(Z)
1      9.7     11953.300   -0.00209   -0.00727
10     55.8    11942.955   -0.00036   -0.00055
20     103.8   11948.394   -0.00013   -0.00015
30     150.7   11950.799   -0.00007   -0.00005
40     196.7   11951.676   -0.00005   -0.00002
50     242.6   11951.781   -0.00004   -0.00001

The optimal value is 11951.726.

Random SDP

n      m        opt         apd         λ_min
400    40000    -114933.8   -114931.1   -0.0002
500    50000    -47361.2    -47353.4    -0.0003
600    60000    489181.8    489194.5    -0.0004
700    70000    -364458.8   -364476.1   -0.0004
800    80000    -112872.6   -112817.4   -0.0011
1000   100000   191886.2    191954.5    -0.0012

50 iterations of APD. The largest instance takes about 45 minutes. λ_min is the most negative eigenvalue of X and Z.

Boundary Point Method

Augmented Lagrangian for (D): $\min\{b^T y : A^T(y) - C = Z \succeq 0\}$.

X ... Lagrange multiplier for the dual equations
σ > 0 ... penalty parameter

$$L_\sigma(y, Z, X) = b^T y + \langle X,\, Z + C - A^T(y)\rangle + \frac{\sigma}{2}\|Z + C - A^T(y)\|^2$$

Generic method: repeat until convergence
(a) keep X fixed: solve $\min_{y,\, Z \succeq 0} L_\sigma(y, Z, X)$ to get $y$ and $Z \succeq 0$;
(b) update X: $X \leftarrow X + \sigma(Z + C - A^T(y))$;
(c) update σ.

Original version: Powell, Hestenes (1969). A carefully selected σ gives linear convergence.

Inner Subproblem

Inner minimization: X and σ are fixed. Set $W(y) := A^T(y) - C - \frac{1}{\sigma}X$. Then

$$L_\sigma = b^T y + \langle X,\, Z + C - A^T(y)\rangle + \frac{\sigma}{2}\|Z + C - A^T(y)\|^2 = b^T y + \frac{\sigma}{2}\|Z - W(y)\|^2 + \text{const} =: f(y, Z) + \text{const}.$$

Note that the dependence on Z looks like a projection problem, but with the additional variables y. Altogether this is a convex quadratic SDP!

Optimality conditions (1)

Introduce a Lagrange multiplier $V \succeq 0$ for the constraint $Z \succeq 0$:
$$L(y, Z, V) = f(y, Z) - \langle V, Z\rangle$$
Recall: $f(y, Z) = b^T y + \frac{\sigma}{2}\|Z - W(y)\|^2$, $W(y) = A^T(y) - C - \frac{1}{\sigma}X$.

$\nabla_y L = 0$ gives $\sigma AA^T(y) = \sigma A(Z + C) + A(X) - b$.
$\nabla_Z L = 0$ gives $V = \sigma(Z - W(y))$, with $V \succeq 0$, $Z \succeq 0$, $VZ = 0$.

Since the Slater constraint qualification holds, these conditions are necessary and sufficient for optimality.

Optimality conditions (2)

Note also: for y fixed we get Z by projection: $Z = W(y)_+$. From matrix analysis:
$$W = W_+ + W_-, \quad W_+ \succeq 0, \quad W_- \preceq 0, \quad \langle W_+, W_-\rangle = 0.$$

We have: $(y, Z, V)$ is optimal if and only if
$$AA^T(y) = \frac{1}{\sigma}\big(A(X) - b\big) + A(Z + C), \quad Z = W(y)_+, \quad V = \sigma(Z - W(y)) = -\sigma W(y)_-.$$

Solve a linear system (of order m) to get y; compute an eigenvalue decomposition of W(y) (order n). Note that $AA^T$ does not change during the iterations.

Boundary Point Method

Start: σ > 0, $X \succeq 0$, $Z \succeq 0$.
repeat until $\|Z - A^T(y) + C\| \le \epsilon$:
    repeat until $\|A(V) - b\| \le \sigma\epsilon$ (X, σ fixed):
        solve for y: $AA^T(y) = \text{rhs}$
        compute $Z = W(y)_+$, $V = -\sigma W(y)_-$
    update X: $X = -\sigma W(y)_-$

The inner stopping condition is primal feasibility; the outer stopping condition is dual feasibility.

See: Povh, R., Wiegele (Computing, 2006).
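
Putting the pieces together, a compact sketch of the whole method (our illustrative implementation, not the authors' code; the fixed σ, the zero starting points, and the inner iteration cap are simplifying assumptions; A_op/A_adj as above):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def boundary_point(Amat, b, C, n, sigma=1.0, eps=1e-5,
                   max_outer=500, max_inner=20):
    """Boundary point method for min{ b^T y : A^T(y) - C = Z psd }."""
    factor = cho_factor(Amat @ Amat.T)   # AA^T never changes: factor once
    X = np.zeros((n, n))
    Z = np.zeros((n, n))
    for _ in range(max_outer):
        for _ in range(max_inner):       # inner loop: X and sigma fixed
            rhs = A_op(Amat, Z + C) + (A_op(Amat, X) - b) / sigma
            y = cho_solve(factor, rhs)
            W = A_adj(Amat, y, n) - C - X / sigma
            w, Q = np.linalg.eigh(W)
            Z = (Q * np.maximum(w, 0.0)) @ Q.T            # Z = W(y)_+
            V = -sigma * (Q * np.minimum(w, 0.0)) @ Q.T   # V = -sigma W(y)_-
            if np.linalg.norm(A_op(Amat, V) - b) <= sigma * eps:
                break                    # inner stop: primal feasibility
        X = V                            # multiplier update: X = -sigma W(y)_-
        if np.linalg.norm(Z - A_adj(Amat, y, n) + C) <= eps:
            break                        # outer stop: dual feasibility
    return X, y, Z
```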

Theta: big DIMACS graphs

graph         n      m           ϑ        ω
keller5       776    74,710      31.00    27
keller6       3361   1,026,582   63.00    59
san1000       1000   249,000     15.00    15
san400-07.3   400    23,940      22.00    22
brock400-1    400    20,077      39.70    27
brock800-1    800    112,095     42.22    23
p-hat500-1    500    93,181      13.07    9
p-hat1000-3   1000   127,754     84.80    68
p-hat1500-3   1500   227,006     115.44   94

See Malick, Povh, R., Wiegele (2008): the theta number for the bigger instances had not been computed before.

Random SDP

n     m       secs   iter   secs chol(AA^T)
300   5000    43     168    1
300   10000   158    229    56
400   10000   130    211    8
400   20000   868    204    593
500   10000   144    136    1
500   20000   431    205    140
600   10000   184    96     1
600   20000   345    155    23
600   30000   975    152    550
800   40000   1298   155    345

Relative accuracy of $10^{-5}$, coded in MATLAB.

Conclusions and References

- Both methods need more theoretical convergence analysis.
- Speed-ups are possible by making use of limited-memory BFGS type methods.
- The spectral decomposition limits the matrix size n.
- Practical convergence may vary greatly depending on the data.

Three papers:
- Povh, R., Wiegele: Boundary point method (Computing 2006)
- Malick, Povh, R., Wiegele: Regularization methods (SIOPT 2009)
- Jarre, R.: Augmented primal-dual method (SIOPT 2008)

Large-Scale SDP

Projection methods like the boundary point method assume that a full spectral decomposition is computationally feasible. This limits n to roughly n ≤ 2000, but m can be arbitrary.

What if n is much larger?

Spectral Bundle Method

What if both m and n are large? In addition to the previous assumptions, we now assume that working with symmetric matrices X of order n is too expensive (no Cholesky factorization, no matrix multiplication!).

One possibility: get rid of $Z \succeq 0$ by using eigenvalue arguments.

Constant trace SDP

A has the constant trace property if I is in the range of $A^T$; equivalently, there exists η such that $A^T(\eta) = I$.

The constant trace property implies: if $A(X) = b$ and $A^T(\eta) = I$, then
$$\mathrm{tr}(X) = \langle I, X\rangle = \langle \eta, A(X)\rangle = \eta^T b =: a.$$

The constant trace property holds for many combinatorially derived SDPs! (For instance, in the Max-Cut relaxation the constraints diag(X) = e force tr(X) = n.)

Reformulating Constant Trace SDP

Reformulate the dual as follows. Start from
$$\min\{b^T y : A^T(y) - C = Z \succeq 0\}.$$
Adding the (redundant) primal constraint $\mathrm{tr}(X) = a$ introduces a new dual variable, say λ, and the dual becomes
$$\min\{b^T y + a\lambda : A^T(y) - C + \lambda I = Z \succeq 0\}.$$
At optimality, Z is singular, hence $\lambda_{\min}(Z) = 0$. This will be used to compute the dual variable λ explicitly.

Dual SDP as eigenvalue optimization

Compute the dual variable λ explicitly:
$$\lambda_{\max}(-Z) = \lambda_{\max}(C - A^T(y)) - \lambda = 0, \quad \text{hence} \quad \lambda = \lambda_{\max}(C - A^T(y)).$$

The dual is therefore equivalent to
$$\min\{a\,\lambda_{\max}(C - A^T(y)) + b^T y : y \in \mathbb{R}^m\}.$$
This is a non-smooth unconstrained convex problem in y.

Minimizing $f(y) = a\,\lambda_{\max}(C - A^T(y)) + b^T y$: note that evaluating f at y amounts to computing the largest eigenvalue of $C - A^T(y)$. This can be done by iterative methods for very large (sparse) matrices.
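
A sketch of this function evaluation with a Lanczos-type eigensolver. Here the $A_i$ are kept as a list of scipy sparse matrices (an assumption suited to large n); the subgradient formula $g_i = b_i - a\, v^T A_i v$, with v a unit eigenvector for $\lambda_{\max}$, is the standard one:

```python
import numpy as np
import scipy.sparse.linalg as spla

def f_and_subgrad(C, A_list, b, y, a=1.0):
    """Evaluate f(y) = a*lambda_max(C - A^T(y)) + b^T y and one subgradient."""
    M = C - sum(yi * Ai for yi, Ai in zip(y, A_list))
    lam, v = spla.eigsh(M, k=1, which='LA')  # largest eigenvalue via Lanczos
    v = v[:, 0]
    g = b - a * np.array([v @ (Ai @ v) for Ai in A_list])
    return a * lam[0] + b @ y, g
```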

Spectral Bundle Method (1)

If we have some y, how do we move to a better point? Use
$$\lambda_{\max}(X) = \max\{\langle X, W\rangle : \mathrm{tr}(W) = 1,\ W \succeq 0\}$$
and define (taking a = 1 for simplicity)
$$L(W, y) := \langle C - A^T(y), W\rangle + b^T y.$$
Then $f(y) = \max\{L(W, y) : \mathrm{tr}(W) = 1,\ W \succeq 0\}$.

Idea 1: a minorant for f(y). Fix some n × k matrix P; k ≥ 1 can be chosen arbitrarily, and the choice of P will be explained later. Consider W of the form $W = PVP^T$ with a new k × k matrix variable V:
$$\hat f(y) := \max\{L(W, y) : W = PVP^T,\ \mathrm{tr}(W) = 1,\ V \succeq 0\} \le f(y).$$

Spectral Bundle Method (2)

Idea 2: a proximal point approach. The function $\hat f$ depends on P and will be a good approximation to f(y) only in some neighbourhood of the current iterate ŷ. Instead of minimizing f(y), we minimize
$$\hat f(y) + \frac{u}{2}\|y - \hat y\|^2.$$
This is a strictly convex function if u > 0 is fixed. Substituting the definition of $\hat f$ gives the following min-max problem.

Quadratic Subproblem (1)

$$\min_y \max_W\; L(W, y) + \frac{u}{2}\|y - \hat y\|^2 = \dots = \max_W\; \Big[L(W, y) + \frac{u}{2}\|y - \hat y\|^2\Big]_{y = \hat y + \frac{1}{u}(A(W) - b)}$$
$$= \max_W\; \langle C - A^T(\hat y), W\rangle + b^T \hat y - \frac{1}{2u}\langle A(W) - b,\, A(W) - b\rangle.$$

Note that this is a quadratic SDP in the k × k matrix V, because $W = PVP^T$. k is user-defined and can be small, independent of n!
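
For illustration, a projected-gradient sketch of this small subproblem over $\{V \succeq 0, \mathrm{tr}(V) = 1\}$. This solver choice is our assumption (Helmberg and Rendl use a specialized interior point method); Aop/Aadj are operator callables, and the fixed step size is a crude simplification:

```python
import numpy as np

def proj_spectahedron(V):
    """Project onto {V psd, tr(V) = 1}: project the eigenvalues onto the simplex."""
    w, Q = np.linalg.eigh(V)
    ws = np.sort(w)[::-1]
    css = np.cumsum(ws) - 1.0
    rho = np.nonzero(ws - css / (np.arange(len(w)) + 1.0) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return (Q * np.maximum(w - theta, 0.0)) @ Q.T

def bundle_subproblem(Aop, Aadj, b, C, P, y_hat, u, iters=200, step=1e-2):
    """max <C - A^T(yhat), PVP^T> + b^T yhat - (1/2u)||A(PVP^T) - b||^2
    over V psd with tr(V) = 1, by projected gradient ascent."""
    k = P.shape[1]
    Cbar = C - Aadj(y_hat)
    V = np.eye(k) / k                        # feasible starting point
    for _ in range(iters):
        r = Aop(P @ V @ P.T) - b
        G = P.T @ (Cbar - Aadj(r) / u) @ P   # gradient of the objective in V
        V = proj_spectahedron(V + step * G)
    return V
```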

Quadratic Subproblem (2)

Once V is computed, we get, with $W = PVP^T$, that
$$y = \hat y + \frac{1}{u}\big(A(W) - b\big),$$
see Helmberg, Rendl: SIOPT 10 (2000), 673ff.

Update of P: having the new point y, we evaluate f at y (a sparse eigenvalue computation), which also produces an eigenvector v for $\lambda_{\max}$. The vector v is added as a new column to P, and P is purged by removing unnecessary other columns.

Per iteration: solve a quadratic SDP of size k and compute $\lambda_{\max}$ of a matrix of order n. Convergence is slow once close to the optimum.
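
A crude sketch of one possible bundle update. The purging rule below (keep only the newest directions) is our simplification; the actual method also keeps aggregate information, see Helmberg-Rendl:

```python
import numpy as np

def update_bundle(P, v, kmax):
    """Add eigenvector v to the bundle P, re-orthonormalize,
    and keep at most kmax columns."""
    B = np.column_stack([v, P])   # put v first so it survives truncation
    Q, _ = np.linalg.qr(B)
    return Q[:, :min(kmax, Q.shape[1])]
```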

Last Slide

Interior point methods are fine and work robustly, but n ≈ 1000 and m ≈ 10,000 is a severe limit.

If n is small enough for matrix operations (n ≈ 2,000), then projection methods allow going to large m. These algorithms have weaker convergence properties and need some nontrivial parameter tuning.

Partial Lagrangian duality can always be used to deal with only a part of the constraints explicitly. But we still need to solve some basic SDP, and convergence of bundle methods for the Lagrangian dual may be slow.

Currently, only the spectral bundle method is suitable as a general tool for very large-scale SDP.