A CONIC DANTZIG-WOLFE DECOMPOSITION APPROACH FOR LARGE SCALE SEMIDEFINITE PROGRAMMING

Kartik Krishnan, Advanced Optimization Laboratory, McMaster University
Joint work with Gema Plaza Martinez and Tamás Terlaky
Computational Mathematics Seminar, University of Waterloo, January 24, 2005

Contents
- Conic programming
- Motivation and previous work
- Semidefinite programming
- Conic Dantzig-Wolfe decomposition
- Algorithm
- Implementation issues in the algorithm
- Computational results
- Conclusions and future work

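Conic programming
(P) $\min\ c^T x$ s.t. $Ax = b$, $x \in K$
(D) $\max\ b^T y$ s.t. $A^T y + s = c$, $s \in K$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, and $K = K_1 \times \cdots \times K_r$. Each of the cones below is self-dual, so the dual problem uses the same cone $K$.
- $r = 1$, $K = \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}$: LP. Very large LPs ($m, n$ on the order of 1,000,000) are solvable by the simplex method and/or IPMs.
- $K_i = \mathcal{Q}^{n_i}_+ = \{x \in \mathbb{R}^{n_i} : x_1 \ge \|x_{2:n_i}\|\}$: SOCP. Large SOCPs ($m, n$ on the order of 100,000) are solvable by IPMs.
- $K_i = \mathcal{S}^{n_i}_+ = \{X \in \mathcal{S}^{n_i} : X \succeq 0\}$: SDP. Only medium sized SDPs ($m, n$ on the order of 1000) are solvable by IPMs. (Beyond 10,000 seems impossible today!)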
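As a small illustration of the three cone types, the following Python sketch (using NumPy; all function names are hypothetical, not from the talk) tests membership in $\mathbb{R}^n_+$, $\mathcal{Q}^n$ and $\mathcal{S}^n_+$, and in a product cone $K = K_1 \times \cdots \times K_r$ given as a list of tagged blocks.

```python
import numpy as np

def in_nonneg_orthant(x, tol=1e-12):
    """x in R^n_+: every component is nonnegative (up to tol)."""
    return bool(np.all(np.asarray(x, dtype=float) >= -tol))

def in_second_order_cone(x, tol=1e-12):
    """x in Q^n: x_1 >= ||x_{2:n}||_2 (up to tol)."""
    x = np.asarray(x, dtype=float)
    return bool(x[0] >= np.linalg.norm(x[1:]) - tol)

def in_psd_cone(X, tol=1e-12):
    """X in S^n_+: X symmetric and lambda_min(X) >= 0 (up to tol)."""
    X = np.asarray(X, dtype=float)
    if not np.allclose(X, X.T):
        return False
    return bool(np.linalg.eigvalsh(X)[0] >= -tol)

CONE_TESTS = {"l": in_nonneg_orthant, "q": in_second_order_cone, "s": in_psd_cone}

def in_product_cone(blocks):
    """K = K_1 x ... x K_r, given as (kind, value) pairs with kind in 'l', 'q', 's'."""
    return all(CONE_TESTS[kind](value) for kind, value in blocks)

# One LP block, one SOCP block and one 2 x 2 SDP block.
x = [("l", [1.0, 0.0]), ("q", [2.0, 1.0, 1.0]), ("s", [[2.0, 1.0], [1.0, 2.0]])]
print(in_product_cone(x))  # True
```

Tagging blocks this way mirrors how mixed conic solvers describe a product cone block by block.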

Applications of conic programming
1. Combinatorial optimization: Alizadeh, Goemans-Williamson, Lovász-Schrijver, Ye.
2. Linear control theory: Boyd, Vandenberghe, Balakrishnan, El Ghaoui, Feron.
3. Signal processing: Luo.
4. Robust optimization: Ben-Tal-Nemirovski, El Ghaoui.
5. Polynomial programming: Bertsimas, Lasserre, Nesterov, Parrilo.
6. Others include finance, statistics, moment problems, etc.

Motivation
(a) Solve large scale semidefinite programs (SDPs) arising in science and engineering.
(b) Interior point methods (IPMs) can solve large scale linear programming (LP) and second order cone programming (SOCP) problems. However, they are limited in the size of SDPs they can solve.
(c) Our approach is to decompose an SDP into mixed conic programs with LP, SOCP, and smaller SDP blocks (conic Dantzig-Wolfe decomposition scheme).

Survey of approaches
- Dantzig-Wolfe (1960), Kelley (1960): decomposition approach for large scale LPs.
- Karmarkar et al. (1980-1990): interior point methods.
- ACCPM (Goffin, Vial et al.), LBCPM (Roos et al., Gondzio, Mitchell): interior point cutting plane methods.
- Spectral bundle (Helmberg et al.), Volume algorithm (Barahona et al.): Lagrangian relaxation approaches.
- Krishnan-Mitchell (2001), Goldfarb (2002): LP decomposition approach for semidefinite/conic programming.
- Our present work, Oskoorouchi-Goffin (2004): extend this decomposition approach to a mixed conic setting (LP, SOCP, smaller SDP blocks).

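Semidefinite programming (SDP)
(SDP) $\min\ C \bullet X$ s.t. $\mathcal{A}(X) = b$, $X \succeq 0$
(SDD) $\max\ b^T y$ s.t. $\mathcal{A}^T y + S = C$, $S \succeq 0$
Notation:
- $X, S, C \in \mathcal{S}^n$, $b \in \mathbb{R}^m$.
- $A \bullet B = \mathrm{trace}(AB) = \sum_{i,j=1}^n A_{ij} B_{ij}$ (Frobenius inner product).
- The operator $\mathcal{A} : \mathcal{S}^n \to \mathbb{R}^m$ and its adjoint $\mathcal{A}^T : \mathbb{R}^m \to \mathcal{S}^n$ are $\mathcal{A}(X) = (A_1 \bullet X, \ldots, A_m \bullet X)^T$ and $\mathcal{A}^T y = \sum_{i=1}^m y_i A_i$, where $A_i \in \mathcal{S}^n$, $i = 1, \ldots, m$.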
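The operator $\mathcal{A}$ and its adjoint can be sketched directly from these definitions. The NumPy code below is a minimal illustration with hypothetical helper names; it makes the adjoint identity $\mathcal{A}(X)^T y = X \bullet \mathcal{A}^T y$ easy to check numerically on random symmetric data.

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A . B = sum_ij A_ij B_ij (= trace(AB) for symmetric B)."""
    return float(np.sum(A * B))

def op_A(As, X):
    """A(X) = (A_1 . X, ..., A_m . X) in R^m."""
    return np.array([frob(Ai, X) for Ai in As])

def op_AT(As, y):
    """Adjoint A^T y = sum_i y_i A_i in S^n."""
    return sum(yi * Ai for yi, Ai in zip(y, As))

rng = np.random.default_rng(0)

def random_sym(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2.0

As = [random_sym(4) for _ in range(3)]
X, y = random_sym(4), rng.standard_normal(3)
# Adjoint identity: <A(X), y> = X . A^T y
print(np.isclose(op_A(As, X) @ y, frob(X, op_AT(As, y))))  # True
```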

Assumptions
(1) $A_1, \ldots, A_m$ are linearly independent in $\mathcal{S}^n$.
(2) Slater condition for (SDP) and (SDD): $\{X \in \mathcal{S}^n : \mathcal{A}(X) = b,\ X \succ 0\} \neq \emptyset$ and $\{(y, S) \in \mathbb{R}^m \times \mathcal{S}^n : \mathcal{A}^T y + S = C,\ S \succ 0\} \neq \emptyset$.
(3) One of the constraints in (SDP) is $I \bullet X = 1$, i.e., $\mathrm{trace}(X) = 1$. (Every SDP with a bounded primal feasible set can be reformulated to satisfy this assumption.)

Computational effort in each iteration
1. SDP:
- Form the Schur matrix $M$ of size $m$, whose $(i, j)$th entry is $M_{ij} = A_j \bullet (X A_i S^{-1})$. This requires $O(mn^3 + m^2 n^2)$ flops and is the costliest step of all.
- The matrix $M$ is dense despite the sparsity in the $A_i$, and factoring it requires $O(m^3)$ flops.
- The line search involves Cholesky factorizations of $X$ and $S$, which cost $O(n^3)$ flops.
2. LP and SOCP: the costliest step is factorizing $M$.
- In LP, $M$ inherits the sparsity of $A$.
- In SOCP, if $A$ is sparse, $M$ can be expressed as a sparse matrix plus some low rank updates.
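To see why the Schur matrix is the bottleneck, here is a minimal dense NumPy sketch that forms $M_{ij} = A_j \bullet (X A_i S^{-1})$ entry by entry. It only illustrates the cost structure (the function name is hypothetical and no sparsity is exploited). With the diagonal constraints $A_i = e_i e_i^T$, each $A_i$ has a single nonzero, yet $M$ equals the entrywise product $X \circ S^{-1}$, which is dense in general.

```python
import numpy as np

def schur_matrix(As, X, S):
    """Dense Schur matrix M with M_ij = A_j . (X A_i S^{-1}) = trace(A_j X A_i S^{-1})."""
    S_inv = np.linalg.inv(S)
    G = [X @ Ai @ S_inv for Ai in As]       # m products of n x n matrices: O(m n^3)
    m = len(As)
    M = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            # trace(A_j G_i) = sum_kl (A_j)_kl (G_i)_lk: O(n^2) per entry, O(m^2 n^2) overall
            M[i, j] = np.sum(As[j] * G[i].T)
    return M

n = 5
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X = B1 @ B1.T + n * np.eye(n)               # X positive definite
S = B2 @ B2.T + n * np.eye(n)               # S positive definite
As = [np.outer(e, e) for e in np.eye(n)]    # A_i = e_i e_i^T: one nonzero each
M = schur_matrix(As, X, S)
# Here M is the entrywise product X o S^{-1}.
print(np.allclose(M, X * np.linalg.inv(S)))  # True
```

By the Schur product theorem this $M$ is also positive definite, which is what makes a Cholesky factorization applicable.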

Conic Dantzig-Wolfe decomposition
- Our semidefinite programs are:
  (SDP) $\min\ C \bullet X$ s.t. $\mathcal{A}(X) = b$, $\mathrm{trace}(X) = 1$, $X \succeq 0$
  (SDD) $\max\ b^T y + z$ s.t. $\mathcal{A}^T y + zI + S = C$, $S \succeq 0$
- Consider the set $E = \{X \in \mathcal{S}^n : \mathrm{trace}(X) = 1,\ X \succeq 0\}$.
- The extreme points of $E$ are $\{vv^T : v \in \mathbb{R}^n,\ v^T v = 1\}$ (an infinite number).

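Conic Dantzig-Wolfe decomposition: Primal motivation
Any $X \in E$ can be written as:
(1) $X = \sum_j \lambda_j d_j d_j^T$, where $\sum_j \lambda_j = 1$, $\lambda_j \ge 0$, and the $d_j$ are unit vectors. The constraint $X \succeq 0$ is replaced by $\lambda_j \ge 0$ for all $j$ (semi-infinite LP formulation).
(2) $X = \sum_j P_j V_j P_j^T$, where $\sum_j \mathrm{trace}(V_j) = 1$, $V_j \in \mathcal{S}^2_+$, and each $P_j \in \mathbb{R}^{n \times 2}$ has orthonormal columns. The constraint $X \succeq 0$ is replaced by $V_j \succeq 0$ for all $j$ (semi-infinite SOCP formulation, since a $2 \times 2$ SDP constraint can be written as a size 3 SOCP constraint).
(3) $X = \sum_j P_j V_j P_j^T$, where $\sum_j \mathrm{trace}(V_j) = 1$, $V_j \in \mathcal{S}^{r_j}_+$ with $r_j \ge 3$, and each $P_j \in \mathbb{R}^{n \times r_j}$ has orthonormal columns. The constraint $X \succeq 0$ is replaced by $V_j \succeq 0$ for all $j$ (semi-infinite SDP formulation).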
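One concrete way to obtain representations (1) and (2) is the spectral decomposition of $X$: eigenvalues serve as the $\lambda_j$ and unit eigenvectors as the $d_j$, and grouping eigenpairs two at a time gives orthonormal $P_j$ with diagonal $V_j \in \mathcal{S}^2_+$. The NumPy sketch below (hypothetical function names) produces just one of many valid representations.

```python
import numpy as np

def rank_one_decomposition(X, tol=1e-12):
    """X in E as sum_j lam_j d_j d_j^T with lam_j >= 0, sum_j lam_j = trace(X) = 1, ||d_j|| = 1."""
    lam, D = np.linalg.eigh(X)
    keep = lam > tol                       # drop numerically zero eigenvalues
    return lam[keep], D[:, keep]

def pairwise_decomposition(X):
    """X = sum_j P_j V_j P_j^T with orthonormal P_j (n x 2) and diagonal V_j >= 0 (up to rounding).

    If n is odd, the last block is 1 x 1.
    """
    lam, D = np.linalg.eigh(X)
    return [(D[:, k:k + 2], np.diag(lam[k:k + 2])) for k in range(0, len(lam), 2)]

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
X = B @ B.T
X /= np.trace(X)                           # now X lies in E

lam, D = rank_one_decomposition(X)
blocks = pairwise_decomposition(X)
print(np.isclose(lam.sum(), 1.0), np.allclose((D * lam) @ D.T, X))  # True True
```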

A 2 × 2 SDP constraint is an SOCP constraint of size 3
$X = \begin{pmatrix} X_{11} & X_{12} \\ X_{12} & X_{22} \end{pmatrix} \succeq 0 \iff X_{11} \ge 0,\ X_{22} \ge 0,\ X_{11} X_{22} - X_{12}^2 \ge 0 \iff \begin{pmatrix} X_{11} + X_{22} \\ 2X_{12} \\ X_{11} - X_{22} \end{pmatrix} \succeq_{\mathcal{Q}} 0$
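The equivalence can be checked numerically. The sketch below (NumPy, hypothetical names) maps a symmetric $2 \times 2$ matrix to $(X_{11} + X_{22}, 2X_{12}, X_{11} - X_{22})$. The identity $(X_{11}+X_{22})^2 - (2X_{12})^2 - (X_{11}-X_{22})^2 = 4(X_{11}X_{22} - X_{12}^2)$ is what makes the two tests agree.

```python
import numpy as np

def psd2_to_soc(X):
    """Map a symmetric 2 x 2 matrix to (X11 + X22, 2 X12, X11 - X22)."""
    return np.array([X[0, 0] + X[1, 1], 2.0 * X[0, 1], X[0, 0] - X[1, 1]])

def is_psd2(X, tol=1e-12):
    """X >= 0 via X11 >= 0, X22 >= 0 and det(X) >= 0."""
    return bool(X[0, 0] >= -tol and X[1, 1] >= -tol
                and X[0, 0] * X[1, 1] - X[0, 1] ** 2 >= -tol)

def in_soc(v, tol=1e-12):
    """v >=_Q 0 via v_1 >= ||v_{2:3}||."""
    return bool(v[0] >= np.linalg.norm(v[1:]) - tol)

rng = np.random.default_rng(3)
agree = True
for _ in range(1000):
    a, b, c = rng.standard_normal(3)
    X = np.array([[a, c], [c, b]])
    agree &= is_psd2(X) == in_soc(psd2_to_soc(X))
print(agree)
```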

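Conic Dantzig-Wolfe decomposition: Dual motivation
(1) LP formulation: $S \succeq 0 \iff d_j^T S d_j \ge 0$ for all $j$, with $d_j$ ranging over all unit vectors (taking a finite number of constraints gives an LP relaxation).
(2) SOCP formulation: $S \succeq 0 \iff P_j^T S P_j \succeq 0$ for all $j$, where each $P_j^T S P_j$ is $2 \times 2$ (a finite number of constraints gives an SOCP relaxation).
(3) SDP formulation: $S \succeq 0 \iff P_j^T S P_j \succeq 0$ for all $j$, where each $P_j^T S P_j$ is $r_j \times r_j$ with $r_j \ge 3$ (a finite number of constraints gives an SDP relaxation).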
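A two-dimensional example shows why finitely many LP cuts only give a relaxation. The toy matrix below (an arbitrary choice) satisfies the cuts for $d \in \{e_1, e_2\}$, since its diagonal is nonnegative, but it is not PSD, and the eigenvector of $\lambda_{\min}$ supplies a violated cut.

```python
import numpy as np

S = np.array([[1.0, 2.0], [2.0, 1.0]])      # eigenvalues -1 and 3: S is not PSD

# A finite LP relaxation with d in {e_1, e_2}: both cuts d^T S d >= 0 hold.
lp_dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
passes_relaxation = all(d @ S @ d >= 0 for d in lp_dirs)

# The eigenvector of lambda_min(S) gives a new, violated LP cut.
lam, V = np.linalg.eigh(S)
d_new = V[:, 0]
violation = d_new @ S @ d_new               # equals lambda_min(S) = -1 up to rounding

print(passes_relaxation, violation < 0)     # True True
```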

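Conic Dantzig-Wolfe decomposition: Master problem 1
The decomposed conic problem over LP, SOCP and SDP blocks is:
Primal:
$$\begin{array}{ll}
\min & \sum_{i=1}^{n_l} c_{li} x_{li} + \sum_{j=1}^{n_q} c_{qj}^T x_{qj} + \sum_{k=1}^{n_s} C_{sk} \bullet X_{sk} \\
\text{s.t.} & \sum_{i=1}^{n_l} A_{li} x_{li} + \sum_{j=1}^{n_q} A_{qj} x_{qj} + \sum_{k=1}^{n_s} \mathcal{A}_{sk}(X_{sk}) = b \\
& \sum_{i=1}^{n_l} x_{li} + \sum_{j=1}^{n_q} x_{qj1} + \sum_{k=1}^{n_s} \mathrm{trace}(X_{sk}) = 1 \\
& x_{li} \ge 0, \quad i = 1, \ldots, n_l \\
& x_{qj} \succeq_{\mathcal{Q}} 0, \quad j = 1, \ldots, n_q \\
& X_{sk} \succeq 0, \quad k = 1, \ldots, n_s
\end{array}$$
Dual:
$$\begin{array}{ll}
\max & b^T y + z \\
\text{s.t.} & A_{li}^T y + z + s_{li} = c_{li}, \quad i = 1, \ldots, n_l \\
& A_{qj}^T y + \begin{pmatrix} z \\ 0 \end{pmatrix} + s_{qj} = c_{qj}, \quad j = 1, \ldots, n_q \\
& \mathcal{A}_{sk}^T y + z I_{r_k} + S_{sk} = C_{sk}, \quad k = 1, \ldots, n_s \\
& s_{li} \ge 0, \quad i = 1, \ldots, n_l \\
& s_{qj} \succeq_{\mathcal{Q}} 0, \quad j = 1, \ldots, n_q \\
& S_{sk} \succeq 0, \quad k = 1, \ldots, n_s
\end{array}$$
Here $x_{qj1}$ denotes the first component of $x_{qj}$.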
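The talk does not spell out how a cut direction becomes master-problem data, so the following NumPy sketch is an assumption for illustration (all names hypothetical). An LP block generated by a unit vector $d$ (so $X = x_l\, dd^T$) contributes $c_l = d^T C d$ and the column $(d^T A_i d)_i$. An SOCP block generated by $P$ with orthonormal columns (so $X = P V P^T$) uses the $2 \times 2$-to-SOC map $x_q = (V_{11} + V_{22}, 2V_{12}, V_{11} - V_{22})$, under which $W \bullet V = g^T x_q$ with $g = ((W_{11}+W_{22})/2, W_{12}, (W_{11}-W_{22})/2)$ and $\mathrm{trace}(V) = x_{q1}$, matching the trace row of the primal master problem.

```python
import numpy as np

def soc_coeffs(W):
    """g with W . V = g^T x_q, where x_q = (V11 + V22, 2 V12, V11 - V22)."""
    return np.array([(W[0, 0] + W[1, 1]) / 2.0, W[0, 1], (W[0, 0] - W[1, 1]) / 2.0])

def lp_block_data(C, As, d):
    """(c_l, a_l) for X = x_l d d^T with ||d|| = 1; its trace-row coefficient is 1."""
    return d @ C @ d, np.array([d @ Ai @ d for Ai in As])

def socp_block_data(C, As, P):
    """(c_q, A_q) for X = P V P^T with P^T P = I; its trace-row coefficient is (1, 0, 0)."""
    c_q = soc_coeffs(P.T @ C @ P)
    A_q = np.array([soc_coeffs(P.T @ Ai @ P) for Ai in As])   # m x 3
    return c_q, A_q

rng = np.random.default_rng(4)

def random_sym(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2.0

n, m = 5, 3
C, As = random_sym(n), [random_sym(n) for _ in range(m)]
P, _ = np.linalg.qr(rng.standard_normal((n, 2)))              # orthonormal columns
B = rng.standard_normal((2, 2))
V = B @ B.T                                                    # V in S^2_+
x_q = np.array([V[0, 0] + V[1, 1], 2.0 * V[0, 1], V[0, 0] - V[1, 1]])
X = P @ V @ P.T

c_q, A_q = socp_block_data(C, As, P)
print(np.isclose(np.sum(C * X), c_q @ x_q))                    # True: objective term
print(np.allclose([np.sum(Ai * X) for Ai in As], A_q @ x_q))   # True: constraint column
print(np.isclose(np.trace(X), x_q[0]))                         # True: trace row
```

Under this map, the dual SOCP constraint $c_q - A_q^T y - (z, 0, 0)^T \succeq_{\mathcal{Q}} 0$ is equivalent to $P^T (C - \mathcal{A}^T y - zI) P \succeq 0$, i.e., exactly the $2 \times 2$ cut of the dual motivation.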

Conic Dantzig-Wolfe decomposition: Master problem 2
1. The dual master problem is a relaxation of (SDD).
2. The primal master problem is a constrained version of (SDP), i.e., any feasible solution of the primal master problem is feasible in (SDP).
3. IPMs are well equipped to handle mixed conic problems over LP, SOCP, and SDP cones.
4. We will restrict the size of the SDP blocks in the master problem to about $r = O(\sqrt{n})$.

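Conic Dantzig-Wolfe decomposition: Separation oracle
Input: $(y^*, z^*)$, a feasible point for the dual master problem. Let $S^* = C - \mathcal{A}^T y^* - z^* I$.
- If $\lambda_{\min}(S^*) \ge 0$, report feasibility and STOP.
- Else, solve the following subproblem for $X^*$:
  $\min\ (C - \mathcal{A}^T y^* - z^* I) \bullet X$ s.t. $I \bullet X = 1$, $X \succeq 0$.
  Factorize $X^* = D M D^T$ with $M \succ 0$, and return the cut $D^T (C - \mathcal{A}^T y - zI) D \succeq 0$, where $D \in \mathbb{R}^{n \times r}$.
  If $r = 1$, this is an LP cut; if $r = 2$, an SOCP cut; if $r \ge 3$, an SDP cut of small size.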
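The oracle reduces to an extreme eigenvalue computation: the subproblem's optimal value is $\lambda_{\min}(S^*)$, attained by an $X^*$ supported on the corresponding eigenspace. The NumPy sketch below (hypothetical names) uses a dense `numpy.linalg.eigh` in place of the Lanczos solver (`eigs`) of the talk's MATLAB code, and returns up to `r_max` eigenvectors with negative eigenvalues. Including eigenvectors beyond the $\lambda_{\min}$ eigenspace is a heuristic, in the spirit of the Max-Cut runs that return SOCP cuts when the two smallest eigenvalues are negative.

```python
import numpy as np

def separation_oracle(C, As, y, z, r_max=2, tol=1e-9):
    """Return None if S = C - A^T y - z I is PSD; otherwise D (n x r) for the cut D^T S D >= 0.

    r = 1 gives an LP cut, r = 2 an SOCP cut, r >= 3 a small SDP cut.
    """
    n = C.shape[0]
    S = C - sum(yi * Ai for yi, Ai in zip(y, As)) - z * np.eye(n)
    lam, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    if lam[0] >= -tol:
        return None                            # (y, z) is feasible in (SDD): STOP
    r = min(r_max, int(np.sum(lam < -tol)))    # number of negative eigenvalues used
    return V[:, :r]

C = np.diag([1.0, -1.0, -2.0])
D = separation_oracle(C, [], [], 0.0)
print(D.shape)                                 # (3, 2): two negative eigenvalues, SOCP cut
```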

Conic Dantzig-Wolfe Algorithm

Choice of the query point: 1
- In the original Dantzig-Wolfe (Kelley) scheme, the query point $(y, z)$ is an optimal solution to the dual master problem. This scheme has a poor rate of convergence.
- It is better to solve the master problem approximately at first, and to tighten this tolerance gradually as the algorithm proceeds.
  Initially, the master problem is a poor approximation to the SDP problem: weak tolerance → central dual iterates $(y, z)$ → the oracle returns better cuts.
  As the algorithm proceeds, the master problem approximations get tighter: tight tolerance → more emphasis on the objective function → convergence to an optimal solution.

Choice of the query point: 2
We adopt the following adaptive strategy for a prescribed tolerance TOL in every iteration:
- Solve the master problem to tolerance TOL to obtain the solution $(x^*, y^*, z^*, s^*)$, and compute the following parameters:
  $\mathrm{GAPTOL}(x^*, s^*) = \dfrac{x^{*T} s^*}{\max\{1, \frac{1}{2}(|c^T x^*| + |b^T y^*|)\}}$
  $\mathrm{INF}(y^*, z^*) = \dfrac{-\lambda_{\min}(C - \mathcal{A}^T y^* - z^* I)}{\max\{1, \|C\|\}}$
  $\mathrm{OPT}(x^*, y^*, z^*, s^*) = \max\{\mathrm{GAPTOL}(x^*, s^*), \mathrm{INF}(y^*, z^*)\}$
- If $\mathrm{OPT}(x^*, y^*, z^*, s^*) < \mathrm{TOL}$, we lower TOL by a constant factor; more precisely, we set $\mathrm{TOL} = \mu \cdot \mathrm{OPT}(x^*, y^*, z^*, s^*)$ with $0 < \mu < 1$. Else, TOL remains unchanged.
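The update rule can be written as three small functions. This Python sketch is illustrative: the names are hypothetical, $\|C\|$ is taken to be the Frobenius norm (an assumption), and the default $\mu = 1/2$ matches the value used in the Max-Cut experiments.

```python
import numpy as np

def gaptol(x, s, primal_obj, dual_obj):
    """Relative duality gap x^T s / max{1, (|c^T x| + |b^T y|) / 2}."""
    return float(np.dot(x, s)) / max(1.0, 0.5 * (abs(primal_obj) + abs(dual_obj)))

def infeas(C, lam_min):
    """Scaled SDD infeasibility -lambda_min(C - A^T y - z I) / max{1, ||C||_F}."""
    return -lam_min / max(1.0, np.linalg.norm(C))

def update_tol(tol, opt, mu=0.5):
    """If OPT < TOL, set TOL = mu * OPT (0 < mu < 1); otherwise keep TOL."""
    return mu * opt if opt < tol else tol

x, s = np.array([1.0, 2.0]), np.array([0.5, 0.25])
g = gaptol(x, s, primal_obj=-10.0, dual_obj=-6.0)       # 1.0 / 8.0 = 0.125
inf_val = infeas(np.diag([3.0, 4.0]), lam_min=-1.0)     # 1.0 / 5.0 = 0.2
opt = max(g, inf_val)                                   # OPT = 0.2
print(update_tol(0.5, opt), update_tol(0.1, opt))       # 0.1 0.1
```

In the first call OPT is below TOL, so TOL drops to $\mu \cdot \mathrm{OPT}$; in the second it is not, so TOL is kept.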

Algorithm

One iteration of the algorithm
[Figure: the (y, z) space, showing the (SDD) feasible region inside the feasible region of the dual master problem, the central path, the old and new (y, z) iterates, and the new cutting plane with the resulting new feasible region.]
A: Solve the master problem to tolerance TOL.
B: The oracle returns a cutting plane.
C: Restore dual feasibility.
D: Warm start.

Upper and lower bounds
- The objective value of the primal master problem in every iteration gives an upper bound on the SDP objective value. This is the objective value of the dual master problem plus the duality gap.
- Given a solution $(y^*, z^*)$ to the dual master problem in every iteration, a lower bound is computed as follows:
  Compute $\lambda^* = \lambda_{\min}(S^*)$, where $S^* = C - \mathcal{A}^T y^* - z^* I$.
  Set $y_{lb} = y^*$ and $z_{lb} = z^* + \lambda^*$.
  A lower bound on the SDP objective value is then $b^T y_{lb} + z_{lb}$.
  Note: $(y_{lb}, z_{lb})$ is a feasible point in (SDD), since $C - \mathcal{A}^T y_{lb} - z_{lb} I = S^* - \lambda^* I \succeq 0$.
- We could also terminate if the difference between these bounds is small.
- The computed bounds are not monotone: the lower bounds need not increase, and the upper bounds need not decrease, from one iteration to the next.
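Restoring dual feasibility is a one-eigenvalue shift of $z$. A minimal NumPy sketch with a hypothetical function name, which also returns $z_{lb}$ so that the feasibility of $(y_{lb}, z_{lb})$ can be checked:

```python
import numpy as np

def sdp_lower_bound(C, As, b, y, z):
    """Return (b^T y + z_lb, z_lb) with z_lb = z + lambda_min(C - A^T y - z I)."""
    n = C.shape[0]
    S = C - sum(yi * Ai for yi, Ai in zip(y, As)) - z * np.eye(n)
    z_lb = z + np.linalg.eigvalsh(S)[0]
    return float(b @ y + z_lb), z_lb

rng = np.random.default_rng(5)

def random_sym(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2.0

C, As = random_sym(4), [random_sym(4) for _ in range(2)]
b, y, z = rng.standard_normal(2), rng.standard_normal(2), 10.0   # an arbitrary (y, z)
lb, z_lb = sdp_lower_bound(C, As, b, y, z)
S_lb = C - sum(yi * Ai for yi, Ai in zip(y, As)) - z_lb * np.eye(4)
print(np.linalg.eigvalsh(S_lb)[0] >= -1e-9)   # True: (y, z_lb) is feasible in (SDD)
```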

Warm start after adding a column: 1
- The solution to the old master problem is $(x_l, x_q, y, s_l, s_q)$. Consider the linear cut $a_l^T y \le d$ in the dual master problem. The new master problem is:
  Primal: $\min\ c_l^T x_l + c_q^T x_q + d\beta$ s.t. $A_l x_l + A_q x_q + a_l \beta = b$, $x_l \ge 0$, $\beta \ge 0$, $x_q \succeq_{\mathcal{Q}} 0$.
  Dual: $\max\ b^T y$ s.t. $s_l = c_l - A_l^T y \ge 0$, $s_q = c_q - A_q^T y \succeq_{\mathcal{Q}} 0$, $\gamma = d - a_l^T y \ge 0$.
- We have $\gamma < 0$ (the cut is violated at $y$). We can perturb $y$ (lower bounding) to generate $y^{lb}$ so that $\gamma^{lb} = 0$.
- The perturbed point $(x_l, x_q, \beta, y^{lb}, s_l^{lb}, s_q^{lb}, \gamma^{lb})$ is feasible in the new master problem with $\beta = \gamma^{lb} = 0$, but NOT strictly!
- We want to increase $\beta$ and $\gamma^{lb}$ from their current zero values, while limiting the variation in the other variables.

Warm start after adding a column: 2
- We solve the following problems for $(\Delta x_l, \Delta x_q, \beta)$ and $(\Delta y, \gamma)$, respectively:
  (WSP) $\max\ \log \beta$ s.t. $A_l \Delta x_l + A_q \Delta x_q + a_l \beta = 0$, $\|D_l^{-1} \Delta x_l\|^2 + \|D_q^{-1} \Delta x_q\|^2 \le 1$, $\beta \ge 0$.
  (WSD) $\max\ \log \gamma$ s.t. $a_l^T \Delta y + \gamma = 0$, $\|D_l A_l^T \Delta y\|^2 + \|D_q A_q^T \Delta y\|^2 \le 1$, $\gamma \ge 0$.
  Here $D_l$ and $D_q$ are appropriate primal-dual scaling matrices for the LP and SOCP blocks at $(x_l, x_q, s_l^{lb}, s_q^{lb})$.
- Compute $(\Delta s_l, \Delta s_q) = (-A_l^T \Delta y, -A_q^T \Delta y)$.

Warm start after adding a column: 3
- The solutions to (WSP) and (WSD) are given by
  $\Delta x_l = -D_l^2 A_l^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_l \beta$
  $\Delta x_q = -D_q^2 A_q^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_l \beta$
  $\Delta y = -(A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_l \beta$
  where $\beta \in \mathbb{R}$ is the solution to $\max\{-\frac{1}{2} \beta^T V \beta + \log \beta : \beta \ge 0\}$ with $V = a_l^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_l$, and $\gamma = V\beta$. (For scalar $\beta$ this gives $\beta = 1/\sqrt{V}$.)
- Finally set
  $(x_l^{st}, x_q^{st}, \beta^{st}) = (x_l + 0.98\,\alpha^p_{\max} \Delta x_l,\ x_q + 0.98\,\alpha^p_{\max} \Delta x_q,\ 0.98\,\alpha^p_{\max} \beta)$
  $y^{st} = y^{lb} + 0.98\,\alpha^d_{\max} \Delta y$
  $(s_l^{st}, s_q^{st}, \gamma^{st}) = (s_l^{lb} + 0.98\,\alpha^d_{\max} \Delta s_l,\ s_q^{lb} + 0.98\,\alpha^d_{\max} \Delta s_q,\ 0.98\,\alpha^d_{\max} \gamma)$
  where $\alpha^p_{\max}$ and $\alpha^d_{\max}$ are the maximal primal and dual step lengths, respectively.
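The warm start has a closed form when $\beta$ is a scalar: maximizing $-\frac12 V\beta^2 + \log\beta$ gives $\beta = 1/\sqrt{V}$ and $\gamma = V\beta = \sqrt{V}$. The NumPy sketch below restricts to a master problem with LP blocks only and assumes the standard LP primal-dual scaling $D_l^2 = \mathrm{diag}(x_l / s_l^{lb})$. The function names, and capping the step at 1 when no component blocks it, are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def max_step(v, dv):
    """Largest alpha with v + alpha * dv >= 0 (np.inf if no component decreases)."""
    neg = dv < 0
    return float(np.min(-v[neg] / dv[neg])) if np.any(neg) else np.inf

def warm_start_lp_cut(A, x, s_lb, y_lb, a):
    """Strictly feasible restart after adding the LP cut a^T y <= d (beta = gamma_lb = 0).

    LP blocks only; D^2 = diag(x / s_lb) is the LP primal-dual scaling.
    """
    D2 = x / s_lb
    M = (A * D2) @ A.T                    # A D^2 A^T
    Minv_a = np.linalg.solve(M, a)
    V = float(a @ Minv_a)                 # V = a^T (A D^2 A^T)^{-1} a
    beta = 1.0 / np.sqrt(V)               # argmax of -V beta^2 / 2 + log(beta)
    gamma = V * beta
    dx = -D2 * (A.T @ Minv_a) * beta      # keeps A dx + a beta = 0
    dy = -Minv_a * beta                   # keeps a^T dy + gamma = 0
    ds = -A.T @ dy
    ap, ad = max_step(x, dx), max_step(s_lb, ds)
    ap = 1.0 if np.isinf(ap) else ap      # illustrative cap when nothing blocks the step
    ad = 1.0 if np.isinf(ad) else ad
    x_st, beta_st = x + 0.98 * ap * dx, 0.98 * ap * beta
    y_st, s_st, gamma_st = y_lb + 0.98 * ad * dy, s_lb + 0.98 * ad * ds, 0.98 * ad * gamma
    return x_st, beta_st, y_st, s_st, gamma_st

rng = np.random.default_rng(6)
m, n = 2, 5
A = rng.standard_normal((m, n))
x, s_lb, y_lb = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n), rng.standard_normal(m)
b, c = A @ x, s_lb + A.T @ y_lb           # old master: A x = b, s = c - A^T y
a = rng.standard_normal(m)
d = a @ y_lb                              # gamma_lb = d - a^T y_lb = 0 after lower bounding
x_st, beta_st, y_st, s_st, gamma_st = warm_start_lp_cut(A, x, s_lb, y_lb, a)
```

The 0.98 factor applied to both $\Delta x$ and $\beta$ is what keeps $A_l x_l^{st} + a_l \beta^{st} = b$ exact while staying strictly inside the cones.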

Warm start after adding an SOCP cut: 1
- The solution to the old master problem is $(x_l, x_q, y, s_l, s_q)$. Consider the SOCP cut $a_q^T y \preceq_{\mathcal{Q}} d$ in the dual master problem. The new master problem is:
  Primal: $\min\ c_l^T x_l + c_q^T x_q + d^T \beta$ s.t. $A_l x_l + A_q x_q + a_q \beta = b$, $x_l \ge 0$, $\beta \succeq_{\mathcal{Q}} 0$, $x_q \succeq_{\mathcal{Q}} 0$.
  Dual: $\max\ b^T y$ s.t. $s_l = c_l - A_l^T y \ge 0$, $s_q = c_q - A_q^T y \succeq_{\mathcal{Q}} 0$, $\gamma = d - a_q^T y \succeq_{\mathcal{Q}} 0$.
- We have $\gamma \not\succeq_{\mathcal{Q}} 0$. We can perturb $y$ (lower bounding) to generate $y^{lb}$ so that $\gamma^{lb} \succeq_{\mathcal{Q}} 0$, with $\gamma^{lb}$ on the boundary of $\mathcal{Q}$.
- The perturbed point $(x_l, x_q, \beta, y^{lb}, s_l^{lb}, s_q^{lb}, \gamma^{lb})$ is feasible in the new master problem with $\beta = 0$ and $\gamma^{lb} \succeq_{\mathcal{Q}} 0$, but NOT strictly!
- We want to increase $\beta$ and $\gamma^{lb}$, making them strictly feasible, while limiting the variation in the other variables.

Warm start after adding an SOCP cut: 2
- We solve the following problems for $(\Delta x_l, \Delta x_q, \beta)$ and $(\Delta y, \gamma)$, respectively:
  (WSP) $\max\ \frac{1}{2} \log(\beta_1^2 - \beta_2^2 - \beta_3^2)$ s.t. $A_l \Delta x_l + A_q \Delta x_q + a_q \beta = 0$, $\|D_l^{-1} \Delta x_l\|^2 + \|D_q^{-1} \Delta x_q\|^2 \le 1$, $\beta \succeq_{\mathcal{Q}} 0$.
  (WSD) $\max\ \frac{1}{2} \log(\gamma_1^2 - \gamma_2^2 - \gamma_3^2)$ s.t. $a_q^T \Delta y + \gamma = 0$, $\|D_l A_l^T \Delta y\|^2 + \|D_q A_q^T \Delta y\|^2 \le 1$, $\gamma \succeq_{\mathcal{Q}} 0$.
  Here $D_l$ and $D_q$ are appropriate primal-dual scaling matrices for the LP and SOCP blocks at $(x_l, x_q, s_l^{lb}, s_q^{lb})$.
- Compute $(\Delta s_l, \Delta s_q) = (-A_l^T \Delta y, -A_q^T \Delta y)$.

Warm start after adding an SOCP cut: 3
- Compute
  $\Delta x_l = -D_l^2 A_l^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_q \beta$
  $\Delta x_q = -D_q^2 A_q^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_q \beta$
  $\Delta y = -(A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_q \beta$
  where $\beta \in \mathbb{R}^3$ is the solution to $\max\{-\frac{1}{2} \beta^T V \beta + \frac{1}{2} \log(\beta_1^2 - \beta_2^2 - \beta_3^2) : \beta \succeq_{\mathcal{Q}} 0\}$ with $V = a_q^T (A_l D_l^2 A_l^T + A_q D_q^2 A_q^T)^{-1} a_q$, and $\gamma = V\beta$.
- Finally set
  $(x_l^{st}, x_q^{st}, \beta^{st}) = (x_l + 0.98\,\alpha^p_{\max} \Delta x_l,\ x_q + 0.98\,\alpha^p_{\max} \Delta x_q,\ 0.98\,\alpha^p_{\max} \beta)$
  $y^{st} = y^{lb} + 0.98\,\alpha^d_{\max} \Delta y$
  $(s_l^{st}, s_q^{st}, \gamma^{st}) = (s_l^{lb} + 0.98\,\alpha^d_{\max} \Delta s_l,\ s_q^{lb} + 0.98\,\alpha^d_{\max} \Delta s_q,\ 0.98\,\alpha^d_{\max} \gamma)$
  where $\alpha^p_{\max}$ and $\alpha^d_{\max}$ are the maximal primal and dual step lengths, respectively.
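For an SOCP cut, $\beta \in \mathbb{R}^3$ has no closed form. One way to compute it (an assumption, since the talk does not say how) is Newton's method on $g(\beta) = \beta^T V \beta - \log(\beta_1^2 - \beta_2^2 - \beta_3^2)$, which is $-2$ times the objective $-\frac12 \beta^T V \beta + \frac12 \log(\beta_1^2 - \beta_2^2 - \beta_3^2)$ and is self-concordant on the interior of $\mathcal{Q}$, so damped Newton steps stay strictly feasible. The NumPy sketch below (hypothetical names) assumes $V$ is positive definite. At the optimum $V\beta = J\beta / (\beta^T J \beta)$ with $J = \mathrm{diag}(1, -1, -1)$, hence $\gamma^T \beta = 1$.

```python
import numpy as np

J = np.diag([1.0, -1.0, -1.0])            # beta^T J beta = beta_1^2 - beta_2^2 - beta_3^2

def in_int_soc(v):
    """Strict membership v_1 > ||v_{2:3}|| in the interior of Q."""
    return bool(v[0] > np.linalg.norm(v[1:]))

def solve_beta(V, tol=1e-12, max_iter=2000):
    """Maximize -beta^T V beta / 2 + log(beta^T J beta) / 2 over int(Q); return (beta, V beta).

    Equivalently minimizes g(beta) = beta^T V beta - log(beta^T J beta) by damped Newton;
    V is assumed symmetric positive definite.
    """
    beta = np.array([1.0, 0.0, 0.0])      # strictly inside Q
    for _ in range(max_iter):
        Jb = J @ beta
        q = beta @ Jb
        grad = 2.0 * (V @ beta) - 2.0 * Jb / q
        hess = 2.0 * V - 2.0 * J / q + 4.0 * np.outer(Jb, Jb) / q**2
        step = -np.linalg.solve(hess, grad)
        lam = np.sqrt(max(-(grad @ step), 0.0))   # Newton decrement of g
        if lam < tol:
            break
        # damped step 1/(1 + lam) far from the optimum, full Newton step once lam < 1/4
        t = 1.0 if lam < 0.25 else 1.0 / (1.0 + lam)
        beta = beta + t * step
    return beta, V @ beta

rng = np.random.default_rng(7)
B = rng.standard_normal((3, 3))
V = B @ B.T + np.eye(3)                   # a positive definite stand-in for a_q^T M^{-1} a_q
beta, gamma = solve_beta(V)
print(in_int_soc(beta), np.isclose(gamma @ beta, 1.0))
```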

Computational results
- MATLAB code with the following features:
  - Currently solves SDPs over LP and SOCP blocks.
  - SDPT3 is used as the conic solver for the master problem.
  - The oracle is implemented using MATLAB's Lanczos solver (eigs).
  - Warm-starting and lower-upper bounding procedures are built in.
- Tested on Max-Cut problems (SDPLIB, DIMACS set).

Max-Cut problem
- Primal: $\min\ L \bullet X$ s.t. $X_{ii} = 1$, $i = 1, \ldots, n$, $X \succeq 0$.
- Dual: $\max\ e^T y$ s.t. $S = L - \mathrm{Diag}(y) \succeq 0$.
- $n$ is the number of nodes of the graph.
- $L$ is the Laplacian matrix of the graph.
- This is an SDP relaxation of the Max-Cut problem.

Computational results for Max-Cut: 1
- Our starting dual master problem is $\max\ e^T y$ s.t. $y_i \le L_{ii}$, $i = 1, \ldots, n$.
- We solve the initial problem to a tolerance of 1e-3, and set $\mathrm{TOL} = \mathrm{INF}(y)$.
- Whenever $\mathrm{OPT}(x, y, s) < \mathrm{TOL}$, we set $\mathrm{TOL} = \frac{1}{2}\,\mathrm{OPT}(x, y, s)$.
- Our stopping criterion is $\mathrm{OPT}(x, y, s) < $ 1e-3 or $n$ iterations, whichever comes first.
- The oracle returns SOCP cuts if the two smallest eigenvalues of $S$ are negative, and LP cuts if only $\lambda_{\min}(S)$ is negative.
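The first iteration of the Max-Cut run can be traced by hand. The starting dual master problem is solved by $y = \mathrm{diag}(L)$, which gives the upper bound $e^T y$; shifting every $y_i$ by $\lambda_{\min}(S)$ restores feasibility (since $L - \mathrm{Diag}(y + \lambda_{\min} e) = S - \lambda_{\min} I \succeq 0$), giving the lower bound $e^T y + n\lambda_{\min}(S)$; and the cut type follows the rule above. The NumPy sketch uses hypothetical helper names and unweighted toy graphs, with $L$ taken literally as the graph Laplacian.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian L = Diag(degrees) - adjacency of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def maxcut_first_iteration(L, tol=1e-9):
    """Bounds and cut type at the solution y = diag(L) of max e^T y s.t. y_i <= L_ii."""
    n = L.shape[0]
    y = np.diag(L).copy()
    upper = float(y.sum())                    # the dual master problem is a relaxation
    lam = np.linalg.eigvalsh(L - np.diag(y))  # eigenvalues of S, ascending
    lower = upper + n * lam[0]                # y + lambda_min(S) e is dual feasible
    n_neg = int(np.sum(lam < -tol))
    cut = "none" if n_neg == 0 else ("LP" if n_neg == 1 else "SOCP")
    return upper, float(lower), cut

# Triangle graph K_3: S = L - Diag(L) = -(adjacency) has eigenvalues -2, 1, 1.
upper, lower, cut = maxcut_first_iteration(laplacian(3, [(0, 1), (1, 2), (0, 2)]))
```

For the triangle this gives an upper bound of 6, a lower bound of 0 (up to rounding), and a single negative eigenvalue, hence an LP cut.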

Computational results for Max-Cut: 2

Problem       | n   | Optimal value | LP blocks | SOCP blocks | Upper bound | Lower bound | OPT(x, y, s)
mcp100        | 100 | -226.16       | 102       | 97 × 3      | -225.05     | -227.11     | 4e-3
mcp124-1      | 124 | -141.99       | 126       | 122 × 3     | -141.19     | -142.84     | 3.5e-3
mcp124-2      | 124 | -269.88       | 124       | 124 × 3     | -268.61     | -271.11     | 3.3e-3
mcp250-1      | 250 | -317.26       | 254       | 246 × 3     | -315.64     | -318.72     | 2.2e-3
mcp250-2      | 250 | -531.93       | 250       | 249 × 3     | -529.24     | -534.42     | 1.6e-3
mcp500-1      | 500 | -598.15       | 500       | 498 × 3     | -596.85     | -599.87     | 4.2e-3
mcp500-2      | 500 | -1070.06      | 500       | 500 × 3     | -1068.20    | -1071.90    | 6.2e-3
toruspm3-8-50 | 512 | -527.80       | 512       | 500 × 3     | -523.05     | -531.84     | 5.2e-3
torusg3-8     | 512 | -457.36       | 512       | 512 × 3     | -449.44     | -464.13     | 6.3e-3

1. $\mathrm{OPT}(x, y, s) = \max\{\mathrm{GAPTOL}(x, s), \mathrm{INF}(y)\}$.
2. The first 7 problems are from the SDPLIB repository, while the last two problems are from the DIMACS Challenge.

[Figure: Variation of bounds for the toruspm3-8-50 problem. Objective value versus iteration number, showing the lower bounds, upper bounds, best lower bound, best upper bound, and optimal objective value.]

[Figure: Variation of bounds for the torusg3-8 problem. Objective value versus iteration number, showing the lower bounds, upper bounds, best lower bound, best upper bound, and optimal objective value.]

Future work
- Incorporate smaller SDP blocks in the master problem (already under way).
- Develop a procedure to remove unimportant constraints (to keep the master problem small).
- Improve and benchmark our decomposition scheme on general SDPs.
- Explore the convergence and complexity issues of the decomposition scheme.

Thank you for your attention! Questions, comments, suggestions?
The slides from this talk are available online at http://optlab.mcmaster.ca/ kartik/waterloo-math-seminar.pdf

Bibliography
[1] G. B. Dantzig and P. Wolfe, The Decomposition Algorithm for Linear Programming, Operations Res., 8, 1960, pp. 101-111.
[2] J. E. Kelley, The Cutting Plane Method for solving Convex Programs, J. Soc. Ind. Appl. Math., 8, 1960, pp. 703-712.
[3] K. Krishnan and J. Mitchell, Properties of a cutting plane algorithm for semidefinite programming, Technical Report, Dept. of Mathematical Sciences, Rensselaer Polytechnic Institute, May 2003 (submitted to SIAM J. Optim.).
[4] M. R. Oskoorouchi and J.-L. Goffin, A matrix generation approach for eigenvalue optimization, College of Business Administration, California State University, San Marcos, October 2004.
[5] J.-L. Goffin and J.-P. Vial, Multiple cuts in the analytic center cutting plane method, SIAM Journal on Optimization, 11, 2000, pp. 266-288.
[6] B. Borchers, SDPLIB 1.2, A library of semidefinite programming test problems, Optimization Methods and Software, 11, 1999, pp. 683-690.
[7] R. H. Tutuncu, K. C. Toh, and M. J. Todd, SDPT3 - a Matlab software package for semidefinite-quadratic-linear programming, version 3.0, August 21, 2001.
[8] DIMACS 7th challenge: http://dimacs.rutgers.edu/challenges/seventh/instances/
[9] K. Krishnan, G. Plaza, and T. Terlaky, A conic interior point Dantzig-Wolfe decomposition approach for large scale semidefinite programming, forthcoming.