III. Applications in convex optimization

• nonsymmetric interior-point methods
• partial separability and decomposition
  - partial separability
  - first-order methods
  - interior-point methods
Conic linear optimization

primal:   minimize    c^T x
          subject to  Ax = b
                      x ∈ C

dual:     maximize    b^T y
          subject to  A^T y + s = c
                      s ∈ C*

• C is a proper cone (convex, closed, pointed, with nonempty interior)
• C* = {z : z^T x ≥ 0 for all x ∈ C} is the dual cone
• widely used in recent literature on convex optimization

Interior-point methods: a convenient format for extending interior-point methods from linear optimization to general convex optimization

Modeling: a small number of primitive cones is sufficient to model most convex constraints encountered in practice

Applications in convex optimization 126
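For any primal-dual feasible pair, c^T x − b^T y = (A^T y + s)^T x − (Ax)^T y = s^T x ≥ 0, since x ∈ C and s ∈ C*. A minimal numerical check of this weak-duality identity, with C the (self-dual) nonnegative orthant; the data A, b, c below are made up for illustration:

```python
import numpy as np

# minimize c^T x  s.t.  Ax = b, x >= 0   (C = nonnegative orthant, self-dual)
A = np.array([[1.0, 1.0, 2.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

x = np.array([0.5, 0.5, 0.0])          # primal feasible: Ax = b, x >= 0
y = np.array([0.5])
s = c - A.T @ y                        # dual slack: A^T y + s = c
assert np.allclose(A @ x, b) and np.all(x >= 0)
assert np.all(s >= 0)                  # s lies in the dual cone

gap = c @ x - b @ y
assert np.isclose(gap, s @ x)          # duality gap equals s^T x
assert gap >= 0                        # weak duality
```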
Symmetric cones

most current solvers and modeling systems use three types of cones:
• nonnegative orthant
• second-order cone
• positive semidefinite cone

these cones are not only self-dual but symmetric (self-scaled)
• symmetry is exploited in primal-dual symmetric interior-point methods
• large gaps in (linear algebra) complexity between the three cones (see the examples on pages 5 and 6)
Sparse semidefinite optimization problem

Primal problem
    minimize    tr(CX)
    subject to  tr(A_i X) = b_i,   i = 1, ..., m
                X ⪰ 0

Dual problem
    maximize    b^T y
    subject to  sum_{i=1}^m y_i A_i + S = C
                S ⪰ 0

Aggregate sparsity pattern E: the union of the patterns of C, A_1, ..., A_m
• a feasible X is usually dense, even for problems with aggregate sparsity
• a feasible S is sparse, with sparsity pattern E
Equivalent nonsymmetric conic LPs

Primal problem
    minimize    tr(CX)
    subject to  tr(A_i X) = b_i,   i = 1, ..., m
                X ∈ C

Dual problem
    maximize    b^T y
    subject to  sum_{i=1}^m y_i A_i + S = C
                S ∈ C*

• variables X and S are sparse matrices in S^n_E
• C = Π_E(S^n_+) is the cone of PSD-completable matrices with sparsity pattern E
• C* = S^n_+ ∩ S^n_E is the cone of PSD matrices with sparsity pattern E
• C is not self-dual; no symmetric interior-point methods
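The duality between the two cones comes from the pairing tr(XS): if X = Π_E(Y) for some PSD completion Y, and S ⪰ 0 has pattern E, then tr(XS) = tr(YS) ≥ 0, because S vanishes outside E. A small numpy illustration with an assumed tridiagonal pattern (the diagonally dominant S below is just one convenient member of S^n_+ ∩ S^n_E):

```python
import numpy as np

n = 4
idx_diff = np.abs(np.subtract.outer(range(n), range(n)))
E = idx_diff <= 1                      # tridiagonal sparsity pattern

rng = np.random.default_rng(5)
B = rng.standard_normal((n, n))
Y = B @ B.T                            # PSD, so X = Pi_E(Y) is PSD-completable
X = np.where(E, Y, 0.0)                # projection of Y onto the pattern

adj = (idx_diff == 1).astype(float)
S = 2 * np.eye(n) - 0.5 * adj          # tridiagonal, diagonally dominant => PSD
assert np.all(np.linalg.eigvalsh(S) > 0)

# S is zero outside E, so the pairing only sees the entries of X in E:
assert np.isclose(np.trace(X @ S), np.trace(Y @ S))
assert np.trace(X @ S) >= 0            # inner product of the two cones
```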
Nonsymmetric interior-point methods

    minimize    tr(CX)
    subject to  tr(A_i X) = b_i,   i = 1, ..., m
                X ∈ Π_E(S^n_+)

• can be solved by nonsymmetric primal or dual barrier methods
• logarithmic barriers for the cone Π_E(S^n_+) and its dual cone S^n_+ ∩ S^n_E:

    φ_*(X) = sup_S ( -tr(XS) + log det S ),       φ(S) = -log det S

• fast evaluation of barrier values and derivatives if the pattern is chordal (Fukuda et al. 2000, Burer 2003, Srijungtongsiri and Vavasis 2004, Andersen et al. 2010)
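For a dense pattern (E complete) the supremum defining φ_*(X) is attained at S = X^{-1}, which gives φ_*(X) = -n - log det X. A minimal numpy sanity check of this special case (the helper names are mine, not from any library):

```python
import numpy as np

def phi_dual(S):
    # barrier for the PSD cone: phi(S) = -log det S
    return -np.linalg.slogdet(S)[1]

def phi_primal_dense(X):
    # dense pattern: the maximizer of -tr(XS) + log det S is S = X^{-1},
    # so phi_*(X) = -tr(I) + log det X^{-1} = -n - log det X
    n = X.shape[0]
    return -n - np.linalg.slogdet(X)[1]

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)            # a well-conditioned positive definite X
S = np.linalg.inv(X)                   # optimal S for the dense pattern
val = -np.trace(X @ S) + np.linalg.slogdet(S)[1]
assert abs(val - phi_primal_dense(X)) < 1e-8
```

For a genuinely sparse chordal pattern the maximizing S is the inverse of the maximum-determinant completion of X, which is what the chordal algorithms referenced above compute efficiently.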
Primal path-following method

Central path: the solution X(µ), y(µ), S(µ) of

    tr(A_i X) = b_i,   i = 1, ..., m
    sum_{j=1}^m y_j A_j + S = C
    µ ∇φ_*(X) + S = 0

Search direction at the iterate X, y, S: solve the linearized central path equations

    tr(A_i ΔX) = r_i,   i = 1, ..., m
    sum_{i=1}^m Δy_i A_i + ΔS = R
    µ ∇²φ_*(X)[ΔX] + ΔS = -µ ∇φ_*(X) - S

(r_i and R denote the residuals in the first two central path equations at the current iterate)
Dual path-following method

Central path: an equivalent set of equations is

    tr(A_i X) = b_i,   i = 1, ..., m
    sum_{j=1}^m y_j A_j + S = C
    X + µ ∇φ(S) = 0

Search direction at the iterate X, y, S: solve the linearized central path equations

    tr(A_i ΔX) = r_i,   i = 1, ..., m
    sum_{i=1}^m Δy_i A_i + ΔS = R
    ΔX + µ ∇²φ(S)[ΔS] = -µ ∇φ(S) - X

(r_i and R denote the residuals in the first two central path equations at the current iterate)
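The last central path equation uses ∇φ(S) = -S^{-1} for φ(S) = -log det S, so on the central path X = µS^{-1}. A quick finite-difference check of that gradient formula (illustrative random data):

```python
import numpy as np

def phi(S):
    # dual barrier phi(S) = -log det S
    return -np.linalg.slogdet(S)[1]

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))
S = B @ B.T + 3 * np.eye(3)            # positive definite dual iterate

grad = -np.linalg.inv(S)               # claimed gradient: -S^{-1}

# central finite differences, entry by entry
h = 1e-6
num = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        D = np.zeros((3, 3))
        D[i, j] = h
        num[i, j] = (phi(S + D) - phi(S - D)) / (2 * h)

assert np.allclose(num, grad, atol=1e-5)
```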
Computing search directions

eliminating ΔX, ΔS from the linearized equations gives H Δy = g

• in a primal method, H_ij is the inner product of A_i and ∇²φ_*(X)[A_j]:

    H_ij = tr(A_i ∇²φ_*(X)[A_j])

• in a dual method, H_ij is the inner product of A_i and ∇²φ(S)[A_j]:

    H_ij = tr(A_i ∇²φ(S)[A_j])

• the algorithms from lecture 2 can be used to evaluate the gradients and Hessians
• the system H Δy = g is solved via a dense Cholesky or QR factorization
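In a dual method the Hessian of φ(S) = -log det S acts as ∇²φ(S)[A] = S^{-1} A S^{-1}, so H_ij = tr(A_i S^{-1} A_j S^{-1}). A dense-matrix sketch of assembling H and solving H Δy = g with a Cholesky factorization (random illustrative data; a real solver would exploit the sparsity of the A_i):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)                      # positive definite dual iterate
A = [rng.standard_normal((n, n)) for _ in range(m)]
A = [Ai + Ai.T for Ai in A]                      # symmetric data matrices

Sinv = np.linalg.inv(S)
# H_ij = tr(A_i * H[A_j]) with the Hessian action H[A] = S^{-1} A S^{-1}
H = np.array([[np.trace(A[i] @ Sinv @ A[j] @ Sinv) for j in range(m)]
              for i in range(m)])

g = rng.standard_normal(m)
L = np.linalg.cholesky(H)                        # H is symmetric positive definite
dy = np.linalg.solve(L.T, np.linalg.solve(L, g)) # forward/backward substitution
assert np.allclose(H, H.T)
assert np.allclose(H @ dy, g)
```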
Sparsity patterns

• sparsity patterns from the University of Florida Sparse Matrix Collection
• m = 200 constraints
• random data with 0.05% nonzeros in A_i relative to E

[figure: the eight test patterns]
rs228 (n = 1,919), rs35 (n = 2,003), rs200 (n = 3,025), rs365 (n = 4,704),
rs1555 (n = 7,479), rs828 (n = 10,800), rs1184 (n = 14,822), rs1288 (n = 30,401)
Results

    n        DSDP     SDPA     SDPA-C   SDPT3   SeDuMi    SMCP
    1919      1.4     30.7       5.7     10.7    511.2     2.3
    2003      4.0     34.4      41.5     13.0    521.1    15.3
    3025      2.9    128.3       6.0     33.0   1856.9     2.2
    4704     15.2    407.0      58.8     99.6   4347.0    18.6

    n        DSDP     SDPA-C    SMCP
    7479     22.1       23.1     9.5
    10800   482.1     1812.8   311.2
    14822   791.0     2925.4   463.8
    30401    mem      2070.2   320.4

• average time per iteration for different solvers ("mem": the solver ran out of memory)
• SMCP uses the nonsymmetric matrix cone approach (Andersen et al. 2010)
• code and more benchmarks at github.com/cvxopt/smcp
Band pattern

• SDPs of order n with bandwidth 11 and m = 100 equality constraints

[figure: time per iteration versus n, log-log scale, for the SMCP variants M1 and M2 and for CSDP, DSDP, SDPA, SDPA-C, SDPT3, SeDuMi]

• for the nonsymmetric solver SMCP (two variants M1, M2), the complexity is linear in n (Andersen et al. 2010)
Arrow pattern

• matrix norm minimization of page 6: matrices of size p × q with q = 10 and m = 100 variables

[figure: time per iteration versus p + q, log-log scale, for the SMCP variants M1 and M2 and for CSDP, DSDP, SDPA, SDPA-C, SDPT3, SeDuMi]

• for the nonsymmetric solver SMCP (M1, M2), the complexity is linear in p
III. Applications in convex optimization

• nonsymmetric interior-point methods
• partial separability and decomposition
  - partial separability
  - first-order methods
  - interior-point methods
Partial separability

Partially separable function (Griewank and Toint 1982)

    f(x) = sum_{k=1}^l f_k(P_{β_k} x)

• x is an n-vector; β_1, ..., β_l are (small) overlapping index sets in {1, 2, ..., n}

Example:

    f(x) = f_1(x_1, x_4, x_5) + f_2(x_1, x_3) + f_3(x_2, x_3) + f_4(x_2, x_4)

Partially separable set

    C = {x ∈ R^n : x_{β_k} ∈ C_k, k = 1, ..., l}

• the indicator function of C is a partially separable function
Interaction graph

• vertices V = {1, 2, ..., n};  {i, j} ∈ E  ⇔  i, j ∈ β_k for some k
• if {i, j} ∉ E, then f is separable in x_i and x_j when the other variables are fixed:

    f(x + s e_i + t e_j) = f(x + s e_i) + f(x + t e_j) - f(x)    for all x ∈ R^n, s, t ∈ R

Example:

    f(x) = f_1(x_1, x_4, x_5) + f_2(x_1, x_3) + f_3(x_2, x_3) + f_4(x_2, x_4)

[figure: interaction graph on vertices 1, ..., 5 with edges {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {4,5}]
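The separability identity is easy to test numerically. In the sketch below the f_k are arbitrary made-up functions using the index sets of the example; indices 1 and 2 never appear in the same term, so {1, 2} is not an edge of the interaction graph and the identity must hold for i = 1, j = 2:

```python
import numpy as np

# f = f1(x1,x4,x5) + f2(x1,x3) + f3(x2,x3) + f4(x2,x4), 0-based indexing
def f(x):
    return (x[0] * x[3] * x[4]            # f1(x1, x4, x5)
            + (x[0] - x[2]) ** 2          # f2(x1, x3)
            + np.sin(x[1] + x[2])         # f3(x2, x3)
            + x[1] ** 2 * x[3])           # f4(x2, x4)

x = np.array([0.3, -1.2, 0.7, 2.0, -0.5])
e1, e2 = np.eye(5)[0], np.eye(5)[1]       # unit vectors for x_1 and x_2
s, t = 1.7, -0.4

lhs = f(x + s * e1 + t * e2)
rhs = f(x + s * e1) + f(x + t * e2) - f(x)
assert abs(lhs - rhs) < 1e-12             # f is separable in x_1 and x_2
```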
Example: PSD-completable cone with chordal pattern

• for chordal E, the cone Π_E(S^n_+) is partially separable (see page 104):

    Π_E(S^n_+) = {X ∈ S^n_E : X_{γ_i γ_i} ⪰ 0 for all cliques γ_i}

• the interaction graph is chordal

Example: chordal 6 × 6 sparsity pattern with cliques γ_1 = {1,3,4}, γ_2 = {2,4}, γ_3 = {3,4,5}, γ_4 = {5,6}

[figure: the sparsity pattern, its clique tree, and the clique tree of the interaction graph; the clique tree has root {5,6}, child {3,4,5} (separator {5}), with children {2,4} (separator {4}) and {1,3,4} (separator {3,4})]
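The necessity direction of this characterization is easy to check numerically: projecting any PSD matrix onto the pattern leaves every clique submatrix PSD. A sketch using the 6 × 6 example pattern (0-based indices):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((6, 6))
Y = B @ B.T                              # PSD, so Pi_E(Y) is PSD-completable

cliques = [[0, 2, 3], [1, 3], [2, 3, 4], [4, 5]]   # {1,3,4},{2,4},{3,4,5},{5,6}
E = np.zeros((6, 6), dtype=bool)
for g in cliques:
    E[np.ix_(g, g)] = True               # the pattern is the union of clique blocks

X = np.where(E, Y, 0.0)                  # X = Pi_E(Y)
for g in cliques:
    w = np.linalg.eigvalsh(X[np.ix_(g, g)])
    assert np.all(w >= -1e-10)           # every clique submatrix is PSD
```

(The converse, that clique-wise PSD matrices in S^n_E admit a PSD completion when E is chordal, is the decomposition result cited above and needs an actual completion algorithm.)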
Partially separable convex optimization

    minimize    f(x) = sum_{k=1}^l f_k(P_{β_k} x)

Equivalent problem

    minimize    sum_{k=1}^l f_k(x̃_k)
    subject to  x̃ = P x

• we introduced splitting variables x̃_k to make the cost function separable
• P, x̃ are the stacked matrix and vector

    P = [P_{β_1}; ...; P_{β_l}],    x̃ = (x̃_1, ..., x̃_l)

• P^T P is diagonal ((P^T P)_ii is the number of sets β_k that contain index i)
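The structure of P can be verified directly: stacking the selection matrices for the example index sets gives a P whose P^T P is diagonal, with diagonal entries counting how often each index is covered; this diagonality is what makes projection onto range(P) a simple averaging step. A sketch (0-based index sets):

```python
import numpy as np

n = 5
betas = [[0, 3, 4], [0, 2], [1, 2], [1, 3]]       # {1,4,5},{1,3},{2,3},{2,4}

# P_{beta_k} selects the entries indexed by beta_k
P = np.vstack([np.eye(n)[b, :] for b in betas])    # stacked selection matrix

PtP = P.T @ P
assert np.allclose(PtP, np.diag(np.diag(PtP)))     # P^T P is diagonal
counts = [sum(i in b for b in betas) for i in range(n)]
assert np.allclose(np.diag(PtP), counts)           # (P^T P)_ii = coverage count

# projection onto range(P): average the copies of each variable, then re-expand
rng = np.random.default_rng(2)
xt = rng.standard_normal(P.shape[0])
avg = (P.T @ xt) / np.diag(PtP)
proj = P @ avg
assert np.allclose(P.T @ (xt - proj), 0)           # residual is orthogonal to range(P)
```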
Decomposition via first-order methods

Reformulated problem and its dual (f_k^* is the conjugate function of f_k)

    minimize    sum_{k=1}^l f_k(x̃_k)        maximize    -sum_{k=1}^l f_k^*(s̃_k)
    subject to  x̃ ∈ range(P)                subject to  s̃ ∈ nullspace(P^T)

• the cost functions are separable
• the diagonal property of P^T P makes projections on range(P) inexpensive

Algorithms: many algorithms can exploit these properties, for example
• Douglas-Rachford (DR) splitting of the primal
• alternating direction method of multipliers (ADMM)
Example: sparse nearest matrix problems

find the nearest sparse PSD-completable matrix with a given sparsity pattern:

    minimize    ‖X - A‖_F^2
    subject to  X ∈ Π_E(S^n_+)

find the nearest sparse PSD matrix with a given sparsity pattern:

    minimize    ‖S + A‖_F^2
    subject to  S ∈ S^n_+ ∩ S^n_E

these two problems are duals:  K = Π_E(S^n_+),  K* = S^n_+ ∩ S^n_E
Decomposition methods

from the decomposition theorems (pages 82 and 104), the problems can be written

    primal:  minimize    ‖X - A‖_F^2
             subject to  X_{γ_i γ_i} ⪰ 0 for all cliques γ_i

    dual:    minimize    ‖A + sum_{i ∈ V^c} P_{γ_i}^T H_i P_{γ_i}‖_F^2
             subject to  H_i ⪰ 0 for all i ∈ V^c

Algorithms
• Dykstra's algorithm (dual block coordinate ascent)
• (fast) dual projected gradient algorithm (FISTA)
• Douglas-Rachford splitting, ADMM

each iteration requires a sequence of projections on PSD cones of order |γ_i| (eigenvalue decomposition)
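All three algorithms spend most of their time on Euclidean projections onto small PSD cones, computed from an eigenvalue decomposition by clipping negative eigenvalues at zero. A minimal sketch of that projection and its optimality conditions:

```python
import numpy as np

def proj_psd(M):
    # Euclidean projection of a symmetric matrix onto the PSD cone:
    # eigendecompose and clip negative eigenvalues at zero
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.maximum(w, 0)) @ V.T

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
M = (M + M.T) / 2                                  # symmetric test matrix
X = proj_psd(M)

assert np.all(np.linalg.eigvalsh(X) >= -1e-10)     # result is PSD
# optimality: M - X is negative semidefinite and orthogonal to X
assert np.all(np.linalg.eigvalsh(M - X) <= 1e-10)
assert abs(np.trace(X @ (M - X))) < 1e-10

# in the decomposition methods, this is applied per clique submatrix:
clique = [0, 2, 3]
sub = proj_psd(M[np.ix_(clique, clique)])
assert np.all(np.linalg.eigvalsh(sub) >= -1e-10)
```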
Results

matrices from the University of Florida Sparse Matrix Collection

    n        density   #cliques   avg. clique size   max. clique
    20141    2.80e-3     1098          35.7              168
    38434    1.25e-3     2365          28.1              188
    57975    9.04e-4     8875          14.9              132
    79841    9.71e-4     4247          44.4              337
    114599   2.02e-4     7035          18.9               58

             total runtime (sec)          time/iteration (sec)
    n        FISTA    Dykstra   DR        FISTA   Dykstra   DR
    20141    2.5e2    3.9e1     3.8e1      1.0     1.6      1.5
    38434    4.7e2    4.7e1     6.2e1      2.1     1.9      2.5
    57975    > 4hr    1.4e2     1.1e3      3.5     5.7      6.4
    79841    2.4e3    3.0e2     2.4e2      6.3     7.6      9.7
    114599   5.3e2    5.5e1     1.0e2      2.6     2.2      4.0

(Sun and Vandenberghe 2015)
Conic optimization with partially separable cones

    minimize    c^T x
    subject to  Ax = b
                x ∈ C

• assume C is partially separable:  C = {x ∈ R^n : P_{β_k} x ∈ C_k, k = 1, ..., l}
• the most important application is sparse semidefinite programming (C is the vectorized PSD-completable cone)
• the bottleneck in interior-point methods is the Schur complement equation

    A H^{-1} A^T Δy = r

  (in a primal barrier method, H is the Hessian of the barrier for C)
• the coefficient of the Schur complement equation is often dense, even for sparse A
Reformulation

    minimize    c^T x
    subject to  Ax = b
                P_{β_k} x ∈ C_k,   k = 1, ..., l

• introduce l splitting variables x̃_k = P_{β_k} x and add the consistency constraint

    x̃ ∈ range(P),    where x̃ = (x̃_1, ..., x̃_l),   P = [P_{β_1}; ...; P_{β_l}]

• choose c̃, Ã such that ÃP = A and c̃^T P = c^T

Converted problem

    minimize    c̃^T x̃
    subject to  Ã x̃ = b
                x̃ ∈ C_1 × ⋯ × C_l
                x̃ ∈ range(P)
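One concrete choice satisfying the two conditions is Ã = A(P^T P)^{-1} P^T and c̃ = P(P^T P)^{-1} c, which is cheap to form because P^T P is diagonal. A small numpy check, with made-up A and c and the index sets of the earlier example:

```python
import numpy as np

n = 5
betas = [[0, 3, 4], [0, 2], [1, 2], [1, 3]]        # 0-based index sets
P = np.vstack([np.eye(n)[b, :] for b in betas])     # stacked selection matrix

rng = np.random.default_rng(8)
A = rng.standard_normal((2, n))                     # made-up problem data
c = rng.standard_normal(n)

D_inv = np.diag(1.0 / np.diag(P.T @ P))             # (P^T P)^{-1}, diagonal
A_tilde = A @ D_inv @ P.T                           # candidate A~
c_tilde = P @ D_inv @ c                             # candidate c~

assert np.allclose(A_tilde @ P, A)                  # A~ P = A
assert np.allclose(c_tilde @ P, c)                  # c~^T P = c^T
```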
Chordal structure in interaction graph

• suppose the interaction graph is chordal, and the sets β_k are cliques
• the cliques β_k that contain a given index j form a subtree of the clique tree
• therefore the consistency constraint x̃ ∈ range(P) is equivalent to

    P_{α_j}(P_{β_k}^T x̃_k - P_{β_j}^T x̃_j) = 0

  for each vertex j and its parent k in a clique tree
• α_j is the intersection of β_j and its parent

[figure: a clique tree; each node i carries a variable x̃_i ∈ C_i and is split into the index sets α_i and β_i \ α_i; a constraint of the form above links each node j to its parent k]
Schur complement system of converted problem

    minimize    c̃^T x̃
    subject to  Ã x̃ = b
                x̃ ∈ C_1 × ⋯ × C_l
                B x̃ = 0    (consistency equations)

Schur complement equation in an interior-point method

    [ Ã H^{-1} Ã^T    Ã H^{-1} B^T ] [ Δy ]   [ r_1 ]
    [ B H^{-1} Ã^T    B H^{-1} B^T ] [ Δu ] = [ r_2 ]

• H is block-diagonal (in a primal barrier method, the Hessian of the barrier for C_1 × ⋯ × C_l)
• the system is larger than the Schur complement system before conversion
• however, the 1,1 block is often sparse
• for semidefinite optimization, this is known as the clique-tree conversion method (Fukuda et al. 2000, Kim et al. 2011)
Example

[figure: the 6 × 6 chordal sparsity pattern and clique tree from page 140, with cliques γ_1 = {1,3,4}, γ_2 = {2,4}, γ_3 = {3,4,5}, γ_4 = {5,6}]

a 6 × 6 matrix X with this pattern has a positive semidefinite completion if and only if the matrices

    X_{γ_1 γ_1} = [ X_11  X_13  X_14 ]      X_{γ_2 γ_2} = [ X_22  X_24 ]
                  [ X_31  X_33  X_34 ]                    [ X_42  X_44 ]
                  [ X_41  X_43  X_44 ]

    X_{γ_3 γ_3} = [ X_33  X_34  X_35 ]      X_{γ_4 γ_4} = [ X_55  X_56 ]
                  [ X_43  X_44  X_45 ]                    [ X_65  X_66 ]
                  [ X_53  X_54  X_55 ]

are positive semidefinite
Example

[figure: the same 6 × 6 sparsity pattern and clique tree]

• define a splitting variable for each of the four submatrices:

    X̃_1 ∈ S^3,   X̃_2 ∈ S^2,   X̃_3 ∈ S^3,   X̃_4 ∈ S^2

• add consistency constraints

    [ X̃_{1,22}  X̃_{1,23} ]   [ X̃_{3,11}  X̃_{3,12} ]
    [ X̃_{1,32}  X̃_{1,33} ] = [ X̃_{3,21}  X̃_{3,22} ],     X̃_{2,22} = X̃_{3,22},     X̃_{3,33} = X̃_{4,11}
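These consistency constraints simply require the splitting variables to agree on the clique overlaps, so clique submatrices of any single matrix X with this pattern satisfy them automatically. A quick check with numpy (0-based indices, random X):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 6))
X = B @ B.T                                        # any symmetric 6x6 matrix works

cliques = [[0, 2, 3], [1, 3], [2, 3, 4], [4, 5]]   # gamma_1..gamma_4, 0-based
Xt = [X[np.ix_(g, g)] for g in cliques]            # splitting variables

# overlap {3,4} of gamma_1 and gamma_3: trailing 2x2 block of Xt[0]
# must equal the leading 2x2 block of Xt[2]
assert np.allclose(Xt[0][1:3, 1:3], Xt[2][0:2, 0:2])
# overlap {4} of gamma_2 and gamma_3: Xt[1][1,1] == Xt[2][1,1]
assert np.isclose(Xt[1][1, 1], Xt[2][1, 1])
# overlap {5} of gamma_3 and gamma_4: Xt[2][2,2] == Xt[3][0,0]
assert np.isclose(Xt[2][2, 2], Xt[3][0, 0])
```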
Summary: sparse semidefinite optimization

sparse SDPs with chordal sparsity are partially separable:

    minimize    tr(CX)
    subject to  tr(A_i X) = b_i,   i = 1, ..., m
                X_{γ_k γ_k} ⪰ 0,   k = 1, ..., l

introducing splitting variables, one can reformulate this as

    minimize    sum_{k=1}^l tr(C̃_k X̃_k)
    subject to  sum_{k=1}^l tr(Ã_{ik} X̃_k) = b_i,   i = 1, ..., m
                X̃_k ⪰ 0,   k = 1, ..., l
                consistency constraints

• this was first proposed as a technique for speeding up interior-point methods
• also useful in combination with first-order splitting methods (Lu et al. 2007, Lam et al. 2011, Dall'Anese et al. 2013, Sun et al. 2014, ...)
• useful for distributed algorithms (Pakazad et al. 2014)