Sparse Matrix Theory and Semidefinite Optimization
Lieven Vandenberghe, Department of Electrical Engineering, University of California, Los Angeles
Joint work with Martin S. Andersen and Yifan Sun
Third VORACE Workshop, Toulouse, 20-22 September 2016
Semidefinite program (SDP)

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

- the variable X is an n × n symmetric matrix; X ⪰ 0 means X is positive semidefinite
- matrix inequalities arise naturally in many areas (for example, control, statistics)
- used in convex modeling systems (CVX, YALMIP, CVXPY, ...)
- relaxations of nonconvex quadratic and polynomial optimization
- in many applications the coefficients A_i, C are sparse
- the optimal X is typically dense, even for sparse A_i, C
SDP with band structure

cost of solving an SDP with banded matrices (bandwidth 11, 100 constraints)

[figure: log-log plot of time per iteration (seconds) versus n for SDPT3 and SeDuMi, with an O(n²) reference line]

- for bandwidth 1 (a linear program), cost per iteration is linear in n
- for bandwidth greater than 1, cost grows as n² or faster

(from Andersen, Dahl, Vandenberghe 2010)
Power flow optimization

an optimization problem with non-convex quadratic constraints

Variables
- complex voltage v_i at each node (bus) of the network
- complex power flow s_ij entering the link (line) from node i to node j

Non-convex constraints
- (lower) bounds on voltage magnitudes: v_min ≤ |v_i| ≤ v_max
- flow balance equations: s_ij + s_ji = ḡ_ij |v_i − v_j|², where g_ij is the admittance of the line from node i to j
Semidefinite relaxation of optimal power flow problem

- introduce matrix variable X = Re(vv^H), i.e., with elements X_ij = Re(v_i v̄_j)
- voltage bounds and flow balance equations are convex in X:

    v_min ≤ |v_i| ≤ v_max  ⟺  v²_min ≤ X_ii ≤ v²_max
    s_ij + s_ji = ḡ_ij |v_i − v_j|²  ⟺  s_ij + s_ji = ḡ_ij (X_ii + X_jj − 2X_ij)

- replace the constraint X = Re(vv^H) with the weaker constraint X ⪰ 0
- the relaxation is exact if the optimal X has rank two

Sparsity in SDP relaxation: the off-diagonal entry X_ij appears in the constraints only if there is a line between buses i and j

(Jabr 2006, Bai et al. 2008, Lavaei and Low 2012, ...)
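The lifting used in the relaxation can be checked numerically. A minimal sketch (not from the slides) verifying that the quadratic term |v_i − v_j|² becomes the linear expression X_ii + X_jj − 2X_ij in the lifted variable X = Re(vv^H):

```python
import numpy as np

# Check that X = Re(v v^H) turns |v_i - v_j|^2 into X_ii + X_jj - 2 X_ij.
rng = np.random.default_rng(0)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

X = np.real(np.outer(v, v.conj()))        # X_ij = Re(v_i conj(v_j))

i, j = 1, 3
lhs = abs(v[i] - v[j]) ** 2               # quadratic in v
rhs = X[i, i] + X[j, j] - 2 * X[i, j]     # linear in X
print(abs(lhs - rhs))                     # ~0 up to rounding
```

The identity holds because |v_i − v_j|² = |v_i|² + |v_j|² − 2 Re(v_i v̄_j).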
Outline

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

- structure in the solution X that results from sparsity in the coefficients A_i, C
- applications of sparse matrix theory to semidefinite optimization algorithms

1. Chordal graphs
2. Decomposition of sparse matrix cones
3. Multifrontal algorithms for barrier functions
4. Minimum rank positive semidefinite completion
Symmetric sparsity pattern

[figure: sparsity graph on vertices 1, ..., 5]

      [ A11 A21 A31  0   A51 ]
      [ A21 A22  0   A42  0  ]
  A = [ A31  0   A33  0   A53 ]
      [  0   A42  0   A44 A54 ]
      [ A51  0   A53 A54 A55 ]

- the sparsity pattern of a symmetric n × n matrix is a set of nonzero positions E ⊆ {{i, j} : i, j ∈ {1, 2, ..., n}}
- A has sparsity pattern E if A_ij = 0 whenever i ≠ j and {i, j} ∉ E; notation: A ∈ S^n_E
- represented by an undirected graph (V, E) with edges E and vertices V = {1, ..., n}
- a clique (maximal complete subgraph) forms a maximal dense principal submatrix
Chordal graph

- undirected graph G = (V, E) with vertex set V and edge set E ⊆ {{v, w} : v, w ∈ V}
- a chord of a cycle is an edge between non-consecutive vertices
- G is chordal if every cycle of length greater than three has a chord

[figure: a 6-cycle on vertices a, b, c, d, e, f (not chordal), and the same graph with chords added (chordal)]

also known as triangulated, decomposable, rigid circuit graph, ...
History

chordal graphs have been studied in many disciplines since the 1960s

- combinatorial optimization (a class of perfect graphs)
- linear algebra (sparse factorization, completion problems)
- computer science (database theory)
- machine learning (graphical models, probabilistic networks)
- nonlinear optimization (partial separability)

first used in semidefinite optimization by Fujisawa, Kojima, Nakata (1997)
Chordal sparsity and Cholesky factorization

Cholesky factorization of positive definite A ∈ S^n_E:

  PAP^T = LDL^T

with P a permutation, L unit lower triangular, D positive diagonal

- if E is chordal, then there exists a permutation P for which P^T(L + L^T)P ∈ S^n_E: A has a zero-fill Cholesky factorization
- if E is not chordal, then for every P there exist positive definite A ∈ S^n_E for which P^T(L + L^T)P ∉ S^n_E

(Rose 1970)
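The zero-fill property is easy to observe numerically. A minimal sketch (not from the slides) using a banded pattern, which is chordal, so with the natural ordering the Cholesky factor has no fill outside the band:

```python
import numpy as np

# A tridiagonal (chordal) pattern: the Cholesky factor stays in the band.
n = 8
A = 2.5 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

L = np.linalg.cholesky(A)        # A = L L^T
fill = np.abs(np.tril(L, -2))    # entries below the first subdiagonal would be fill
print(fill.max())                # 0.0: no fill is created
```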
Examples

[figure: simple chordal patterns; the sparsity pattern of a Cholesky factor of a non-chordal pattern, with edges of the non-chordal pattern and fill entries of the factorization marked; the filled pattern is a chordal extension of the non-chordal pattern]
Perfect elimination ordering

ordered graph (V, E, σ): undirected graph plus a numbering σ of the vertices

[figure: an ordered graph on six vertices a, ..., f]

- higher neighborhood of vertex v: adj⁺(v) = {w : {w, v} ∈ E, σ⁻¹(w) > σ⁻¹(v)}
- σ is a perfect elimination ordering if all higher neighborhoods are complete
- a perfect elimination ordering exists if and only if (V, E) is chordal

(Fulkerson and Gross 1965)
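The definition translates directly into a checker. A small sketch (not from the slides) testing whether an ordering is a perfect elimination ordering, illustrated on a chordless 4-cycle (no ordering works) and its chordal extension:

```python
import itertools

# sigma is a perfect elimination ordering iff every higher neighborhood
# adj+(v) = {w : {w, v} in E, sigma^-1(w) > sigma^-1(v)} is complete.
def is_peo(edges, sigma):
    pos = {v: i for i, v in enumerate(sigma)}
    adj = {v: set() for v in sigma}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for v in sigma:
        higher = [w for w in adj[v] if pos[w] > pos[v]]
        for a, b in itertools.combinations(higher, 2):
            if b not in adj[a]:      # adj+(v) is not complete
                return False
    return True

cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]            # chordless 4-cycle
no_peo = any(is_peo(cycle, list(p))
             for p in itertools.permutations([1, 2, 3, 4]))
print(no_peo)                                        # False: not chordal

has_peo = is_peo(cycle + [(1, 3)], [2, 4, 1, 3])    # add the chord {1, 3}
print(has_peo)                                       # True: now chordal
```

This matches Fulkerson and Gross: the cycle admits no perfect elimination ordering until a chord is added.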
Elimination tree

[figure: a chordal graph on vertices 1, ..., 9 and its elimination tree]

- chordal graph with perfect elimination ordering σ = (1, 2, ..., n)
- the parent p(v) of vertex v in the elimination tree is the first vertex in adj⁺(v)
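The parent rule is a one-liner. A sketch (not from the slides) that builds the elimination tree of a chordal graph given in a perfect elimination ordering; the 4-vertex example graph is my own, not the 9-vertex graph on the slide:

```python
# Parent of v in the elimination tree: the lowest-numbered vertex in adj+(v).
def elimination_tree(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parent = {}
    for v in range(1, n + 1):
        higher = [w for w in adj[v] if w > v]
        if higher:                    # the root has no higher neighbors
            parent[v] = min(higher)
    return parent

# small chordal example in a perfect elimination ordering (hypothetical)
edges = [(1, 3), (2, 3), (3, 4), (2, 4)]
parent = elimination_tree(4, edges)
print(parent)                         # {1: 3, 2: 3, 3: 4}
```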
Expanded elimination tree

[figure: a chordal graph on vertices 1, ..., 9 and its expanded elimination tree]

- each block of the tree represents a column: the bottom row is v, the top row is adj⁺(v)
- the vertices of the tree are complete subgraphs (and include all cliques)
- for each v, the tree vertices that contain v form a subtree of the elimination tree
Supernodal elimination tree

[figure: a chordal graph on vertices 1, ..., 17 and its supernodal elimination tree]

- the vertices of the tree are the cliques of the sparsity graph
- the top row of each block is the intersection of the clique with its parent clique
- the bottom rows are the supernodes; these form a partition of {1, 2, ..., n}
- for each v, the cliques that contain v form a subtree of the elimination tree
Outline

1. Chordal graphs
2. Decomposition of sparse matrix cones
3. Multifrontal algorithms for barrier functions
4. Minimum rank positive semidefinite completion
Sparse matrix cones

we define two matrix cones in S^n_E (symmetric n × n matrices with pattern E)

- positive semidefinite matrices:  S^n_+ ∩ S^n_E = {X ∈ S^n_E : X ⪰ 0}
- matrices with a positive semidefinite completion:  Π_E(S^n_+) = {Π_E(X) : X ⪰ 0}, where Π_E is the projection on S^n_E

Properties
- the two cones are convex
- closed, pointed, with nonempty interior (relative to S^n_E)
- they form a pair of dual cones (for the trace inner product)
Positive semidefinite matrices with chordal sparsity pattern

A ∈ S^n_E is positive semidefinite if and only if it can be expressed as

  A = Σ_{cliques γ_i} P^T_{γ_i} H_i P_{γ_i}   with H_i ⪰ 0

(for an index set β, P_β is the 0-1 matrix of size |β| × n with P_β x = x_β for all x)

[figure: A ⪰ 0 written as P^T_{γ_1} H_1 P_{γ_1} + P^T_{γ_2} H_2 P_{γ_2} + P^T_{γ_3} H_3 P_{γ_3} with H_1, H_2, H_3 ⪰ 0]

(Griewank and Toint 1984; Agler, Helton, McCullough, Rodman 1988)
Decomposition from Cholesky factorization

example with two cliques:

[figure: A = P^T_{γ_1} H_1 P_{γ_1} + P^T_{γ_2} H_2 P_{γ_2}]

- H_1 and H_2 follow by combining columns in a Cholesky factorization of A
- readily computed from the update matrices in a multifrontal Cholesky factorization
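A concrete sketch (not from the slides) of this construction for a tridiagonal (chordal) pattern with cliques {0,1}, {1,2}, {2,3}: the rank-one terms of A = Σ_j (column j of the Cholesky factor)(column j)^T are grouped by the clique containing their support, giving A = Σ_i P_i^T H_i P_i with H_i ⪰ 0:

```python
import numpy as np

n = 4
A = 3.0 * np.eye(n) + np.diag([1.0, -1.0, 1.0], 1) + np.diag([1.0, -1.0, 1.0], -1)

L = np.linalg.cholesky(A)                 # column j of L is supported on {j, j+1}
cliques = [[0, 1], [1, 2], [2, 3]]

H = []
for i, gamma in enumerate(cliques):
    lj = L[:, gamma[0]]                   # clique i absorbs the term of column gamma[0]
    Hi = np.outer(lj, lj)[np.ix_(gamma, gamma)]
    if i == len(cliques) - 1:             # fold the last column's term into the last clique
        Hi[-1, -1] += L[n - 1, n - 1] ** 2
    H.append(Hi)

# reassemble A = sum_i P_i^T H_i P_i and check that each H_i is PSD
B = np.zeros((n, n))
for gamma, Hi in zip(cliques, H):
    B[np.ix_(gamma, gamma)] += Hi
print(np.allclose(A, B), all(np.linalg.eigvalsh(Hi).min() >= -1e-12 for Hi in H))
```

Each H_i is a sum of restricted rank-one terms (plus a nonnegative diagonal contribution for the root clique), hence PSD by construction.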
Non-chordal example (Griewank and Toint 1984)

graph with a chordless cycle (1, 2, ..., k, 1) of length k > 3

the following matrix A ∈ S^n_E is positive semidefinite for λ ≥ 2 cos(π/k):

  A = [ Ã  0 ]           [ λ          1   0  ···  0   (−1)^{k−1} ]
      [ 0  0 ],      Ã = [ 1          λ   1  ···  0       0      ]
                         [ 0          1   λ  ···  0       0      ]
                         [ ···                                   ]
                         [ 0          0   0  ···  λ       1      ]
                         [ (−1)^{k−1} 0   0  ···  1       λ      ]

for λ < 2 there exists no decomposition of Ã as a sum of positive semidefinite terms, each supported on one edge {i, i+1} of the cycle (obtained by splitting each diagonal entry λ as u_i + (λ − u_i))
Partial separability

a function f : R^n → R is partially separable in the variables x_i and x_j (with i ≠ j) if

  f(x + s e_i + t e_j) = f(x + s e_i) + f(x + t e_j) − f(x)   for all x ∈ R^n, s, t ∈ R

- f is separable in the variables x_i and x_j if the other variables are held constant
- can be represented by an interaction graph with vertices V = {1, 2, ..., n}:

    {i, j} ∉ E  ⟹  f is partially separable in x_i and x_j

Example: f(x) = f_1(x_1, x_4, x_5) + f_2(x_1, x_3) + f_3(x_2, x_3) + f_4(x_2, x_4)

[figure: interaction graph on vertices 1, ..., 5]

(Griewank and Toint 1982, 1984)
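The identity can be verified numerically for the example. A sketch (not from the slides; the concrete choices of f_1, ..., f_4 are my own, matching only the argument structure of the example) checking partial separability in the non-adjacent pair x_2, x_5:

```python
import numpy as np

# f = f1(x1,x4,x5) + f2(x1,x3) + f3(x2,x3) + f4(x2,x4); {2,5} is not an edge
# of the interaction graph, so f is partially separable in x2 and x5.
def f(x):
    x1, x2, x3, x4, x5 = x
    return np.sin(x1 * x4) * x5 + x1 * x3**2 + np.exp(x2 - x3) + x2**3 * x4

def shift(x, i, s):
    y = x.copy()
    y[i - 1] += s
    return y

x = np.array([0.3, -1.2, 0.7, 2.0, 0.5])
s, t, i, j = 0.9, -0.4, 2, 5
lhs = f(shift(shift(x, i, s), j, t))
rhs = f(shift(x, i, s)) + f(shift(x, j, t)) - f(x)
print(abs(lhs - rhs))      # ~0: the partial-separability identity holds
```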
Convex quadratic function

- the interaction graph of f(x) = x^T A x is the sparsity graph of A
- if G is chordal, with l cliques γ_i, then there exist PSD matrices H_i such that

    f(x) = x^T (Σ_{i=1}^l P^T_{γ_i} H_i P_{γ_i}) x = Σ_{i=1}^l x_{γ_i}^T H_i x_{γ_i}

Example

[figure: clique tree with cliques {1,3,4}, {3,4,5}, {2,4}, {5,6}]

  f(x) = [x_1; x_3; x_4]^T H_1 [x_1; x_3; x_4] + [x_3; x_4; x_5]^T H_2 [x_3; x_4; x_5]
       + [x_2; x_4]^T H_3 [x_2; x_4] + [x_5; x_6]^T H_4 [x_5; x_6]

(clique tree and convex decomposition of f)
Sum of squares

a polynomial f : R^n → R is a sum of squares of degree 2d or less if

  f(x) = Σ_{α ∈ N^n_d} Σ_{β ∈ N^n_d} A_{αβ} x^α x^β = v(x)^T A v(x)   with A ⪰ 0

- x^α with α = (α_1, ..., α_n) denotes the monomial x_1^{α_1} x_2^{α_2} ··· x_n^{α_n}
- N^n_d is the set of nonnegative integer vectors (α_1, ..., α_n) with Σ_i α_i ≤ d
- v(x) is the vector of the C(n + d, d) monomials in x of degree d or less

Sparse sums of squares: exploit partial separability to reduce the size of A

(Lasserre 2006, Kojima and Muramatsu 2009)
Sums of squares with chordal interaction graph

  f(x) = v(x)^T A v(x)

- define a graph G = (V, E) with V = {1, ..., n} and

    {i, j} ∉ E  ⟹  f contains no monomials that depend on both x_i and x_j
    (A_{αβ} = 0 if α_i + β_i > 0 and α_j + β_j > 0)

- the sparsity pattern of A is chordal if G is chordal

Example (n = 6, d = 2)

[figure: the clique tree of G, with cliques {1,3,4}, {3,4,5}, {2,4}, {5,6}, and the clique tree of the sparsity graph of A, whose vertex sets contain the monomials of degree ≤ 2 in the corresponding variables]
Clique decomposition

the clique decomposition gives a more compact sum-of-squares representation

  f(x) = Σ_{i=1}^l v(x_{γ_i})^T H_i v(x_{γ_i}),   H_i ⪰ 0 for i = 1, ..., l

where v(x_{γ_i}) is the C(|γ_i| + d, d)-vector of monomials in x_{γ_i} of degree d or less

Example (n = 6, d = 2)

[figure: clique tree with cliques {1,3,4}, {3,4,5}, {2,4}, {5,6}]

  f(x) = v(x_1, x_3, x_4)^T H_1 v(x_1, x_3, x_4) + v(x_3, x_4, x_5)^T H_2 v(x_3, x_4, x_5)
       + v(x_2, x_4)^T H_3 v(x_2, x_4) + v(x_5, x_6)^T H_4 v(x_5, x_6)

(clique tree and sparse sum-of-squares decomposition of f)
PSD-completable cone with chordal sparsity

A ∈ S^n_E has a positive semidefinite completion if and only if

  A_{γ_i γ_i} ⪰ 0 for all cliques γ_i

- follows from duality and the clique decomposition of the positive semidefinite cone

Example (three cliques γ_1, γ_2, γ_3): A is PSD completable  ⟺  A_{γ_1γ_1} ⪰ 0, A_{γ_2γ_2} ⪰ 0, A_{γ_3γ_3} ⪰ 0

(Grone, Johnson, Sá, Wolkowicz 1984)
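For the simplest chordal pattern the theorem can be checked by hand. A sketch (not from the slides): a tridiagonal pattern has cliques {0,1} and {1,2}; if both 2×2 clique blocks are PSD and the overlap entry A_11 is positive, the choice X_02 = A_01·A_12 / A_11 (the maximum-determinant completion for this pattern) yields a PSD completion:

```python
import numpy as np

A = np.array([[2.0,  1.0,  0.0],
              [1.0,  2.0, -1.0],
              [0.0, -1.0,  3.0]])        # the (0, 2) entry is "unspecified"

c1 = A[np.ix_([0, 1], [0, 1])]
c2 = A[np.ix_([1, 2], [1, 2])]
completable = (np.linalg.eigvalsh(c1).min() >= 0 and
               np.linalg.eigvalsh(c2).min() >= 0)

X = A.copy()
X[0, 2] = X[2, 0] = A[0, 1] * A[1, 2] / A[1, 1]   # fill through the overlap index
print(completable, np.linalg.eigvalsh(X).min() >= -1e-12)   # True True
```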
Sparse semidefinite optimization

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

- the aggregate sparsity pattern E is the union of the patterns of C, A_1, ..., A_m
- without loss of generality, we can assume E is chordal
- the constraint X ⪰ 0 can be replaced by X_{γ_iγ_i} ⪰ 0 for all cliques γ_i

Decomposition algorithms
- replace X with smaller matrix variables X_i = X_{γ_iγ_i}
- add equality constraints to enforce consistency of the variables X_i
- first proposed for interior-point methods (Fukuda et al. 2000, Nakata et al. 2003)
- first-order (dual decomposition) methods (Lu et al. 2007, Lam et al. 2011, ...)
Example: sparse nearest matrix problems

find the nearest sparse PSD-completable matrix with a given sparsity pattern:

  minimize ‖X − A‖²_F subject to X ∈ Π_E(S^n_+)

find the nearest sparse PSD matrix with a given sparsity pattern:

  minimize ‖S + A‖²_F subject to S ∈ S^n_+ ∩ S^n_E

these two problems are duals, for the dual pair of cones K = Π_E(S^n_+) and K* = S^n_+ ∩ S^n_E
Decomposition methods

from the decomposition theorems, the two problems can be written

  primal:  minimize ‖X − A‖²_F subject to X_{γ_iγ_i} ⪰ 0, i = 1, ..., l
  dual:    minimize ‖A + Σ_{i=1}^l P^T_{γ_i} H_i P_{γ_i}‖²_F subject to H_i ⪰ 0, i = 1, ..., l

Algorithms
- Dykstra's algorithm (dual block coordinate ascent)
- (fast) dual projected gradient algorithm (FISTA)
- Douglas-Rachford splitting, ADMM
- all use a sequence of projections on PSD cones of order |γ_i| (eigenvalue decomposition)
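The workhorse shared by all of these methods is the Euclidean projection of a small |γ_i| × |γ_i| symmetric block onto the PSD cone, computed by clipping negative eigenvalues. A minimal sketch (not from the slides):

```python
import numpy as np

def proj_psd(M):
    """Euclidean projection of a symmetric matrix onto the PSD cone."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

M = np.array([[1.0, 2.0],
              [2.0, 1.0]])           # eigenvalues 3 and -1
P = proj_psd(M)                      # keeps only the eigenvalue-3 component
print(np.round(P, 3))                # [[1.5 1.5], [1.5 1.5]]
```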
Results

sparsity patterns from the University of Florida Sparse Matrix Collection

  n        density   #cliques   avg. clique size   max. clique
  20141    2.80e-3   1098       35.7               168
  38434    1.25e-3   2365       28.1               188
  57975    9.04e-4   8875       14.9               132
  79841    9.71e-4   4247       44.4               337
  114599   2.02e-4   7035       18.9               58

           total runtime (sec)          time/iteration (sec)
  n        FISTA    Dykstra   DR        FISTA   Dykstra   DR
  20141    2.5e2    3.9e1     3.8e1     1.0     1.6       1.5
  38434    4.7e2    4.7e1     6.2e1     2.1     1.9       2.5
  57975    > 4hr    1.4e2     1.1e3     3.5     5.7       6.4
  79841    2.4e3    3.0e2     2.4e2     6.3     7.6       9.7
  114599   5.3e2    5.5e1     1.0e2     2.6     2.2       4.0

(Sun and Vandenberghe 2015)
Outline

1. Chordal graphs
2. Decomposition of sparse matrix cones
3. Multifrontal algorithms for barrier functions
4. Minimum rank positive semidefinite completion
Sparse SDP as nonsymmetric conic linear program

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

the aggregate sparsity pattern E is the union of the patterns of C, A_1, ..., A_m

Equivalent conic LP

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ∈ K

- K = Π_E(S^n_+) is the cone of PSD-completable matrices with pattern E
- an optimization problem in a lower-dimensional space
- K is not self-dual: no symmetric interior-point methods
Barrier function for positive semidefinite cone

  φ(S) = −log det S,   dom φ = {S ∈ S^n_E : S ≻ 0}

Gradient (negative projected inverse)

  ∇φ(S) = −Π_E(S^{-1})

requires the entries of the dense inverse S^{-1} on the diagonal and for {i, j} ∈ E

Hessian applied to a sparse Y ∈ S^n_E:

  ∇²φ(S)[Y] = (d/dt) ∇φ(S + tY) |_{t=0} = Π_E(S^{-1} Y S^{-1})

requires the projection of the dense product S^{-1} Y S^{-1}
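These formulas can be checked numerically with dense linear algebra (the multifrontal algorithms on the next slide avoid the dense inverse). A sketch (not from the slides) on a tridiagonal pattern, comparing the projected-inverse gradient against a finite-difference directional derivative:

```python
import numpy as np

n, rng = 5, np.random.default_rng(1)
mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1   # pattern E

S = 4.0 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
phi = lambda S: -np.log(np.linalg.det(S))         # phi(S) = -log det S
grad = -np.where(mask, np.linalg.inv(S), 0.0)     # -Pi_E(S^{-1})

Y = np.where(mask, rng.standard_normal((n, n)), 0.0)
Y = (Y + Y.T) / 2                                 # sparse symmetric direction
t = 1e-6
fd = (phi(S + t * Y) - phi(S - t * Y)) / (2 * t)  # central finite difference
print(abs(fd - np.sum(grad * Y)))                 # ~0: matches <grad, Y>
```

Note that ⟨−Π_E(S^{-1}), Y⟩ = ⟨−S^{-1}, Y⟩ because Y is supported on E, which is why the projection loses nothing for directions in S^n_E.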
Multifrontal algorithms

assume E is a chordal sparsity pattern (or a chordal extension)

Cholesky factorization (Duff and Reid 1983)
- the factorization S = LDL^T gives the value of the barrier: φ(S) = −Σ_i log D_ii
- recursion on the elimination tree in topological order

Gradient (Campbell and Davis 1995, Andersen et al. 2013)
- compute ∇φ(S) = −Π_E(S^{-1}) from the equation S^{-1}L = L^{-T}D^{-1}
- recursion on the elimination tree in inverse topological order

Hessian
- compute ∇²φ(S)[Y] = Π_E(S^{-1}YS^{-1}) by linearizing the recursion for the gradient
- two recursions on the elimination tree (topological and inverse topological order)
Projected inverse versus Cholesky factorization

[figure: problem statistics (number of nonzeros in A versus order n) and a time comparison of projected inverse versus Cholesky factorization]

- 667 patterns from the University of Florida Sparse Matrix Collection
- time in seconds for projected inverse and for Cholesky factorization
- code at github.com/cvxopt/chompack
Barrier for positive semidefinite completable cone

  φ_*(X) = sup_S (−tr(XS) − φ(S)),   dom φ_* = {X = Π_E(Y) : Y ≻ 0}

- this is the conjugate of the barrier φ(S) = −log det S for the sparse PSD cone
- the inverse Z = Ŝ^{-1} of the optimal Ŝ is the maximum determinant PD completion of X:

    maximize    log det Z
    subject to  Π_E(Z) = X

- the gradient and Hessian of φ_* at X are ∇φ_*(X) = −Ŝ,  ∇²φ_*(X) = ∇²φ(Ŝ)^{-1}
- for chordal E, efficient multifrontal algorithms compute the Cholesky factors of Ŝ, given X
Inverse completion versus Cholesky factorization

[figure: time for the Cholesky factorization of the inverse of the maximum determinant PD completion versus time for the Cholesky factorization, in seconds]
Nonsymmetric interior-point methods

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ∈ Π_E(S^n_+)

- can be solved by nonsymmetric primal or dual barrier methods
- logarithmic barriers for the cone Π_E(S^n_+) and its dual cone S^n_+ ∩ S^n_E:

    φ_*(X) = sup_S (−tr(XS) + log det S),   φ(S) = −log det S

- fast evaluation of barrier values and derivatives if the pattern is chordal
- examples: linear complexity per iteration for band or arrow patterns
- code and numerical results at github.com/cvxopt/smcp

(Fukuda et al. 2000, Burer 2003, Srijungtongsiri and Vavasis 2004, Andersen et al. 2010)
Sparsity patterns

- sparsity patterns from the University of Florida Sparse Matrix Collection
- m = 200 constraints
- random data with 0.05% nonzeros in the A_i relative to E

[figure: eight test sparsity patterns]
  rs228 (n = 1,919), rs35 (n = 2,003), rs200 (n = 3,025), rs365 (n = 4,704),
  rs1555 (n = 7,479), rs828 (n = 10,800), rs1184 (n = 14,822), rs1288 (n = 30,401)
Results

  n      DSDP   SDPA    SDPA-C   SDPT3   SeDuMi   SMCP
  1919   1.4    30.7    5.7      10.7    511.2    2.3
  2003   4.0    34.4    41.5     13.0    521.1    15.3
  3025   2.9    128.3   6.0      33.0    1856.9   2.2
  4704   15.2   407.0   58.8     99.6    4347.0   18.6

  n      DSDP    SDPA-C   SMCP
  7479   22.1    23.1     9.5
  10800  482.1   1812.8   311.2
  14822  791.0   2925.4   463.8
  30401  mem     2070.2   320.4

- average time per iteration (seconds) for different solvers
- SMCP uses the nonsymmetric matrix cone approach (Andersen et al. 2010)
Outline

1. Chordal graphs
2. Decomposition of sparse matrix cones
3. Multifrontal algorithms for barrier functions
4. Minimum rank positive semidefinite completion
Minimum rank PSD completion with chordal sparsity

recall that A ∈ S^n_E has a positive semidefinite completion if and only if A_{γ_iγ_i} ⪰ 0 for all cliques γ_i

the minimum rank PSD completion has rank equal to

  max_{cliques γ_i} rank(A_{γ_iγ_i})

(Dancis 1992)
Two-block completion problem

we first consider the simple two-block completion problem

  A = [ A11  A12   ?  ]
      [ A21  A22  A23 ]
      [  ?   A32  A33 ]

- a completion exists if and only if

    C1 = [ A11  A12 ] ⪰ 0,   C2 = [ A22  A23 ] ⪰ 0
         [ A21  A22 ]             [ A32  A33 ]

- we construct a PSD completion of rank r = max{rank(C1), rank(C2)}
Two-block completion algorithm

- compute matrices U, V, Ṽ, W of column dimension r such that

    [ A11  A12 ] = [ U ] [ U ]^T,   [ A22  A23 ] = [ Ṽ ] [ Ṽ ]^T
    [ A21  A22 ]   [ V ] [ V ]      [ A32  A33 ]   [ W ] [ W ]

- since VV^T = ṼṼ^T, the matrices V and Ṽ have SVDs V = PΣQ_1^T, Ṽ = PΣQ_2^T; hence V = ṼQ with Q = Q_2 Q_1^T an orthogonal r × r matrix
- a completion of rank r is given by

    [ UQ^T ] [ UQ^T ]^T   [ A11     A12  UQ^T W^T ]
    [  Ṽ   ] [  Ṽ   ]   = [ A21     A22  A23      ]
    [  W   ] [  W   ]     [ WQU^T   A32  A33      ]
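A numerical sketch of this algorithm (my own implementation, not code from the slides): the rank-r factors come from an eigenvalue decomposition, and the orthogonal Q with V = ṼQ is recovered by orthogonal Procrustes (an SVD of Ṽ^T V), which is valid here because VV^T = ṼṼ^T:

```python
import numpy as np

def factor(C, r):
    """Return Y with C = Y Y^T and exactly r columns (eigenvalue-based)."""
    w, E = np.linalg.eigh(C)
    w = np.clip(w[-r:], 0.0, None)
    return E[:, -r:] * np.sqrt(w)

def two_block_completion(A11, A12, A22, A23, A33):
    C1 = np.block([[A11, A12], [A12.T, A22]])
    C2 = np.block([[A22, A23], [A23.T, A33]])
    r = max(np.linalg.matrix_rank(C1), np.linalg.matrix_rank(C2))
    UV, VW = factor(C1, r), factor(C2, r)
    n1, k = A11.shape[0], A22.shape[0]
    U, V = UV[:n1], UV[n1:]
    Vt, W = VW[:k], VW[k:]
    P1, _, P2 = np.linalg.svd(Vt.T @ V)       # orthogonal Q with V = Vt @ Q
    Q = P1 @ P2
    Y = np.vstack([U @ Q.T, Vt, W])           # rank-r completion factor
    return Y @ Y.T, r

# random rank-2 test instance with blocks of sizes 2, 2, 2
rng = np.random.default_rng(3)
G = (Yt := rng.standard_normal((6, 2))) @ Yt.T
X, r = two_block_completion(G[:2, :2], G[:2, 2:4], G[2:4, 2:4],
                            G[2:4, 4:], G[4:, 4:])
print(r, np.allclose(X[:4, :4], G[:4, :4]), np.allclose(X[2:, 2:], G[2:, 2:]))
```

The returned X agrees with the data on both overlapping principal blocks and has rank r = max{rank(C1), rank(C2)}, matching the Dancis bound.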
Minimum rank completion for general chordal pattern

recursion over the supernodal elimination tree in inverse topological order

[figure: a chordal graph on vertices 1, ..., 17 and its supernodal elimination tree]

- the vertices of the tree are the cliques γ_i
- the top row of vertex i is the intersection α_i of clique γ_i with its parent clique
- the bottom rows are the supernodes ν_i = γ_i \ α_i; these form a partition of {1, 2, ..., n}
Minimum rank completion for general chordal pattern

we compute an n × r matrix Y with Π_E(YY^T) = A and

  r = max_{cliques γ_i} rank(A_{γ_iγ_i})

- the block rows Y_{ν_j} are computed in inverse topological order
- hence Y_{α_j} is known when we compute Y_{ν_j}

Algorithm: enumerate the cliques in inverse topological order
- at vertex γ_j, compute matrices U_j, V_j with column dimension r such that

    [ A_{ν_jν_j}  A_{ν_jα_j} ] = [ U_j ] [ U_j ]^T
    [ A_{α_jν_j}  A_{α_jα_j} ]   [ V_j ] [ V_j ]

- if j is the root of the supernodal elimination tree, set Y_{ν_j} = U_j
- otherwise, compute an orthogonal Q such that V_j = Y_{α_j}Q and set Y_{ν_j} = U_jQ^T
Sparse semidefinite optimization

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

- the aggregate sparsity pattern E is the union of the patterns of C, A_1, ..., A_m
- any feasible X can be replaced by a PSD completion of Π_E(X)
- if E is chordal, there is a completion with rank bounded by the size of the largest clique
- this proves exactness of some simple SDP relaxations
- useful for rounding the solution of an SDP relaxation to a minimum rank solution
Summary

Sparse matrix theory: PSD and PSD-completable matrices with chordal pattern
- decomposition of the matrix cones as a sum or intersection of simple cones
- fast algorithms for evaluating barrier functions and derivatives
- simple algorithms for maximum determinant and minimum rank completion
- simple algorithms for Euclidean distance matrix completion

Sparse semidefinite optimization

  minimize    tr(CX)
  subject to  tr(A_i X) = b_i,  i = 1, ..., m
              X ⪰ 0

- decomposition and splitting methods
- nonsymmetric interior-point methods