Multifrontal Method

Kailai Xu

September 16, 2017

Main observation. Consider the block $LDL^T$ decomposition of an SPD matrix
\[
A = \begin{bmatrix} B & V^T \\ V & C \end{bmatrix}
  = \begin{bmatrix} L & 0 \\ V L^{-T} & I \end{bmatrix}
    \begin{bmatrix} I & 0 \\ 0 & C - V B^{-1} V^T \end{bmatrix}
    \begin{bmatrix} L^T & L^{-1} V^T \\ 0 & I \end{bmatrix},
\]
where $B = L L^T$ is the Cholesky factorization of $B$. The Schur complement update $V B^{-1} V^T$ can be written as
\[
V B^{-1} V^T = (V L^{-T})(L^{-1} V^T)
 = \sum_{k=1}^{j-1}
   \begin{bmatrix} l_{j,k} \\ l_{j+1,k} \\ \vdots \\ l_{n,k} \end{bmatrix}
   \begin{bmatrix} l_{j,k} & l_{j+1,k} & \cdots & l_{n,k} \end{bmatrix}.
\]
Here $j : n$ are the row indices of $V L^{-T}$ in the global matrix, i.e. $B$ is the leading $(j-1) \times (j-1)$ block. Every term in the summation can be seen as the contribution from the $k$-th (already eliminated) column.

Frontals

Let $T[j]$ denote node $j$ together with its descendants in the elimination tree, which is read off from the nonzero pattern of the lower-triangular symbolic factor $L$ in $A = L L^T$. Define the accumulated update (the minus sign records that Schur complement contributions are subtracted)
\[
\bar{U}_j = -\sum_{k \in T[j] \setminus \{j\}}
   \begin{bmatrix} l_{j,k} \\ l_{i_1,k} \\ \vdots \\ l_{i_r,k} \end{bmatrix}
   \begin{bmatrix} l_{j,k} & l_{i_1,k} & \cdots & l_{i_r,k} \end{bmatrix},
\]
where $j, i_1, i_2, \ldots, i_r$ are the row indices of the nonzeros in column $j$ of $L$.
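The block identity above is easy to check numerically. The following sketch (the block sizes and the random SPD matrix are illustrative choices of mine) verifies both the three-factor decomposition and the rank-one expansion of $V B^{-1} V^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                                  # B is n x n, C is m x m
M = rng.standard_normal((n + m, n + m))
A = M @ M.T + (n + m) * np.eye(n + m)        # SPD by construction

B, VT = A[:n, :n], A[:n, n:]
V, C = A[n:, :n], A[n:, n:]

L = np.linalg.cholesky(B)                    # B = L L^T
W = V @ np.linalg.inv(L).T                   # W = V L^{-T}

# Rank-one expansion: V B^{-1} V^T = sum over k of (column k of W)(column k of W)^T
schur_update = sum(np.outer(W[:, k], W[:, k]) for k in range(n))
assert np.allclose(schur_update, V @ np.linalg.inv(B) @ VT)

# Reassemble the three block factors and recover A
S = C - schur_update                         # Schur complement
lower = np.block([[L, np.zeros((n, m))], [W, np.eye(m)]])
middle = np.block([[np.eye(n), np.zeros((n, m))], [np.zeros((m, n)), S]])
assert np.allclose(lower @ middle @ lower.T, A)
```

Note that `lower.T` is exactly the third factor in the decomposition, since $(V L^{-T})^T = L^{-1} V^T$.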
Figure 1: Elimination tree.

For example, in Figure 1, if $j = 6$ we have $T[j] = \{2, 3, 4, 5, 6\}$, $i_1 = 8$, $i_2 = 9$.

The Schur complement data for node $j$, which we call the frontal matrix, is
\[
F_j = \begin{bmatrix}
 a_{j,j} & a_{j,i_1} & \cdots & a_{j,i_r} \\
 a_{i_1,j} & & & \\
 \vdots & & 0 & \\
 a_{i_r,j} & & &
\end{bmatrix} + \bar{U}_j.
\]
Splitting $\bar{U}_j$ according to whether $l_{j,k}$ vanishes,
\[
\bar{U}_j = -\sum_{k < j,\ l_{j,k} \neq 0}
   \begin{bmatrix} l_{j,k} \\ l_{i_1,k} \\ \vdots \\ l_{i_r,k} \end{bmatrix}
   \begin{bmatrix} l_{j,k} & l_{i_1,k} & \cdots & l_{i_r,k} \end{bmatrix}
 \;-\; \sum_{k \in T[j] \setminus \{j\},\ l_{j,k} = 0}
   \begin{bmatrix} 0 \\ l_{i_1,k} \\ \vdots \\ l_{i_r,k} \end{bmatrix}
   \begin{bmatrix} 0 & l_{i_1,k} & \cdots & l_{i_r,k} \end{bmatrix},
\]
it is easy to see that only the first sum contributes to the first row/column. Therefore
\[
F_j = \begin{bmatrix}
 l_{j,j} & 0 \\
 \begin{matrix} l_{i_1,j} \\ \vdots \\ l_{i_r,j} \end{matrix} & I
\end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & U_j \end{bmatrix}
\begin{bmatrix}
 l_{j,j} & l_{i_1,j} \; \cdots \; l_{i_r,j} \\
 0 & I
\end{bmatrix},
\]
where
\[
U_j = -\sum_{k \in T[j]}
   \begin{bmatrix} l_{i_1,k} \\ \vdots \\ l_{i_r,k} \end{bmatrix}
   \begin{bmatrix} l_{i_1,k} & \cdots & l_{i_r,k} \end{bmatrix}.
\]
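To make the factorization of $F_j$ concrete, here is a small numerical check on a dense $4 \times 4$ SPD matrix (the matrix is an arbitrary illustrative choice of mine): assembling the first row and column of $F_j$ from $A$ and subtracting the rank-one contributions of the descendants reproduces column $j$ of the global Cholesky factor.

```python
import numpy as np

A = np.array([[4., 1., 0., 1.],
              [1., 4., 1., 0.],
              [0., 1., 4., 0.],
              [1., 0., 0., 4.]])
Lfull = np.linalg.cholesky(A)        # reference factor to compare against
n, j = A.shape[0], 1

# rows of the front: j plus the below-diagonal nonzero rows of column j of L
rows = [j] + [i for i in range(j + 1, n) if not np.isclose(Lfull[i, j], 0.0)]
# descendants k of j with l_{j,k} != 0 (here T[1] \ {1} = {0})
desc = [k for k in range(j) if not np.isclose(Lfull[j, k], 0.0)]

F = np.zeros((len(rows), len(rows)))
F[0, :] = A[j, rows]                 # first row of F_j comes from A
F[:, 0] = A[rows, j]                 # first column of F_j comes from A
for k in desc:                       # subtract the descendants' rank-one updates
    v = Lfull[rows, k]
    F -= np.outer(v, v)

# one elimination step on F_j reproduces column j of the global factor
assert np.isclose(np.sqrt(F[0, 0]), Lfull[j, j])
assert np.allclose(F[1:, 0] / np.sqrt(F[0, 0]), Lfull[rows[1:], j])
```

Only the first row and column of $F_j$ are checked here; the remaining block carries the update information that is passed to the parent.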
Frontal Method

We use $\oplus$ to denote index-aligned addition (the extend-add operation). If
\[
R = \begin{bmatrix} p & q \\ u & v \end{bmatrix}, \qquad
S = \begin{bmatrix} w & x \\ y & z \end{bmatrix},
\]
where $R$ corresponds to the index set $\{5, 8\}$ and $S$ to $\{5, 9\}$, then
\[
R \oplus S = \begin{bmatrix} p + w & q & x \\ u & v & 0 \\ y & 0 & z \end{bmatrix}.
\]
We can see that if $c_1, c_2, \ldots, c_s$ are the children of node $j$ in the elimination tree, then
\[
F_j = \begin{bmatrix}
 a_{j,j} & a_{j,i_1} & \cdots & a_{j,i_r} \\
 a_{i_1,j} & & & \\
 \vdots & & 0 & \\
 a_{i_r,j} & & &
\end{bmatrix} \oplus U_{c_1} \oplus U_{c_2} \oplus \cdots \oplus U_{c_s}.
\]
This gives the following algorithm for sparse Cholesky factorization.
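Before turning to the algorithm, a direct implementation of $\oplus$ on small dense blocks with explicit global index lists might look as follows (function and variable names are mine):

```python
import numpy as np

def extend_add(R, r_idx, S, s_idx):
    """Extend-add: sum R and S after mapping them onto the union of their index sets."""
    t_idx = sorted(set(r_idx) | set(s_idx))
    pos = {g: a for a, g in enumerate(t_idx)}        # global index -> local position
    T = np.zeros((len(t_idx), len(t_idx)))
    for M, idx in ((R, r_idx), (S, s_idx)):
        for a, ga in enumerate(idx):
            for b, gb in enumerate(idx):
                T[pos[ga], pos[gb]] += M[a, b]
    return T, t_idx

# the example from the text: R indexed by {5, 8}, S indexed by {5, 9}
R = np.array([[1., 2.], [3., 4.]])        # p q / u v
S = np.array([[10., 20.], [30., 40.]])    # w x / y z
T, idx = extend_add(R, [5, 8], S, [5, 9])
assert idx == [5, 8, 9]
assert np.allclose(T, [[11., 2., 20.],
                       [3., 4., 0.],
                       [30., 0., 40.]])   # p+w q x / u v 0 / y 0 z
```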
Algorithm 1 Sparse Cholesky Factorization
 1: procedure SparseChol
 2:   for $j = 1, 2, \ldots, n$ do
 3:     Let $j, i_1, \ldots, i_r$ be the row indices of the nonzeros in column $j$ of $L$.
 4:     Let $c_1, \ldots, c_s$ be the children of $j$ in the elimination tree.
 5:     Form the frontal matrix
\[
F_j = \begin{bmatrix}
 a_{j,j} & a_{j,i_1} & \cdots & a_{j,i_r} \\
 a_{i_1,j} & & & \\
 \vdots & & 0 & \\
 a_{i_r,j} & & &
\end{bmatrix} \oplus U_{c_1} \oplus U_{c_2} \oplus \cdots \oplus U_{c_s}.
\]
 6:     Factorize $F_j$ into
\[
\begin{bmatrix}
 l_{j,j} & 0 \\
 \begin{matrix} l_{i_1,j} \\ \vdots \\ l_{i_r,j} \end{matrix} & I
\end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & U_j \end{bmatrix}
\begin{bmatrix}
 l_{j,j} & l_{i_1,j} \; \cdots \; l_{i_r,j} \\
 0 & I
\end{bmatrix}.
\]

The symbolic pattern of $L$ can be computed as follows.

 7: function FindL($A$)
 8:   $L \leftarrow (\mathrm{tril}(A) \neq 0)$
 9:   for $i = 1, 2, \ldots, n$ do
10:     for $j \in \mathrm{nonzero}(L_{:,i}) \setminus \{i\}$ do
11:       for $k \in \mathrm{nonzero}(L_{:,i}) \setminus \{i\}$, $k > j$ do
12:         $L_{kj} \leftarrow \mathrm{true}$
13:   return $L$

The method generalizes without difficulty to general (non-symmetric) matrices.
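The symbolic step can be sketched in a few lines. This mirrors FindL above with dense boolean storage (purely illustrative, not efficient), and also extracts the elimination tree via the standard parent rule $\mathrm{parent}(j) = \min\{i > j : l_{i,j} \neq 0\}$; the test matrix is an arbitrary choice of mine.

```python
import numpy as np

def find_L(A):
    """Symbolic fill: eliminating column i connects all pairs of its below-diagonal nonzeros."""
    n = A.shape[0]
    L = np.tril(A != 0)                  # initial pattern of tril(A)
    for i in range(n):
        rows = [j for j in range(i + 1, n) if L[j, i]]
        for a, j in enumerate(rows):
            for k in rows[a + 1:]:
                L[k, j] = True           # fill entry (k, j), k > j
    return L

def etree_parents(L):
    """Parent of j = first below-diagonal nonzero row in column j of L (None for a root)."""
    n = L.shape[0]
    return [next((i for i in range(j + 1, n) if L[i, j]), None) for j in range(n)]

A = np.array([[4., 1., 0., 1.],
              [1., 4., 1., 0.],
              [0., 1., 4., 0.],
              [1., 0., 0., 4.]])
L = find_L(A)
assert L[3, 1] and L[3, 2]                    # fill entries created by elimination
assert etree_parents(L) == [1, 2, 3, None]    # a single chain 0 -> 1 -> 2 -> 3
```

Eliminating column 0 (nonzero rows 1 and 3) creates fill at $(3,1)$, which in turn creates fill at $(3,2)$ when column 1 is eliminated.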
Algorithm 2 Sparse LU Factorization
 1: procedure SparseLU
 2:   for $j = 1, 2, \ldots, n$ do
 3:     Let $j, i_1, \ldots, i_r$ be the row indices of the nonzeros in column $j$ of $L$.
 4:     Let $c_1, \ldots, c_s$ be the children of $j$ in the elimination tree.
 5:     Form the frontal matrix
\[
F_j = \begin{bmatrix}
 a_{j,j} & a_{j,i_1} & \cdots & a_{j,i_r} \\
 a_{i_1,j} & & & \\
 \vdots & & 0 & \\
 a_{i_r,j} & & &
\end{bmatrix} \oplus U_{c_1} \oplus U_{c_2} \oplus \cdots \oplus U_{c_s}.
\]
 6:     Factorize $F_j$ into
\[
\begin{bmatrix}
 1 & 0 \\
 \begin{matrix} l_{i_1,j} \\ \vdots \\ l_{i_r,j} \end{matrix} & I
\end{bmatrix}
\begin{bmatrix}
 u_{j,j} & u_{j,i_1} \; \cdots \; u_{j,i_r} \\
 0 & U_j
\end{bmatrix}.
\]

Left-looking, right-looking and multifrontal

Recall that in LU factorization, after scaling one column, we may either apply the resulting update to the other columns immediately, or delay it until a column is actually needed. These strategies are called right-looking and left-looking, respectively. The multifrontal method is a third style of "looking", illustrated below.

Figure 2: Left-looking, right-looking and multifrontal.

General Assembly Tree. We do not have to use the elimination tree for the factorization process. Actually, we only need an assembly tree, i.e. for any node $j$ with parent node $p$,
the off-diagonal structure of $L_j$ must be a subset of the structure of $L_p$, with $p > j$. For example, by defining the parent $p$ of $j$ as
\[
p = \min\{\, i > j : \text{off-diagonal structure of } L_j \subseteq \text{structure of } L_i \,\},
\]
we obtain a valid assembly tree (Figure 3).

Figure 3: Assembly tree.

Pivoting and scaling

Pivoting and scaling are important for numerical accuracy. As indicated in [1], the relative forward error is bounded by the condition number of the linear system multiplied by the backward error. Two situations should be distinguished:

- an ill-posed problem: the backward error can be small even if the computed solution is far from the exact solution; in that case, the condition number of the system is large;
- an unstable algorithm, leading to a large backward error compared to the machine precision, even when the condition number of the linear system is small.

These quantities can be measured by the backward error
\[
\mathrm{err}
 = \min\{\varepsilon > 0 : \|\delta A\| \le \varepsilon \|A\|,\ \|\delta b\| \le \varepsilon \|b\|,\ (A + \delta A)\tilde{x} = b + \delta b\}
 = \frac{\|A \tilde{x} - b\|}{\|A\| \, \|\tilde{x}\| + \|b\|}
\]
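The backward-error expression above can be evaluated directly for a computed solution; a minimal sketch, where the infinity norm and the test problem are choices of mine:

```python
import numpy as np

def backward_error(A, x, b):
    """Normwise backward error ||A x - b|| / (||A|| ||x|| + ||b||), infinity norm."""
    r = A @ x - b
    return np.linalg.norm(r, np.inf) / (
        np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf)
        + np.linalg.norm(b, np.inf))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # well-conditioned test matrix
b = rng.standard_normal(5)
x = np.linalg.solve(A, b)                            # the computed solution x~
assert backward_error(A, x, b) < 1e-12               # near machine precision
```

A stable solver yields a backward error near machine precision regardless of conditioning; the forward error additionally depends on the condition number.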
and the condition number
\[
\mathrm{cond} = \frac{\bigl\|\, |A^{-1}|\,|A|\,|\tilde{x}| + |A^{-1}|\,|b| \,\bigr\|}{\|\tilde{x}\|}.
\]
The mathematical formulation of pivoting and scaling is (note that scaling matrices and permutation matrices commute)
\[
(P D_r A D_c Q)(Q^T D_c^{-1} x) = P D_r b,
\]
where $D_r, D_c$ are row and column scalings and $P, Q$ are permutations. Scaling row $i$ is equivalent to applying the corresponding scaling to all the components $l_{i,j}$ for $j \in T[i]$, and to $a_{i,j}$ for all $j \in \mathrm{Adj}_G(i)$. Similarly, scaling column $i$ is equivalent to applying the corresponding scaling to all the components $u_{j,i}$ for $j \in T[i]$, and to $a_{j,i}$ for all $j \in \mathrm{Adj}_G(i)$.

As for pivoting, the choice is limited, since pivoting may destroy the elimination tree. The rule is to pivot only inside blocks that are already fully summed. This requires adding the contributions of child blocks to their parents immediately after factorization (right-looking) instead of in the left-looking way.

Figure 4: Pivoting on a tree. Nodes 7 and 8 have already been fully summed, since the contributions from 1 and 6 have been added. Now if we want to factorize 7, we can perform the pivot $8 \to 7$, which is the same as interchanging nodes 7 and 8.

Supernodal

To take advantage of higher-level BLAS operations, the use of supernodes is preferred. A supernode is defined as follows [2]:
A supernode is a maximal set of contiguous nodes $\{j, j+1, \ldots, j+t\}$ such that
\[
\mathrm{Adj}_G(T[j]) = \{j+1, \ldots, j+t\} \cup \mathrm{Adj}_G(T[j+t]).
\]
It follows from this definition that in a supernode $\{j, j+1, \ldots, j+t\}$, for $1 \le k \le t$ the node $j+k$ is the parent of $j+k-1$ in the elimination tree. Here $\mathrm{Adj}_G(T[j])$ gives the row indices of the below-diagonal nonzeros in column $L_j$. A supernode corresponds to a maximal block of contiguous columns in the Cholesky factor, where the corresponding diagonal block is full triangular, and these columns all have identical off-block-diagonal column structure.

Solve Procedure

Given a right-hand side, we have the following solve algorithms.

Algorithm 3 Forward Elimination
 1: function Forward_Elimination($b$)
 2:   for $i = 1, 2, \ldots, n$ do   (working on front $i$)
 3:     Solve $l_{ii} y_i = b_i$
 4:     for $k > i$ with $l_{ki} \neq 0$ do
 5:       $b_k \leftarrow b_k - l_{ki} y_i$

Algorithm 4 Backward Substitution. It is important to use the right-looking method for backward substitution: in every step, we first add the contributions from the successors, i.e. $\mathrm{Adj}_G(\cdot)$.
 1: function Backward_Substitution($y$)
 2:   for $i = n, n-1, \ldots, 1$ do
 3:     for $k > i$ with $u_{ik} \neq 0$ do   (for SPD, $u_{ik} = l_{ki}$)
 4:       $y_i \leftarrow y_i - u_{ik} x_k$
 5:     Solve $u_{ii} x_i = y_i$

Programming considerations

On the programming side, the data should not live in the graph itself; instead, it should be stored in a separate data structure. For example, in the example above, the data can be stored in a map that takes (self-)edges as keys. The following result illustrates the application of the multifrontal method to random matrices, without pivoting.
Figure 5: Factorization and solve time (seconds) versus matrix size $N$, on a log-log scale.

As the matrix is random, the fill-in is severe, and the complexity of the solve step therefore behaves like $O(N^2)$, as seen in the figure above.

References

[1] Jean-Yves L'Excellent. Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects. PhD thesis, École normale supérieure de Lyon (ENS Lyon), 2012.

[2] Joseph W. H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. SIAM Review, 34(1):82–109, 1992.