Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Size: px

Start display at page:

Download "Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE"

Bertram Daniel
5 years ago
Views:

1 Sparse factorization using low rank submatrices Cleve Ashcraft LSTC 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1

2 LSTC Livermore Software Technology Corporation Livermore, California 9455 Founded by John Hallquist of LLNL, 198 s Public domain versions of DYNA and NIKE codes LS-DYNA : implicit/explicit, nonlinear finite element analysis code Multiphysics capabilities Fluid/structure interaction Thermal analysis Acoustics Electromagnetics 2

3 Multifrontal Algorithm very large, sparse LDL T and LDU factorizations tree structure organizes factor storage, solve and factor operations medium-to-large sparse linear systems located at each leaf node of the tree medium-sized dense linear systems located at each interior node of the tree dense matrix-matrix operations at each interior node sparse matrix-matrix adds between nodes 3

4 Multifrontal tree

5 Multifrontal tree polar representation

6 Multifrontal Algorithm 4M equations, present frontier at LSTC serial, SMP, MPP, hybrid, GPU large problems, > 1M - 1M dof, require out-of-core storage of factor entries, even on distributed memory systems IO cost largely hidden during the factorization IO cost dominant during the solves eigensolver = several right hand sides many applications, e.g., Newton s method, have a single right hand side 6

7 Low Rank Approximations hierarchical matrices, Hackbusch, Bebendorf, Leborne, others semi-separable matrices, Gu, others submatrices are numerically rank deficient method of choice for Boundary Elements (BEM) now applied to Finite Elements (FEM) n r A XY T Y Tn r m A = m X storage = r(m + n) vs mn r reduction in ops = min(m, n) 7

8 Multifrontal Algorithm + Low Rank Approximations At each leaf node in the multifrontal tree use standard multifrontal At each interior node in the multifrontal tree low rank matrix-matrix multiplies and sums Between nodes low rank matrix sums Dramatic reduction in storage Dramatic reduction in operations Excellent approximation properties for finite element operators Our experience is with potential equations, elasticity with solids and shells 8

9 Outline graph, tree, matrix perspectives experiments 2-D potential equation low rank computations blocking strategies summary 9

10 One subgraph One subtree S Ω I S Ω J One submatrix A ΩI,Ω I A ΩI,S A ΩI, S A ΩJ,Ω J A ΩJ,S A ΩJ, S A S,ΩI A S,ΩJ A S,S A S, S A S,ΩI A S,ΩJ A S,S A S, S 1

11 Portion of original matrix A ordered by domains, separator and external boundary nz =

12 Portion of factor matrix L ordered by domains, separator and external boundary nz =

13 Schur complement matrix Schur complement, separator and external boundary 2 4 [ ÂS,S Â S, S Â S,S Â S, S ] = nz =

14 Outline graph, tree, matrix perspectives experiments 2-D potential equation low rank computations blocking strategies summary 14

15 Computational experiments Compute L S,S, L S,S and Â S, S Find domain decompositions of S and S Form block matrices, e.g., L S,S = L K,J K J Find singular value decompositions of each L K,J Collect all singular values {σ} = K J min( K, J ) i=1 σ (K,J) i Split matrix L S,S = M S,S + N S,S using singular values {σ}. We want N S,S F 1 14 L S,S F 15

16 diagonal block factor L S,S 43% dense, relative accuracy L S,S =

17 lower block factor L S,S 16% dense, relative accuracy L S,S =

18 update matrix Â S, S 21% dense, relative accuracy Ã S, S =

19 factor matrix L S,S storage vs accuracy approximating 255 x 255 separator factor matrix L 3,3 2 4 log 1 (1 M F / A F ) x 2 blocks 3 x 3 blocks 4 x 4 blocks 14 5 x 5 blocks 6 x 6 blocks 7 x 7 blocks 8 x 8 blocks fraction of dense storage 19

20 factor matrix L S,S storage vs accuracy approximating 754 x 255 interface separator factor matrix L 4,3 2 4 log 1 (1 M F / A F ) x 1 blocks 6 x 2 blocks 9 x 3 blocks 12 x 4 blocks fraction of dense storage 2

21 update matrix Â S, S storage vs accuracy approximating 754 x 754 schur complement matrix Ahat log 1 (1 M F / A F ) x 2 blocks 3 x 3 blocks 4 x 4 blocks 14 5 x 5 blocks 6 x 6 blocks 7 x 7 blocks 8 x 8 blocks fraction of dense storage 21

22 Outline graph, tree, matrix perspectives experiments 2-D potential equation low rank computations blocking strategies summary 22

23 How to compute low rank submatrices? SVD singular value decomposition A = UΣU T where U and V are orthogonal and Σ is diagonal Gold standard, expensive, O(n 3 ) ops QR factorization AP = QR where Q is orthogonal and R is upper triangular, and P is a permutation matrix Silver standard, moderate cost, O(rn 2 ) ops 23

24 row norms of R vs singular values of A update matrix Â S, S submatrix size near, mid and far interactions row norms of R versus singular values of 754 x 754 A using 8 segments near R i,: 2 near σ i mid R i,: 2 mid σ i far R i,: 2 far σ i

25 SVD vs QR with column pivoting Conclusions : Column pivoting QR factorization does well. Row norms of R track singular values σ The numerical rank of R is greater than needed, but not that much greater For more accuracy/less storage two sided orthogonal factorizations AP = ULV, U and V orthogonal, L triangular PAQ = UBV, U and V orthogonal, B bidiagonal track the singular values very closely. 25

26 Type of approximation of submatrices submatrix L I,J of L S,S, L I,J F = factorization numerical rank total entries none 49 QR ULV SVD submatrix L I,J of L S,S, L I,J F = factorization numerical rank total entries none 49 QR ULV SVD

27 Operations with low rank submatrices A = U A V T A, B = U BV T B, C = U CV T C See Bebendorf, Hierarchical Matrices, Chapter 1 Multiplication A = B C, rank(a) min(rank(b), rank(c)) ) ( ) A = U A VA (U T = B VB T U C VC T = BC ) = U B (VB T U C VC T ) ) = U B ((VB T U C VC T ( )) = (U B VB T U C Addition A = B + C, rank(a) rank(b) +rank(c) ) ( ) A = U A VA (U T = B VB T + U C VC T = B + C = [ ][ ] T U B U C VB V C V T C 27

28 Multiplication A = B C A = U A VA T, B = U BVB T, C = U CVC T A = B C = = ( (m h) (h l) )( (l k) (k n) ) = = = (m min(h,k)) (min(h,k) n) 28

29 Near-near matrix product T near near interaction, L 2,1 L 2,1 σ(l21) σ(l21*l21 T )

30 mid-near matrix product T mid near interaction, L 2,1 L 1,1 2 σ(l21) σ(l11) σ(l21*l11 T )

31 far-near matrix product T far near interaction, L 5,1 L 6,1 2 σ(l51) σ(l61) σ(l51*l61 T )

32 mid-mid matrix product T mid mid interaction, L 1,1 L 1,1 σ(l11) σ(l11*l11 T )

33 far-mid matrix product T mid far interaction, L 6,1 L 1,1 σ(l61) σ(l61*l61 T )

34 far-far matrix product T far far interaction, L 6,1 L 6,1 σ(l61) σ(l61*l61 T )

35 Addition A = B + C rank(a) rank(b) + rank(c) ) ( A = U A VA (U T = B VB T + U C VC T = [ ][ ] T U B U C VB V C = (Q 1 R 1 ) (Q 2 ) R 2 ) T = Q 1 (R 1 R2 T Q T 2 ) = B + C = Q 1 (Q 3 R ( 3 ) Q T 2 ) = (Q 1 Q 3 ) R 3 Q T 2 ( = (Q 1 Q 3 ) Q 2 R3 T ) T R 1 R2 T usually has low numerical rank examples follow 35

36 update matrix, diagonal block Â 3,3 = L 3,1 L T 3,1 + L 3,2L T 3,2 + L 3,3L T 3,3 2 update matrix self interaction σ(near) σ(sum) σ(mid) σ(far) 4 6 log 1 (σ)

37 update matrix, mid-distance off-diagonal block Â 5,3 = L 5,1 L T 3,1 + L 5,2L T 3,2 + L 5,3L T 3, update matrix mid range interaction σ(sum) σ(near) σ(mid) σ(far) 8 log 1 (σ)

38 update matrix, far-distance off-diagonal block Â 7,3 = L 7,1 L T 3,1 + L 7,2L T 3,2 + L 7,3L T 3, update matrix far range interaction σ(sum) σ(near) σ(mid) σ(far) 8 log 1 (σ)

39 Outline graph, tree, matrix perspectives experiments 2-D potential equation low rank computations blocking strategies summary 39

40 Blocking Strategies Active data structures [ ] L J,J L J,J Â J, J partition J and J independently for 2-d problems, J and J are 1-d manifolds for 3-d problems, J and J are 2-d manifolds we need mesh partitioning of the separator and the boundary of a region J each index set of a partition is a segment 4

41 Segment partition 1 segment wirebasket domain decomposition

42 L J,J = σ,τ L σ,τ σ τ J J L J,J = σ,τ L σ,τ σ τ J J Â J, J = σ,τ Â σ,τ σ τ J J

43 Strategy I The partition of J (local to node J) conforms to the partitions of ancestors K, K J Advantages : Update assembly is simplified Â (p(j)) σ 2,τ 2 where σ 1 σ 2, τ 1 τ 2 = Â(p(J)) σ 2,τ 2 + Â(J) σ 1,τ 1 One destination for each Â(J) σ 1,τ 1 Disadvantages : Partition of J can be fragmented, more segments, smaller size, less efficient storage 43

44 Strategy II The partition of J (local to node J) need not conform to the partitions of ancestors Advantages : Partition of J can be optimized better since J is small and localized Disadvantages : Update assembly is more complex Â (p(j)) σ 1 σ 2,τ 1 τ 2 = Â(p(J)) σ 1 σ 2,τ 1 τ 2 + Â(J) σ 1 σ 2,τ 1 τ 2 where σ 1 σ 2, τ 1 τ 2 Several destinations for a submatrix Â(J) σ 1,τ 1 44

45 Outline graph, tree, matrix perspectives experiments 2-D potential equation low rank computations blocking strategies summary 45

46 Multifrontal tree Multifrontal Factorization each leaf node is a d-dimensional sparse FEM matrix each interior node is a (d 1)-dimensional dense BEM matrix use low rank storage and computation at each interior node 46

47 Call to action progress to date in low rank factorizations driven by iterative methods work needed from direct methods community start from industrial strength multifrontal code serial, SMP, MPP, hybrid, GPU pivoting for stability, out-of-core, singular systems, null spaces 47

48 Many challenges Call to action partition of space, partition of interface added programming complexity of low rank matrices challenges to pivot for stability challenges/opportunities to implement in parallel Payoff will be huge reduction in storage footprint reduction in computational work take direct methods to next level of problem size 48

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October