
Sparse factorization using low rank submatrices

Cleve Ashcraft, LSTC, cleve@lstc.com
MUMPS User Group Meeting, April 15-16, 2010, Toulouse, FRANCE
ftp.lstc.com:outgoing/cleve/mumps10_Ashcraft.pdf

LSTC, Livermore Software Technology Corporation, Livermore, California 94550
- Founded by John Hallquist of LLNL, 1980s
- Public domain versions of the DYNA and NIKE codes
- LS-DYNA: implicit/explicit, nonlinear finite element analysis code
- Multiphysics capabilities: fluid/structure interaction, thermal analysis, acoustics, electromagnetics

Multifrontal Algorithm
- very large, sparse $LDL^T$ and $LDU$ factorizations
- tree structure organizes factor storage, solve and factor operations
- medium-to-large sparse linear systems located at each leaf node of the tree
- medium-sized dense linear systems located at each interior node of the tree
- dense matrix-matrix operations at each interior node
- sparse matrix-matrix adds between nodes
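
This division of labor can be made concrete with a minimal sketch, assuming a symmetric positive definite problem so that a dense Cholesky can stand in for the $LDL^T$ kernel; this is an illustration of the scheme, not LSTC's code, and the argument names (child_updates, child_maps) are mine:

```python
import numpy as np

def factor_node(F, k, child_updates, child_maps):
    """Factor one frontal matrix F (dense, symmetric, SPD) with k fully-summed
    variables, after extend-adding the children's dense update matrices."""
    for U, idx in zip(child_updates, child_maps):
        F[np.ix_(idx, idx)] += U                # sparse extend-add between nodes
    L11 = np.linalg.cholesky(F[:k, :k])         # dense factorization of the pivot block
    L21 = np.linalg.solve(L11, F[:k, k:]).T     # dense triangular solve for the off-diagonal block
    update = F[k:, k:] - L21 @ L21.T            # dense Schur complement passed to the parent
    return L11, L21, update
```

A postorder walk of the tree calls this at every node, so the leaves see the sparse original entries and the interior nodes see progressively larger dense fronts.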

Multifrontal tree

[Figure: a multifrontal elimination tree, nodes numbered 1-71]

Multifrontal tree, polar representation

[Figure: the same tree drawn in a polar layout]

Multifrontal Algorithm
- 4M equations is the present frontier at LSTC
- serial, SMP, MPP, hybrid, GPU
- large problems (> 1M dof) require out-of-core storage of factor entries, even on distributed memory systems
- I/O cost is largely hidden during the factorization
- I/O cost is dominant during the solves
- eigensolver: several right hand sides
- many applications, e.g., Newton's method, have a single right hand side

Low Rank Approximations
- hierarchical matrices: Hackbusch, Bebendorf, Le Borne, others
- semiseparable matrices: Gu, others
- submatrices are numerically rank deficient
- method of choice for Boundary Elements (BEM), now applied to Finite Elements (FEM)

An $m \times n$ matrix $A$ is approximated by a rank-$r$ product $A \approx X Y^T$, with $X$ of size $m \times r$ and $Y$ of size $n \times r$:

storage $= r(m+n)$ vs. $mn$, reduction in ops $= r/\min(m,n)$
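
A minimal numpy sketch of this rank-$r$ compression, assuming a truncated SVD as the compressor and the $10^{-14}$ relative Frobenius tolerance used in the experiments later in the talk; the matrix and sizes below are synthetic illustrations, not data from the slides:

```python
import numpy as np

def low_rank_approx(A, tol=1e-14):
    """Return X (m x r), Y (n x r) with ||A - X @ Y.T||_F <= tol * ||A||_F."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # tail[r] = Frobenius error left after keeping the r largest singular values
    tail = np.append(np.sqrt(np.cumsum((s ** 2)[::-1]))[::-1], 0.0)
    r = int(np.argmax(tail <= tol * tail[0]))   # smallest acceptable rank
    return U[:, :r] * s[:r], Vt[:r].T

# synthetic m x n block with rapidly decaying singular values
rng = np.random.default_rng(0)
m, n = 754, 255
U0 = np.linalg.qr(rng.standard_normal((m, n)))[0]
V0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = (U0 * 0.5 ** np.arange(n)) @ V0.T

X, Y = low_rank_approx(A)
r = X.shape[1]
print(r, r * (m + n), m * n)   # rank r, low rank storage r(m+n), dense storage mn
```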

Multifrontal Algorithm + Low Rank Approximations
- At each leaf node of the multifrontal tree: use the standard multifrontal kernel
- At each interior node: low rank matrix-matrix multiplies and sums
- Between nodes: low rank matrix sums
- Dramatic reduction in storage
- Dramatic reduction in operations
- Excellent approximation properties for finite element operators
- Our experience is with potential equations, and elasticity with solids and shells

Outline
- graph, tree, matrix perspectives
- experiments: 2-D potential equation
- low rank computations
- blocking strategies
- summary

One subgraph, one subtree, one submatrix

[Figure: two plots on the unit square showing the subgraph and subtree: domains $\Omega_I$ and $\Omega_J$ separated by $S$, with external boundary $\bar S$]

One submatrix, ordered by domains, separator and external boundary:

$$\begin{bmatrix} A_{\Omega_I,\Omega_I} & & A_{\Omega_I,S} & A_{\Omega_I,\bar S} \\ & A_{\Omega_J,\Omega_J} & A_{\Omega_J,S} & A_{\Omega_J,\bar S} \\ A_{S,\Omega_I} & A_{S,\Omega_J} & A_{S,S} & A_{S,\bar S} \\ A_{\bar S,\Omega_I} & A_{\bar S,\Omega_J} & A_{\bar S,S} & A_{\bar S,\bar S} \end{bmatrix}$$

Portion of the original matrix $A$, ordered by domains, separator and external boundary

[Spy plot; nz = 952]

Portion of the factor matrix $L$, ordered by domains, separator and external boundary

[Spy plot; nz = 48899]

Schur complement matrix

$$\begin{bmatrix} \hat A_{S,S} & \hat A_{S,\bar S} \\ \hat A_{\bar S,S} & \hat A_{\bar S,\bar S} \end{bmatrix}$$

[Spy plot: Schur complement, separator and external boundary; nz = 2659]

Outline
- graph, tree, matrix perspectives
- experiments: 2-D potential equation
- low rank computations
- blocking strategies
- summary

Computational experiments
- Compute $L_{S,S}$, $L_{\bar S,S}$ and $\hat A_{\bar S,\bar S}$
- Find domain decompositions of $S$ and $\bar S$
- Form block matrices, e.g., $L_{S,S} = [\,L_{K,J}\,]$ over segments $K$, $J$ of the decompositions
- Find the singular value decomposition of each $L_{K,J}$
- Collect all singular values $\{\sigma\} = \bigcup_{K,J} \{\sigma_i^{(K,J)}\}_{i=1}^{\min(|K|,|J|)}$
- Split each matrix, e.g., $L_{S,S} = M_{S,S} + N_{S,S}$, using the singular values $\{\sigma\}$; we want $\|N_{S,S}\|_F \le 10^{-14}\,\|L_{S,S}\|_F$
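
A hedged sketch of this experiment, assuming a partition given as lists of index arrays and a plain dense SVD per block (the helper name and partition format are mine, not from the slides): every block's singular values go into one global pool, and the smallest of them are discarded as a group so that the dropped part $N = L - M$ stays below the $10^{-14}$ relative Frobenius threshold.

```python
import numpy as np

def split_by_global_sigma(L, row_parts, col_parts, tol=1e-14):
    """Split L = M + N over a block partition so that ||N||_F <= tol * ||L||_F."""
    blocks, sigmas = {}, []
    for K, rows in enumerate(row_parts):
        for J, cols in enumerate(col_parts):
            U, s, Vt = np.linalg.svd(L[np.ix_(rows, cols)], full_matrices=False)
            blocks[K, J] = (U, s, Vt)
            sigmas.extend(s)
    sigmas = np.sort(np.asarray(sigmas))            # pooled, ascending
    tail = np.sqrt(np.cumsum(sigmas ** 2))          # Frobenius mass of the i+1 smallest
    drop = np.searchsorted(tail, tol * np.linalg.norm(L), side='right')
    cut = sigmas[drop - 1] if drop > 0 else -1.0    # discard everything <= cut
    M = np.zeros_like(L)
    for (K, J), (U, s, Vt) in blocks.items():
        keep = s > cut
        M[np.ix_(row_parts[K], col_parts[J])] = (U[:, keep] * s[keep]) @ Vt[keep]
    return M                                        # N = L - M
```

Because the blocks tile $L$, $\|N\|_F^2$ is exactly the sum of the discarded $\sigma^2$, which is what makes the single global threshold legitimate.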

255 × 255 diagonal block factor $L_{S,S}$: 43% dense at relative accuracy $10^{-14}$

[Figure: low rank block structure of $L_{S,S}$]

754 × 255 lower block factor $L_{\bar S,S}$: 16% dense at relative accuracy $10^{-14}$

[Figure: low rank block structure of $L_{\bar S,S}$]

754 × 754 update matrix $\hat A_{\bar S,\bar S}$: 21% dense at relative accuracy $10^{-14}$

[Figure: low rank block structure of $\hat A_{\bar S,\bar S}$]

255 × 255 factor matrix $L_{S,S}$: storage vs. accuracy

[Plot: approximating the 255 × 255 separator factor matrix $L_{3,3}$; $\log_{10}(\|A - M\|_F / \|A\|_F)$ vs. fraction of dense storage, for 2 × 2 through 8 × 8 block partitions]

754 × 255 factor matrix $L_{\bar S,S}$: storage vs. accuracy

[Plot: approximating the 754 × 255 interface-separator factor matrix $L_{4,3}$; $\log_{10}(\|A - M\|_F / \|A\|_F)$ vs. fraction of dense storage, for 3 × 1, 6 × 2, 9 × 3 and 12 × 4 block partitions]

754 × 754 update matrix $\hat A_{\bar S,\bar S}$: storage vs. accuracy

[Plot: approximating the 754 × 754 Schur complement matrix $\hat A_{4,4}$; $\log_{10}(\|A - M\|_F / \|A\|_F)$ vs. fraction of dense storage, for 2 × 2 through 8 × 8 block partitions]

Outline
- graph, tree, matrix perspectives
- experiments: 2-D potential equation
- low rank computations
- blocking strategies
- summary

How to compute low rank submatrices?

SVD, singular value decomposition: $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal. Gold standard; expensive, $O(n^3)$ ops.

QR factorization with column pivoting: $AP = QR$, where $Q$ is orthogonal, $R$ is upper triangular, and $P$ is a permutation matrix. Silver standard; moderate cost, $O(rn^2)$ ops.
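
A small comparison sketch on a synthetic matrix with known singular values, assuming scipy's column-pivoted QR (`scipy.linalg.qr` with `pivoting=True`); it previews the observation on the next slides that the row norms of $R$ track the $\sigma_i$:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
n = 200
Q1 = np.linalg.qr(rng.standard_normal((n, n)))[0]
Q2 = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = (Q1 * 0.5 ** np.arange(n)) @ Q2.T        # known singular values 1, 0.5, 0.25, ...

sigma = np.linalg.svd(A, compute_uv=False)   # gold standard, O(n^3)
Q, R, piv = qr(A, pivoting=True)             # silver standard, cheaper
rnorms = np.linalg.norm(R, axis=1)           # row norms of R track sigma_i

tol = 1e-14 * sigma[0]
print("rank via SVD:", int(np.sum(sigma > tol)),
      "rank via pivoted QR:", int(np.sum(rnorms > tol)))
```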

Row norms of $R$ vs. singular values of $A$

[Plot: row norms $\|R_{i,:}\|_2$ versus singular values $\sigma_i$ for the 754 × 754 update matrix $\hat A_{\bar S,\bar S}$ partitioned into 8 segments (94 × 94 submatrices); near, mid and far interactions shown separately]

SVD vs QR with column pivoting

Conclusions:
- Column-pivoted QR factorization does well
- Row norms of $R$ track the singular values $\sigma_i$
- The numerical rank from $R$ is greater than needed, but not that much greater

For more accuracy/less storage, two-sided orthogonal factorizations

$AP = ULV^T$, $U$ and $V$ orthogonal, $L$ triangular
$PAQ = UBV^T$, $U$ and $V$ orthogonal, $B$ bidiagonal

track the singular values very closely.
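
A minimal sketch of a ULV-type two-sided factorization built from two QR passes, one plausible reading of the slide rather than the speaker's actual algorithm: a column-pivoted QR of $A$ followed by a QR of $R^T$ gives $AP = U L V^T$ with $L$ lower triangular, whose diagonal tracks the singular values more tightly than $R$'s.

```python
import numpy as np
from scipy.linalg import qr

def ulv(A):
    """Two-sided orthogonal factorization: A[:, piv] = U @ L @ V.T, L lower triangular."""
    U, R, piv = qr(A, pivoting=True)     # A P = U R
    V, R2 = np.linalg.qr(R.T)            # R.T = V R2, so R = R2.T @ V.T
    return U, R2.T, V, piv

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 60)) @ np.diag(0.5 ** np.arange(60))
U, L, V, piv = ulv(A)
print(np.allclose(A[:, piv], U @ L @ V.T))   # True
```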

Type of approximation of submatrices

Submatrix $L_{I,J}$ of $L_{S,S}$, $\|L_{I,J}\|_F = 2.92 \times 10^{-2}$:

factorization   numerical rank   total entries
none            -                4900
QR              20               2681
ULV             18               2680
SVD             18               2538

Submatrix $L_{I,J}$ of $L_{S,S}$, $\|L_{I,J}\|_F = 2.4 \times 10^{-3}$:

factorization   numerical rank   total entries
none            -                4900
QR              14               1896
ULV             10               1447
SVD             10               1410

Operations with low rank submatrices

$A = U_A V_A^T$, $B = U_B V_B^T$, $C = U_C V_C^T$ (see Bebendorf, Hierarchical Matrices, Chapter 1)

Multiplication: $A = BC$, $\operatorname{rank}(A) \le \min(\operatorname{rank}(B), \operatorname{rank}(C))$

$$A = U_A V_A^T = (U_B V_B^T)(U_C V_C^T) = U_B (V_B^T U_C) V_C^T = \bigl(U_B (V_B^T U_C)\bigr)\, V_C^T$$

Addition: $A = B + C$, $\operatorname{rank}(A) \le \operatorname{rank}(B) + \operatorname{rank}(C)$

$$A = U_A V_A^T = U_B V_B^T + U_C V_C^T = [\,U_B \ \ U_C\,]\,[\,V_B \ \ V_C\,]^T$$

Multiplication $A = BC$

$A = U_A V_A^T$, $B = U_B V_B^T$, $C = U_C V_C^T$

$$\underbrace{(m \times h)(h \times l)}_{B}\ \underbrace{(l \times k)(k \times n)}_{C} \;\longrightarrow\; \bigl(m \times \min(h,k)\bigr)\,\bigl(\min(h,k) \times n\bigr)$$

[Figure: block diagram of the outer-product shapes]
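
A short sketch of this product rule, assuming both operands are held in outer-product form; the small coupling matrix $V_B^T U_C$ is absorbed into whichever side keeps the rank at $\min(h,k)$.

```python
import numpy as np

def lr_multiply(UB, VB, UC, VC):
    """Return (UA, VA) with UA @ VA.T = (UB @ VB.T) @ (UC @ VC.T)."""
    W = VB.T @ UC                 # small h x k coupling matrix
    h, k = W.shape
    if h <= k:
        return UB, VC @ W.T       # rank-h form: A = UB @ (W @ VC.T)
    return UB @ W, VC             # rank-k form: A = (UB @ W) @ VC.T

# check against the dense product on random factors
rng = np.random.default_rng(3)
m, l, n, h, k = 40, 30, 50, 6, 9
UB, VB = rng.standard_normal((m, h)), rng.standard_normal((l, h))
UC, VC = rng.standard_normal((l, k)), rng.standard_normal((n, k))
UA, VA = lr_multiply(UB, VB, UC, VC)
print(np.allclose(UA @ VA.T, (UB @ VB.T) @ (UC @ VC.T)))   # True
```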

Near-near matrix product

[Plot: $\sigma(L_{2,1})$ and $\sigma(L_{2,1} L_{2,1}^T)$ for the near-near interaction]

Mid-near matrix product

[Plot: $\sigma(L_{2,1})$, $\sigma(L_{1,1})$ and $\sigma(L_{2,1} L_{1,1}^T)$ for the mid-near interaction]

Far-near matrix product

[Plot: $\sigma(L_{5,1})$, $\sigma(L_{6,1})$ and $\sigma(L_{5,1} L_{6,1}^T)$ for the far-near interaction]

Mid-mid matrix product

[Plot: $\sigma(L_{1,1})$ and $\sigma(L_{1,1} L_{1,1}^T)$ for the mid-mid interaction]

Far-mid matrix product

[Plot: $\sigma(L_{6,1})$, $\sigma(L_{1,1})$ and $\sigma(L_{6,1} L_{1,1}^T)$ for the far-mid interaction]

Far-far matrix product

[Plot: $\sigma(L_{6,1})$ and $\sigma(L_{6,1} L_{6,1}^T)$ for the far-far interaction]

Addition $A = B + C$, $\operatorname{rank}(A) \le \operatorname{rank}(B) + \operatorname{rank}(C)$

$$A = U_A V_A^T = U_B V_B^T + U_C V_C^T = [\,U_B \ \ U_C\,]\,[\,V_B \ \ V_C\,]^T = (Q_1 R_1)(Q_2 R_2)^T = Q_1 (R_1 R_2^T) Q_2^T$$

With $R_1 R_2^T = Q_3 R_3$:

$$A = Q_1 Q_3 R_3 Q_2^T = (Q_1 Q_3)\,\bigl(Q_2 R_3^T\bigr)^T$$

$R_1 R_2^T$ usually has low numerical rank; examples follow.
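
A hedged numpy sketch of this recompression, with one substitution: the slide factors the small core $R_1 R_2^T$ with another QR, while the version below truncates it with an SVD, which is simpler to threshold.

```python
import numpy as np

def lr_add(UB, VB, UC, VC, tol=1e-14):
    """Return (UA, VA) with UA @ VA.T ~= UB @ VB.T + UC @ VC.T, recompressed."""
    Q1, R1 = np.linalg.qr(np.hstack([UB, UC]))   # orthogonalize stacked left factors
    Q2, R2 = np.linalg.qr(np.hstack([VB, VC]))   # ... and stacked right factors
    W = R1 @ R2.T                                # small (rB+rC) x (rB+rC) core
    U, s, Vt = np.linalg.svd(W)
    r = max(1, int(np.sum(s > tol * s[0])))      # numerical rank of the core
    return Q1 @ (U[:, :r] * s[:r]), Q2 @ Vt[:r].T

# the sum of two rank-5 terms recompresses to rank <= 10, often less
rng = np.random.default_rng(4)
UB, VB = rng.standard_normal((80, 5)), rng.standard_normal((60, 5))
UC, VC = rng.standard_normal((80, 5)), rng.standard_normal((60, 5))
UA, VA = lr_add(UB, VB, UC, VC)
print(VA.shape[1])                               # recompressed rank
print(np.allclose(UA @ VA.T, UB @ VB.T + UC @ VC.T))
```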

Update matrix, diagonal block

$$\hat A_{3,3} = L_{3,1} L_{3,1}^T + L_{3,2} L_{3,2}^T + L_{3,3} L_{3,3}^T$$

[Plot: $\log_{10}(\sigma)$ for the update matrix self interaction: $\sigma(\text{near})$, $\sigma(\text{mid})$, $\sigma(\text{far})$ and $\sigma(\text{sum})$]

Update matrix, mid-distance off-diagonal block

$$\hat A_{5,3} = L_{5,1} L_{3,1}^T + L_{5,2} L_{3,2}^T + L_{5,3} L_{3,3}^T$$

[Plot: $\log_{10}(\sigma)$ for the update matrix mid-range interaction: $\sigma(\text{near})$, $\sigma(\text{mid})$, $\sigma(\text{far})$ and $\sigma(\text{sum})$]

Update matrix, far-distance off-diagonal block

$$\hat A_{7,3} = L_{7,1} L_{3,1}^T + L_{7,2} L_{3,2}^T + L_{7,3} L_{3,3}^T$$

[Plot: $\log_{10}(\sigma)$ for the update matrix far-range interaction: $\sigma(\text{near})$, $\sigma(\text{mid})$, $\sigma(\text{far})$ and $\sigma(\text{sum})$]

Outline
- graph, tree, matrix perspectives
- experiments: 2-D potential equation
- low rank computations
- blocking strategies
- summary

Blocking Strategies

Active data structures:

$$\begin{bmatrix} L_{J,J} & \\ L_{\bar J,J} & \hat A_{\bar J,\bar J} \end{bmatrix}$$

- partition $J$ and $\bar J$ independently
- for 2-D problems, $J$ and $\bar J$ are 1-D manifolds
- for 3-D problems, $J$ and $\bar J$ are 2-D manifolds
- we need mesh partitioning of the separator and the boundary of a region $J$
- each index set of a partition is a segment

Segment partition

[Figure: segment partition of the unit square; wirebasket domain decomposition]

The blocked matrices, segment by segment:

$$L_{J,J} = \sum_{\sigma,\tau \subset J} L_{\sigma,\tau}, \qquad L_{\bar J,J} = \sum_{\sigma \subset \bar J,\ \tau \subset J} L_{\sigma,\tau}, \qquad \hat A_{\bar J,\bar J} = \sum_{\sigma,\tau \subset \bar J} \hat A_{\sigma,\tau}$$

[Figure: spy plots of the three blocked matrices]

Strategy I

The partition of $\bar J$ (local to node $J$) conforms to the partitions of the ancestors $K$ of $J$.

Advantages: update assembly is simplified,

$$\hat A^{(p(J))}_{\sigma_2,\tau_2} = \hat A^{(p(J))}_{\sigma_2,\tau_2} + \hat A^{(J)}_{\sigma_1,\tau_1}, \quad \text{where } \sigma_1 \subseteq \sigma_2,\ \tau_1 \subseteq \tau_2;$$

one destination for each $\hat A^{(J)}_{\sigma_1,\tau_1}$.

Disadvantages: the partition of $\bar J$ can be fragmented: more segments, smaller size, less efficient storage.

Strategy II

The partition of $\bar J$ (local to node $J$) need not conform to the partitions of the ancestors.

Advantages: the partition of $\bar J$ can be optimized better, since $\bar J$ is small and localized.

Disadvantages: update assembly is more complex,

$$\hat A^{(p(J))}_{\sigma_1 \cap \sigma_2,\ \tau_1 \cap \tau_2} = \hat A^{(p(J))}_{\sigma_1 \cap \sigma_2,\ \tau_1 \cap \tau_2} + \hat A^{(J)}_{\sigma_1 \cap \sigma_2,\ \tau_1 \cap \tau_2}, \quad \text{where } \sigma_1 \cap \sigma_2 \neq \emptyset,\ \tau_1 \cap \tau_2 \neq \emptyset;$$

several destinations for each submatrix $\hat A^{(J)}_{\sigma_1,\tau_1}$.

Outline
- graph, tree, matrix perspectives
- experiments: 2-D potential equation
- low rank computations
- blocking strategies
- summary

Multifrontal Factorization

[Figure: the multifrontal tree, nodes numbered 1-71]

- each leaf node is a $d$-dimensional sparse FEM matrix
- each interior node is a $(d-1)$-dimensional dense BEM matrix
- use low rank storage and computation at each interior node

Call to action
- progress to date in low rank factorizations has been driven by iterative methods
- work is needed from the direct methods community
- start from an industrial-strength multifrontal code: serial, SMP, MPP, hybrid, GPU
- pivoting for stability, out-of-core, singular systems, null spaces

Call to action

Many challenges:
- partition of space, partition of interface
- added programming complexity of low rank matrices
- challenges to pivot for stability
- challenges/opportunities to implement in parallel

Payoff will be huge:
- reduction in storage footprint
- reduction in computational work
- take direct methods to the next level of problem size