A sparse multifrontal solver using hierarchically semi-separable frontal matrices
1 A sparse multifrontal solver using hierarchically semi-separable frontal matrices. Pieter Ghysels, Lawrence Berkeley National Laboratory. Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry Rouet (LBNL). 2014 CBMS-NSF Conference: Fast direct solvers for elliptic PDEs, June 23-29, 2014, Dartmouth College.
2 Introduction. Consider solving Ax = b, or M^{-1}Ax = M^{-1}b, with a preconditioned iterative method (CG, GMRES, etc.). Convergence is fast if a good preconditioner M ≈ A is available: 1. cheap to construct, store, apply, and parallelize; 2. a good approximation of A. These goals are contradictory, so a trade-off must be made. Standard design strategy: start from a direct factorization (such as multifrontal LU) and add approximations to make it cheaper (cf. 1) while (hopefully/provably) affecting (2) as little as possible. The approximation idea used here: low-rank approximation.
3 The multifrontal method [Duff & Reid '83]. A nested dissection reordering (SCOTCH graph partitioner) defines an elimination tree. The e-tree is traversed bottom-up: at each frontal matrix, a partial factorization is performed and a contribution block (Schur complement) is computed. Parent nodes sum the contribution blocks of their children: the extend-add operation.
4 Low-rank property. In many applications, frontal matrices exhibit low-rank blocks ("data sparsity"). Compression with SVD, rank-revealing QR, ... SVD: optimal but too expensive; RRQR: QR with column pivoting. Compression can be exact or use a compression threshold ε. Recursively, the diagonal blocks have low-rank subblocks too. What to do with the off-diagonal blocks?
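The compression step above can be sketched numerically. A minimal illustration (hypothetical sizes and tolerance; a truncated SVD is used here for clarity, where a solver would typically use RRQR for speed):

```python
import numpy as np

# Toy illustration (not the authors' code): compress a numerically
# low-rank block, keeping singular values above a relative threshold eps.
rng = np.random.default_rng(0)
n, r = 200, 8
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # exact rank r

eps = 1e-6
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > eps * s[0]))       # numerical rank at tolerance eps
A_lr = (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k approximation, stored as factors
print(k)                              # recovers the true rank r = 8
```

Storing the factors costs O(nk) instead of O(n^2), which is the source of the memory and flop savings discussed throughout.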
5 Low-rank representations. Most low-rank representations belong to the class of H-matrices [Bebendorf, Börm, Hackbusch, Grasedyck, ...], embedded in both dense and sparse solvers: Hierarchically Off-Diagonal Low Rank (HODLR) [Ambikasaran, Cecka, Darve, ...]; Hierarchically Semiseparable (HSS) [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], which has nested bases, e.g. U3 = [U1, 0; 0, U2] Ũ3 with Ũ3 small; Block Low-Rank (BLR) [Amestoy et al.], a simple 2D partitioning, recently added to MUMPS. Also: H^2 (includes HSS and FMM), SSS, ... The choice is a trade-off: simple representations apply to broad classes of problems but provide smaller gains in memory/operations than specialized/complex ones.
6 HSS/HBS representation. The structure is represented by a tree: leaves hold the diagonal blocks D_τ and bases U_τ, V_τ; sibling nodes ν1, ν2 are coupled through B_{ν1,ν2} and B_{ν2,ν1} (e.g. the off-diagonal block U6 B_{3,6} V3^T at the top level); the bases are nested, e.g. U3 = [U1, 0; 0, U2] Ũ3 with Ũ3 small. The number of leaves depends on the problem (geometry) and the number of processors to be used. Building the HSS structure and all the usual operations (multiplying, ...) consist of traversals of the tree.
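As a toy version of such a structured operation, a single-level block low-rank matrix (a hypothetical two-leaf example, not the full nested HSS format) can be applied to a vector block by block:

```python
import numpy as np

# Hypothetical one-level example: A = [[D1, U1 B12 V2^T], [U2 B21 V1^T, D2]],
# applied to a vector without ever forming the off-diagonal blocks densely.
rng = np.random.default_rng(1)
n, r = 100, 5
D1, D2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
U1, V1 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U2, V2 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
B12, B21 = rng.standard_normal((r, r)), rng.standard_normal((r, r))

A = np.block([[D1, U1 @ B12 @ V2.T], [U2 @ B21 @ V1.T, D2]])  # dense reference
x = rng.standard_normal(2 * n)
x1, x2 = x[:n], x[n:]

# Structured matvec: the off-diagonal work is O(n r) instead of O(n^2).
y1 = D1 @ x1 + U1 @ (B12 @ (V2.T @ x2))
y2 = D2 @ x2 + U2 @ (B21 @ (V1.T @ x1))
y = np.concatenate([y1, y2])
assert np.allclose(y, A @ x)
```

In the real HSS format this pattern is applied recursively, with the nested bases letting each level reuse the work of its children.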
7 Embedding low-rank techniques in a multifrontal solver. 1. Choose which frontal matrices are compressed (based on size, level, ...). 2. Low-rankness comes from weak interactions between distant variables, so a suitable ordering/clustering of each frontal matrix is needed. Geometric setting (3D grid): the separator is a 2D plane; clusters with small diameters are needed, and in hierarchical formats merged clusters need small diameters too, so the domain is split into squares ordered with a Morton ordering. Algebraic setting: add some kind of halo to the (complete) graph of the separator variables and call a graph partitioner (METIS) [Amestoy et al., Napov].
8 Embedding HSS kernels in a multifrontal solver. HSS for frontal matrices comes in several variants: fully structured (HSS on the whole frontal matrix, no dense matrix); partial+ (HSS on the whole frontal matrix, but a dense frontal matrix is kept); partially structured (HSS on the F11, F12 and F21 parts only; dense frontal matrix, and a dense CB = F22 - F21 F11^{-1} F12 in the stack, hence more memory). Partially structured can use the regular extend-add. In the partially structured variant, the dense frontal matrix is HSS-compressed; after compression, a ULV factorization of the F11 block is performed (compared to classical LU in the dense case), followed by a low-rank Schur complement update.
9 HSS compression via randomized sampling [Martinsson '11, Xia '13]. HSS compression of a matrix A. Ingredients: R_r and R_c, random matrices with d columns, where d = r + p with r the estimated maximum rank and p = 10 in practice; S_r = A R_r and S_c = A^* R_c, samples of the matrix A, which can benefit from a fast matvec; an interpolative decomposition (ID) A = A(:, J) X, expressing A as a linear combination of selected columns of A. Two-sided ID: S_c = S_c(:, J_c) X_c and S_r = S_r(:, J_r) X_r, giving A = X_c A(I_c, I_r) X_r.
10 HSS compression via randomized sampling (2). Algorithm (symmetric case), from fine to coarse. Leaf node τ: 1. Sample: S_loc = S(I_τ, :) - D R(I_τ, :). 2. ID: S_loc = U_τ S_loc(J_τ, :). 3. Update: S_τ = S_loc(J_τ, :), R_τ = U_τ^* R(I_τ, :), I_τ = I_τ(J_τ). Inner node τ with children ν1, ν2: 1. Sample: S_loc = [S_ν1 - A(I_ν1, I_ν2) R_ν2; S_ν2 - A(I_ν2, I_ν1) R_ν1]. 2. ID: S_loc = U_τ S_loc(J_τ, :). 3. Update: S_τ = S_loc(J_τ, :), R_τ = U_τ^* [R_ν1; R_ν2], I_τ = [I_ν1, I_ν2](J_τ).
11 HSS compression via randomized sampling (3). If A ≠ A^*, do this for the columns as well (simultaneously). The bases have a special structure: U_τ = Π_τ [I; E_τ]. Elements are extracted from the frontal matrix: D_τ = A(I_τ, I_τ) and B_{ν1,ν2} = A(I_ν1, I_ν2). The frontal matrix is a combination of the separator part and the HSS children; extracting an element from an HSS matrix requires traversing the HSS tree and multiplying basis matrices, so limiting the number of tree traversals is crucial for performance. Benefits: the extend-add operation is simplified (it acts only on random vectors), and the complexity drops to O(r^2 N log N) instead of O(rN^2) for the non-randomized algorithm (the log N factor comes from extracting elements from the HSS matrix).
12 Randomized sampling: extend-add. Assembly in regular multifrontal: F_p = A_p ⊕ CB_c1 ⊕ CB_c2. Sampling: S_p = F_p R_p = (A_p ⊕ CB_c1 ⊕ CB_c2) R_p = A_p R_p + Y_c1 + Y_c2, a 1D extend-add (only along rows), which is much simpler. Here Y_c1 and Y_c2 are the samples of the children's contribution blocks, and R_p is assembled from R_c1 and R_c2 (plus random rows for the missing indices). Stages: build the random vectors from the random vectors of the children; build the sample from the samples of the children's CBs; multiply the separator part of the frontal matrix with the random vectors: A_p R_p; compress F_p using S_p and R_p.
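The key identity, that the sample of the assembled front needs only a row-aligned (1D) extend-add, can be checked on a tiny hypothetical example:

```python
import numpy as np

# Hypothetical sizes and index sets: verify S_p = A_p R_p + Y_c1 + Y_c2,
# where the children's CB samples are extended along rows only.
rng = np.random.default_rng(3)
m, d = 10, 4
Ap = rng.standard_normal((m, m))          # separator part of the front
I1, I2 = [0, 2, 5, 7], [1, 2, 6, 8, 9]    # child CB indices within the parent
CB1 = rng.standard_normal((4, 4))
CB2 = rng.standard_normal((5, 5))

# Dense assembly (2D extend-add) for reference.
Fp = Ap.copy()
Fp[np.ix_(I1, I1)] += CB1
Fp[np.ix_(I2, I2)] += CB2

Rp = rng.standard_normal((m, d))
Y1 = np.zeros((m, d)); Y1[I1] = CB1 @ Rp[I1]   # child sample, rows extended
Y2 = np.zeros((m, d)); Y2[I2] = CB2 @ Rp[I2]
Sp = Ap @ Rp + Y1 + Y2                          # 1D extend-add on samples
assert np.allclose(Sp, Fp @ Rp)
```

The children never need to scatter their CBs into the 2D index structure of the parent; they only pass d-column sample matrices up the tree.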
13 HSS ULV factorization. Exploit the structure of U_τ (from the ID) to introduce zeros: with U_τ = Π_τ [I; E_τ] and Ω_τ = [-E_τ, I; I, 0] Π_τ^*, we get Ω_τ U_τ = [0; I].
14 HSS ULV factorization (2). Applying Ω at the children gives diag(Ω_1, Ω_2) [D_1, U_1 B_{1,2} V_2^*; U_2 B_{2,1} V_1^*, D_2] = [W_1, Ω_1 U_1 B_{1,2} V_2^*; Ω_2 U_2 B_{2,1} V_1^*, W_2] with W_i = Ω_i D_i; since Ω_i U_i = [0; I], only the last rows of the off-diagonal blocks remain nonzero.
15 HSS ULV factorization (3). Take a (full) LQ decomposition of the leading rows of W_τ, W_{τ;1} = [L_τ, 0] Q_τ, and apply Q_1^*, Q_2^* from the right to the corresponding block columns.
16 HSS ULV factorization (4). The rows corresponding to L_τ can be eliminated; the remaining rows are passed to the parent. At the root node: an LU solve of the reduced D.
17 HSS ULV factorization (5). ULV-like: Ω_τ is not orthonormal, and there are forward/backward solve phases.
18 Low-rank Schur complement update. With F_21 = U_q B_qk V_k^* and F_12 = U_k B_kq V_q^*, the Schur complement update is F_22 - F_21 F_11^{-1} F_12 = F_22 - U_q B_qk (V_k^* F_11^{-1} U_k) B_kq V_q^*. Computing V_k^* F_11^{-1} U_k via a ULV solve would cost O(rN^2); but D_k is a reduced HSS matrix of size O(r × r), so instead F_22 - U_q B_qk (Ṽ_k^* D̃^{-1} Ũ_k) B_kq V_q^*, where Ṽ_k^* D̃^{-1} Ũ_k costs O(r^3). The result is F_22 - Ψ Φ^* with Ψ, Φ of size O(rN); applying U_q and V_q only requires traversing the q subtree. This gives a cheap multiply with random vectors.
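The algebra above can be verified on a small random example (sizes hypothetical; a dense solve stands in for both the ULV solve and the reduced-matrix shortcut):

```python
import numpy as np

# Hypothetical sizes: when F21 = Uq Bqk Vk^T and F12 = Uk Bkq Vq^T,
# the Schur complement F22 - F21 F11^{-1} F12 is a rank-r update Psi Phi^T.
rng = np.random.default_rng(4)
n1, n2, r = 80, 60, 6
F11 = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)   # well conditioned
F22 = rng.standard_normal((n2, n2))
Uq, Vq = rng.standard_normal((n2, r)), rng.standard_normal((n2, r))
Uk, Vk = rng.standard_normal((n1, r)), rng.standard_normal((n1, r))
Bqk, Bkq = rng.standard_normal((r, r)), rng.standard_normal((r, r))
F21, F12 = Uq @ Bqk @ Vk.T, Uk @ Bkq @ Vq.T

core = Vk.T @ np.linalg.solve(F11, Uk)   # r x r core (ULV solve in the solver)
Psi = Uq @ (Bqk @ core @ Bkq)            # n2 x r
Phi = Vq                                 # n2 x r
S_lr = F22 - Psi @ Phi.T                 # low-rank Schur complement update
S = F22 - F21 @ np.linalg.solve(F11, F12)  # dense reference
assert np.allclose(S_lr, S)
```

Only the r x r core ever touches F11, which is why the update is cheap to form and cheap to multiply with the parent's random vectors.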
19 Numerical example: general matrices. GMRES(30) with a right preconditioner: multifrontal with HSS versus ILUTP from SuperLU. b = (1, 1, ...)^T, x_0 = (0, 0, ...)^T; stopping criterion: ||r_k||_2 / ||b||_2 below a tolerance; HSS compression tolerance: 10^{-6}. The table compared, per matrix, N, HSS rank, fill ratio, factorization time (s), and iteration counts for HSS and ILU (the numeric columns were lost in transcription). Test matrices: add32 (circuit), mchln85ks17 (car tire), mhd500 (plasma), poli_large (economics), stomach (bio-engineering), tdr190k (accelerator, N ≈ 1,100,000), torso3 (bio-engineering), utm5940 (tokamak), wang4 (device).
20 Numerical example: tdr190k. tdr190k: Maxwell equations in the frequency domain. [Figure: GMRES(30) convergence history.]
21 Parallel implementation. We have a serial code (StruMF [Napov]) with some performance issues: it currently generates random vectors for all nodes in the e-tree, and the rank must be estimated (currently: guess, and start over when the guess is too small). A parallel implementation is a work in progress: distributed-memory HSS compression of a dense matrix (MPI, BLACS, PBLAS, BLAS, LAPACK) and a shared-memory multifrontal code (OpenMP task parallelism for the traversal of both the elimination tree and the HSS tree; the next step is parallel dense algebra).
22 Parallel HSS compression, MPI code. Topmost separator of a 3D problem, generated with the exact multifrontal method. Problem sizes N = 10,000, 40,000, 90,000; tolerance 1e-3. The table listed MPI processes/cores, nodes, levels, and times (s) for non-randomized compression, randomized compression, and dense LU with ScaLAPACK (the numbers were lost in transcription). On 1024 cores a rate of 5.3 …Flops/s was achieved, with very good flop balance (min/max = …%) and little communication overhead.
23 Task-based parallel tree traversal (OpenMP). Postorder (leaf-to-root) tree traversal: handle siblings in parallel, wait for the children.

    void traverse(node *p) {
      if (p->left) {
        #pragma omp task
        traverse(p->left);
      }
      if (p->right) {
        #pragma omp task
        traverse(p->right);
      }
      #pragma omp taskwait
      process(p->data);  /* children are done; handle this node */
    }

The run-time system schedules tasks to cores (work stealing). The root node is handled sequentially: a scaling bottleneck. Nested trees: a node of the nested dissection tree contains an HSS tree.
24 Task-based parallel tree traversal (OpenMP), continued. [Figures: per-thread timelines (time (s) vs. thread ID) for HSS compression of a dense frontal matrix and for the multifrontal factorization; blue: e-tree node, red: HSS compression, green: ULV factorization.] Extraction of elements from the HSS matrix forms the bottleneck. Other runtime task schedulers are being considered: Intel TBB, StarPU, Quark. The Quark scheduler from PLASMA could allow integration of PLASMA's parallel (tiled) BLAS/LAPACK.
25 Conclusions. We are developing an algebraic preconditioner for general nonsymmetric matrices (from PDEs). HSS is a restricted format: large gains are possible for certain applications, but not for all. Graph partitioning difficulties: the separator is not always just a plane/line, nor always a single piece, which is bad for the rank structure. Some performance issues still need to be addressed. The distributed- and shared-memory codes are currently separate: how to combine them in a hybrid MPI+X code? Prepare for the next-generation NERSC supercomputer (Intel MIC based, > 60 cores).
26 Thank you! Questions?
More informationSpectrum-Revealing Matrix Factorizations Theory and Algorithms
Spectrum-Revealing Matrix Factorizations Theory and Algorithms Ming Gu Department of Mathematics University of California, Berkeley April 5, 2016 Joint work with D. Anderson, J. Deursch, C. Melgaard, J.
More informationIntroduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra
Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra Laura Grigori Abstract Modern, massively parallel computers play a fundamental role in a large and
More informationBalanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems
Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Jos M. Badía 1, Peter Benner 2, Rafael Mayo 1, Enrique S. Quintana-Ortí 1, Gregorio Quintana-Ortí 1, A. Remón 1 1 Depto.
More informationFast numerical methods for solving linear PDEs
Fast numerical methods for solving linear PDEs P.G. Martinsson, The University of Colorado at Boulder Acknowledgements: Some of the work presented is joint work with Vladimir Rokhlin and Mark Tygert at
More informationParallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1
Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations
More informationFast Structured Spectral Methods
Spectral methods HSS structures Fast algorithms Conclusion Fast Structured Spectral Methods Yingwei Wang Department of Mathematics, Purdue University Joint work with Prof Jie Shen and Prof Jianlin Xia
More informationTowards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters
Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM
More informationAn H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages
PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical
More informationMath 671: Tensor Train decomposition methods
Math 671: Eduardo Corona 1 1 University of Michigan at Ann Arbor December 8, 2016 Table of Contents 1 Preliminaries and goal 2 Unfolding matrices for tensorized arrays The Tensor Train decomposition 3
More informationA Fast Direct Solver for a Class of Elliptic Partial Differential Equations
J Sci Comput (2009) 38: 316 330 DOI 101007/s10915-008-9240-6 A Fast Direct Solver for a Class of Elliptic Partial Differential Equations Per-Gunnar Martinsson Received: 20 September 2007 / Revised: 30
More informationLU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version
1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2
More informationBlock-tridiagonal matrices
Block-tridiagonal matrices. p.1/31 Block-tridiagonal matrices - where do these arise? - as a result of a particular mesh-point ordering - as a part of a factorization procedure, for example when we compute
More informationDirect solution methods for sparse matrices. p. 1/49
Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve
More informationIterative methods, preconditioning, and their application to CMB data analysis. Laura Grigori INRIA Saclay
Iterative methods, preconditioning, and their application to CMB data analysis Laura Grigori INRIA Saclay Plan Motivation Communication avoiding for numerical linear algebra Novel algorithms that minimize
More informationScaling and Pivoting in an Out-of-Core Sparse Direct Solver
Scaling and Pivoting in an Out-of-Core Sparse Direct Solver JENNIFER A. SCOTT Rutherford Appleton Laboratory Out-of-core sparse direct solvers reduce the amount of main memory needed to factorize and solve
More informationIncomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices
Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,
More informationFINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION
FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros
More informationAlgebraic Multigrid as Solvers and as Preconditioner
Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven
More informationSparse LU Factorization on GPUs for Accelerating SPICE Simulation
Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,
More informationFine-grained Parallel Incomplete LU Factorization
Fine-grained Parallel Incomplete LU Factorization Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Sparse Days Meeting at CERFACS June 5-6, 2014 Contribution
More informationKasetsart University Workshop. Multigrid methods: An introduction
Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available
More informationA Numerical Study of Some Parallel Algebraic Preconditioners
A Numerical Study of Some Parallel Algebraic Preconditioners Xing Cai Simula Research Laboratory & University of Oslo PO Box 1080, Blindern, 0316 Oslo, Norway xingca@simulano Masha Sosonkina University
More informationMultilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014
Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday
More informationExploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning
Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February
More informationSolving PDEs with Multigrid Methods p.1
Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration
More information