A sparse multifrontal solver using hierarchically semi-separable frontal matrices

Size: px
Start display at page:

Download "A sparse multifrontal solver using hierarchically semi-separable frontal matrices"

Transcription

1 A sparse multifrontal solver using hierarchically semi-separable frontal matrices Pieter Ghysels Lawrence Berkeley National Laboratory Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry Rouet (LBNL) 2014 CBMS-NSF Conference: Fast direct solvers for elliptic PDEs June 23 29, 2014, Dartmouth College

2 Introduction Consider solving Ax = b or M 1 A = M 1 b with a preconditioned iterative method (CG, GMRES, etc.) Fast convergence if good preconditioner M A is available 1. Cheap to construct, store, apply, parallelize 2. Good approximation of A Contradictory goals trade-off Standard design strategy Start with a direct factorization (like multifrontal LU) Add approximations to make it cheaper (cf. 1) while (hopefully/provably) affecting little (2) Approximation idea: use low-rank approximation

3 he multifrontal method [Duff & Reid 83] 5 Nested dissection reordering defines an elimination tree - SCOCH graph partitioner Bottom-up traversal of the e-tree At each frontal matrix, partial factorization and computation of a contribution block (Schur complement) Parent nodes sum the contribution blocks of their children: extend-add Elimination tree

4 Low-rank property In many applications, frontal matrices exhibit some low-rank blocks ( data sparsity ) Compression with SVD, Rank-Revealing QR,... - SVD: optimal but too expensive - RRQR: QR with column pivoting Exact compression or with a compression threshold ε Recursively, diagonal blocks have low-rank subblocks too - What to do with off-diagonal blocks?

5 Low-rank representations Most low-rank representations belong to the class of H-matrices [Bebendof, Börm, Hackbush, Grasedyck,... ]. Embedded in both dense and sparse solvers: U 1 V 1 U 2 U 3 = U1 0 U3 small 0 U2 Hierarchically Off-Diag. Low Rank (HODLR) [Ambikasaran, Cecka, Darve... ] U 6 B 36 V 3 Hierarchically Semiseparable (HSS) [Chandrasekaran, Dewilde, Gu, Li, Xia,... ] Nested basis. Block-low rank (BLR) [Amestoy et al.] Simple 2D partitioning. Recently in MUMPS. Also: H 2 (includes HSS and FMM), SSS... Choice: simple representations apply to broad classes of problems but provide less gains in memory/operations than specialized/complex ones.

6 HSS/HBS representation he structure is represented by a tree: 7 B 3,6 U 1 U 3, V 3 3 B 6,3 6 U 6, V 6 V 1 U 2 U 3 = U1 0 U3 small 0 U 2 B 1,2 B B 5,4 5 B 4,5 U 1, V 1 2,1 U 2, V 2 U 4, V 4 U 5, V 5 D 1 D 2 D 4 D 5 U 6 B 36 V 3 Number of leaves depends on the problem (geometry) and number of processors to be used. Building the HSS structure and all the usual operations (multiplying... ) consist of traversals of the tree.

7 Embedding low-rank techniques in a multifrontal solver 1. Choose which frontal matrices are compressed (size, level... ) 2. Low-rankness: weak interactions between distant variables = need suitable ordering/clustering of each frontal matrix Geometric setting (3D grid): 2D plane separator Need clusters with small diameters Hierarchical formats, merged clusters need small diameter too Split domain into squares and order with Morton ordering Algebraic: add some kind of halo to (complete) graph of separator variables and call a graph partitioner (MEIS) [Amestoy et al., Napov]

8 Embedding HSS kernels in a multifrontal solver More complicated HSS for frontal matrices: Fully structured: HSS on the whole frontal matrix. No dense matrix. Partial+: HSS on the whole frontal matrix. Dense frontal matrix. Partially structured: HSS on the F 11, F 12 and F 21 parts only. Dense frontal matrix, dense CB = F 22 F 21 F11 1 F 12 in stack. More memory F 11 F 21 F 12 F 22 Partially structured can do regular extend-add In partially structured, HSS compression of dense matrix After HSS compression, ULV factorization of F 11 block - Compared to classical LU in dense case Low rank Schur complement update

9 HSS compression via randomized sampling [Martinsson 11, Xia 13] HSS compression of a matrix A. Ingredients: R r and R c random matrices with d columns d = r + p with r estimated max rank; p = 10 in practice S r = AR r and S c = A R c samples of matrix A Can benefit from a fast matvec Interpolative Decomposition: A = A(:, J) X A is linear combination of selected columns of A wo sided ID: S c = S c (:, J c )X c and S r = S r (:, J r )X r, A = X c A(I c, I r )X r

10 HSS compression via randomized sampling 2 Algorithm (symmetric): from fine to coarse do Leaf node τ: 1. Sample: S loc = S(I τ, :) D R(I τ, :) 2. ID: S loc = U τ S loc (J τ, :) 3. Update: S τ = S loc (J τ, :) R τ = U τ R(I τ, :) I τ = I τ (J τ, :) Inner node τ with children [ ν 1, ν 2 : ] Sν1 A(I 1. Sample: S loc = ν1, I ν2 ) R ν2 S ν2 A(I ν2, I ν1 ) R ν1 2. ID: S loc = U τ S loc (J τ, :) 3. Update: S τ = S loc (J τ, :) R τ I τ = Uτ [R ν1 ; R ν2 ] = [I ν1 I ν2 ](J τ, :)

11 HSS compression via randomized sampling 3 If A A, do this for columns as well (simultaneously) [ ] I Bases have special structure: U τ = Π τ E τ Extract elements from frontal matrix: D τ = A(I τ, I τ ) and B ν1,ν 2 = A(I ν1, I ν2 ) Frontal matrix is combination of separator and HSS children Extracting element from HSS matrix requires traversing the HSS tree and multiplying basis matrices Limiting number of tree traversals is crucial for performance Benefits: Extend-add operation is simplified: only on random vectors Gains in complexity: O(r 2 N log N) iso O(rN 2 ) for non-randomized algorithm. log N due to extracting elements from HSS matrix

12 Randomized sampling extend-add Assembly in regular multifrontal: F p = A p CBc1 CBc2. Sample: S p = F p R p = (A p CBc1 CBc2 )R p = A p R p Y c1 Y c2 1D extend-add (only along rows); much simpler Y c1 and Y c2 samples of CB of children. R p = R c1 R c2 (+random rows for missing indices) Stages: Build random vectors from random vectors of children Build sample from samples of CB of children Multiply separator part of frontal matrix with random vectors: A p R p Compression of F p using S p and R p

13 HSS ULV factorization Exploit structure of U τ (from ID) to introduce zero s [ ] [ ] [ ] I Eτ I U τ = Π τ, Ω E τ = Π 0 τ I 0 τ Ω τ U τ = I D 1 U 1 B 2;1 V 2 U 2 B 1;2 V 1 D 2

14 HSS ULV factorization Exploit structure of U τ (from ID) to introduce zero s [ ] [ ] [ ] I Eτ I U τ = Π τ, Ω E τ = Π 0 τ I 0 τ Ω τ U τ = I [ ] [ ] [ ] Ω1 D 1 U 1B 1,2V2 W1 B = 1,2V Ω 2 U 2B 2,1V1 2 D 2 B 2,1V1 W 2 D 1 U 1 B 2;1 V 2 W 1;1 B 2;1 V 2 W 1;2 U 2 B 1;2 V 1 D 2 B 1;2 V 1 W 2;1 W 2;2

15 HSS ULV factorization Exploit structure of U τ (from ID) to introduce zero s [ ] [ ] [ ] I Eτ I U τ = Π τ, Ω E τ = Π 0 τ I 0 τ Ω τ U τ = I [ ] [ ] [ ] Ω1 D 1 U 1B 1,2V2 W1 B = 1,2V Ω 2 U 2B 2,1V1 2 D 2 B 2,1V1 W 2 ake (full) LQ decomposition [[ ] ] L 1 Lτ 0 Q W τ = τ [W 1;2 Q1 ] [B 1,2V2 Q 2 ] [ ] Q 1 W τ;2 L 2 Q 2 [B 2,1 V1 Q 1 ] [W 2;2Q2 ] D 1 U 1 B 2;1 V 2 W 1;1 B 2;1 V 2 L 1 B 2;1 V 2 Q 2 W 1;2 W 1;2,1 U 2 B 1;2 V 1 D 2 B 1;2 V 1 W 2;1 B 1;2 V 1 Q 1 L 2 W 2;2 W 2;2,1

16 HSS ULV factorization Exploit structure of U τ (from ID) to introduce zero s [ ] [ ] [ ] I Eτ I U τ = Π τ, Ω E τ = Π 0 τ I 0 τ Ω τ U τ = I [ ] [ ] [ ] Ω1 D 1 U 1B 1,2V2 W1 B = 1,2V Ω 2 U 2B 2,1V1 2 D 2 B 2,1V1 W 2 ake (full) LQ decomposition [[ ] ] L 1 Lτ 0 Q W τ = τ [W 1;2 Q1 ] [B 1,2V2 Q 2 ] [ ] Q 1 W τ;2 L 2 Q 2 [B 2,1 V1 Q 1 ] [W 2;2Q2 ] Rows for L τ can be eliminated, others are passed to parent At root node: LU solve of reduced D L 1 B 2;1 V 2 Q 2 W 1;2,1 B 1;2 V 1 Q 1 L 2 W 2;2,1

17 HSS ULV factorization Exploit structure of U τ (from ID) to introduce zero s [ ] [ ] [ ] I Eτ I U τ = Π τ, Ω E τ = Π 0 τ I 0 τ Ω τ U τ = I [ ] [ ] [ ] Ω1 D 1 U 1B 1,2V2 W1 B = 1,2V Ω 2 U 2B 2,1V1 2 D 2 B 2,1V1 W 2 ake (full) LQ decomposition [[ ] ] L 1 Lτ 0 Q W τ = τ [W 1;2 Q1 ] [B 1,2V2 Q 2 ] [ ] Q 1 W τ;2 L 2 Q 2 [B 2,1 V1 Q 1 ] [W 2;2Q2 ] Rows for L τ can be eliminated, others are passed to parent At root node: LU solve of reduced D ULV-like: Ω τ not orthonormal, forward/backward solve phases

18 Low rank Schur complement update Schur Complement update F 21 {}}{ F 22 F 21 F11 1 F 12 = F 22 U q B qk Vk F 1 11 F 12 {}}{ U k B kq Vq F 1 11 via ULV solve V k F 1 11 U k O(rN 2 ) D k is reduced HSS matrix O(r r) k q F 22 U q B qk (Ṽ k D 1 Ũ k )B kq V q F 11 F12 Ṽk D 1 Ũ k O(r 3 ) F 22 ΨΦ Ψ, Φ O(rN) U q and V q : traverse q subtree Cheap multiply with random vectors F 21 F 22

19 Numerical example general matrices GMRES(30) with right preconditioner Multifrontal with HSS vs ILUP from SuperLU b = (1, 1,... ), x 0 = (0, 0,... ) Stopping criterium: r k 2 / b HSS compression tolerance: 10 6 fill-ratio factor (s) its Matrix descr N rank HSS ILU HSS ILU HSS ILU add32 circuit 4, mchln85ks17 car tire 84, mhd500 plasma 250, poli large economics 15, stomach bio eng. 213, tdr190k accelerator 1, 100, torso3 bio eng. 259, utm5940 tokamak 5, wang4 device 26,

20 Numerical example tdr190k tdr190k: Maxwell equations in the frequency domain GMRES(30) convergence

21 Parallel implementation We have a serial code (StruMF [Napov ]) Some performance issues - Currently generates random vectors for all nodes in e-tree - How to estimate the rank? Currently guess and start over when too small Parallel implementation is a work in progress Distributed memory HSS compression of dense matrix - MPI, BLACS, PBLAS, BLAS, LAPACK Shared memory multifrontal code - OpenMP task parallelism for tree traversal for both elimination tree and HSS tree - Next step is parallel dense algebra

22 Parallel HSS compression MPI code opmost separator of a 3D problem, generated with exact multifrontal method k N 10,000 40,000 90,000 MPI processes / cores Nodes Levels olerance 1e-3 1e-3 1-e3 Non-randomized (s) Randomized (s) Dense LU ScaLAPACK (s) On 1024 cores achieved 5.3Flops/s very good flop balance: min / max = % communication overhead

23 ask based parallel tree traversal OpenMP Postorder (leaf to root) tree traversal: handle siblings in parallel, wait for children void traverse(node* p) { if (p->left) #pragma omp task traverse(p->left); if (p->right) #pragma omp task traverse(p->right); #pragma omp taskwait process(p->data); } Run-time system schedules tasks to cores (work-stealing) Root node will be handled sequentially: scaling bottleneck Nested trees: node of nested dissection tree contains HSS tree

24 time (s) time (s) ask based parallel tree traversal OpenMP HSS Compression of dense frontal matrix thread ID Multifrontal thread ID Blue: E-tree node, Red: HSS compression, Green: ULV-fact Extraction of elements from HSS matrix forms bottleneck Considering other runtime task schedulers Intel BB, StarPU, Quark he Quark scheduler from PLASMA could allow integration of PLASMA parallel (tiled) BLAS/LAPACK

25 Conclusions Developing an algebraic preconditioner for general nonsymmetric matrices (from PDEs) HSS is restricted format, large gains possible for certain applications, not for all Graph partitioning difficulties - Separator not always just a plane/line, not always a single piece - Bad for rank structure Some performance issues need to be addressed Separate distributed and shared memory codes - How to combine in a hybrid MPI+X code? Prepare for next generation NERSC supercomputer - Intel MIC based, > 60 cores

26 Conclusions Developing an algebraic preconditioner for general nonsymmetric matrices (from PDEs) HSS is restricted format, large gains possible for certain applications, not for all Graph partitioning difficulties - Separator not always just a plane/line, not always a single piece - Bad for rank structure Some performance issues need to be addressed Separate distributed and shared memory codes - How to combine in a hybrid MPI+X code? Prepare for next generation NERSC supercomputer - Intel MIC based, > 60 cores hank you! Questions?

Fast algorithms for hierarchically semiseparable matrices

Fast algorithms for hierarchically semiseparable matrices NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2010; 17:953 976 Published online 22 December 2009 in Wiley Online Library (wileyonlinelibrary.com)..691 Fast algorithms for hierarchically

More information

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS JIANLIN XIA Abstract. We propose multi-layer hierarchically semiseparable MHS structures for the fast factorizations of dense matrices arising from

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October

More information

A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure

A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure Page 26 of 46 A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure SHEN WANG, Department of Mathematics, Purdue University XIAOYE S. LI, Lawrence Berkeley National Laboratory

More information

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING XIN, JIANLIN XIA, MAARTEN V. DE HOOP, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We design

More information

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING

More information

Fast matrix algebra for dense matrices with rank-deficient off-diagonal blocks

Fast matrix algebra for dense matrices with rank-deficient off-diagonal blocks CHAPTER 2 Fast matrix algebra for dense matrices with rank-deficient off-diagonal blocks Chapter summary: The chapter describes techniques for rapidly performing algebraic operations on dense matrices

More information

Partial Left-Looking Structured Multifrontal Factorization & Algorithms for Compressed Sensing. Cinna Julie Wu

Partial Left-Looking Structured Multifrontal Factorization & Algorithms for Compressed Sensing. Cinna Julie Wu Partial Left-Looking Structured Multifrontal Factorization & Algorithms for Compressed Sensing by Cinna Julie Wu A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor

More information

A robust inner-outer HSS preconditioner

A robust inner-outer HSS preconditioner NUMERICAL LINEAR ALGEBRA WIH APPLICAIONS Numer. Linear Algebra Appl. 2011; 00:1 0 [Version: 2002/09/18 v1.02] A robust inner-outer HSS preconditioner Jianlin Xia 1 1 Department of Mathematics, Purdue University,

More information

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. 1 2012 pp. 123-132. FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems ICIAM - August 2015 - Mini-Symposium on Recent advances in matrix computations for extreme-scale computers M. Faverge, X. Lacoste, G. Pichon,

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria 1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation

More information

c 2017 Society for Industrial and Applied Mathematics

c 2017 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. A1710 A1740 c 2017 Society for Industrial and Applied Mathematics ON THE COMPLEXITY OF THE BLOCK LOW-RANK MULTIFRONTAL FACTORIZATION PATRICK AMESTOY, ALFREDO BUTTARI,

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization David Bindel Department of Computer Science Cornell University 15 March 2016 (TSIMF) Rank-Structured Cholesky

More information

Effective matrix-free preconditioning for the augmented immersed interface method

Effective matrix-free preconditioning for the augmented immersed interface method Effective matrix-free preconditioning for the augmented immersed interface method Jianlin Xia a, Zhilin Li b, Xin Ye a a Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA. E-mail:

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

EFFECTIVE AND ROBUST PRECONDITIONING OF GENERAL SPD MATRICES VIA STRUCTURED INCOMPLETE FACTORIZATION

EFFECTIVE AND ROBUST PRECONDITIONING OF GENERAL SPD MATRICES VIA STRUCTURED INCOMPLETE FACTORIZATION EFFECVE AND ROBUS PRECONDONNG OF GENERAL SPD MARCES VA SRUCURED NCOMPLEE FACORZAON JANLN XA AND ZXNG XN Abstract For general symmetric positive definite SPD matrices, we present a framework for designing

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer

More information

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Large Scale Sparse Linear Algebra

Large Scale Sparse Linear Algebra Large Scale Sparse Linear Algebra P. Amestoy (INP-N7, IRIT) A. Buttari (CNRS, IRIT) T. Mary (University of Toulouse, IRIT) A. Guermouche (Univ. Bordeaux, LaBRI), J.-Y. L Excellent (INRIA, LIP, ENS-Lyon)

More information

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix P.G. Martinsson, Department of Applied Mathematics, University of Colorado at Boulder Abstract: Randomized

More information

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION MATHEMATCS OF COMPUTATON Volume 00, Number 0, Pages 000 000 S 0025-5718XX0000-0 NTERCONNECTED HERARCHCAL RANK-STRUCTURED METHODS FOR DRECTLY SOLVNG AND PRECONDTONNG THE HELMHOLTZ EQUATON XAO LU, JANLN

More information

arxiv: v1 [cs.na] 20 Jul 2015

arxiv: v1 [cs.na] 20 Jul 2015 AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems Journée problème de Poisson, IHP, Paris M. Faverge, P. Ramet M. Faverge Assistant Professor Bordeaux INP LaBRI Inria Bordeaux - Sud-Ouest

More information

Sparse linear solvers

Sparse linear solvers Sparse linear solvers Laura Grigori ALPINES INRIA and LJLL, UPMC On sabbatical at UC Berkeley March 2015 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Sparse Cholesky

More information

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD D frequency-domain seismic modeling with a Block Low-Rank algebraic multifrontal direct solver. C. Weisbecker,, P. Amestoy, O. Boiteau, R. Brossier, A. Buttari, J.-Y. L Excellent, S. Operto, J. Virieux

More information

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Marc Baboulin 1, Xiaoye S. Li 2 and François-Henry Rouet 2 1 University of Paris-Sud, Inria Saclay, France 2 Lawrence Berkeley

More information

Sparse Linear Algebra: Direct Methods, advanced features

Sparse Linear Algebra: Direct Methods, advanced features Sparse Linear Algebra: Direct Methods, advanced features P. Amestoy and A. Buttari (INPT(ENSEEIHT)-IRIT) A. Guermouche (Univ. Bordeaux-LaBRI), J.-Y. L Excellent and B. Uçar (INRIA-CNRS/LIP-ENS Lyon) F.-H.

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Outline A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Azzam Haidar CERFACS, Toulouse joint work with Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute,

More information

From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D

From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D Luc Giraud N7-IRIT, Toulouse MUMPS Day October 24, 2006, ENS-INRIA, Lyon, France Outline 1 General Framework 2 The direct

More information

MUMPS. The MUMPS library: work done during the SOLSTICE project. MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux

MUMPS. The MUMPS library: work done during the SOLSTICE project. MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux The MUMPS library: work done during the SOLSTICE project MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux Sparse Days and ANR SOLSTICE Final Workshop June MUMPS MUMPS Team since beg. of SOLSTICE (2007) Permanent

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Lecture 17: Iterative Methods and Sparse Linear Algebra

Lecture 17: Iterative Methods and Sparse Linear Algebra Lecture 17: Iterative Methods and Sparse Linear Algebra David Bindel 25 Mar 2014 Logistics HW 3 extended to Wednesday after break HW 4 should come out Monday after break Still need project description

More information

Sparse factorizations: Towards optimal complexity and resilience at exascale

Sparse factorizations: Towards optimal complexity and resilience at exascale Sparse factorizations: Towards optimal complexity and resilience at exascale Xiaoye Sherry Li Lawrence Berkeley National Laboratory Challenges in 21st Century Experimental Mathematical Computation Workshop,

More information

Fast direct solvers for elliptic PDEs

Fast direct solvers for elliptic PDEs Fast direct solvers for elliptic PDEs Gunnar Martinsson The University of Colorado at Boulder Students: Adrianna Gillman (now at Dartmouth) Nathan Halko Sijia Hao Patrick Young (now at GeoEye Inc.) Collaborators:

More information

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ . p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier

More information

Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures

Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures Patrick R. Amestoy, Alfredo Buttari, Jean-Yves L Excellent, Théo Mary To cite this version: Patrick

More information

Lecture 8: Fast Linear Solvers (Part 7)

Lecture 8: Fast Linear Solvers (Part 7) Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi

More information

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators (1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

SCALABLE HYBRID SPARSE LINEAR SOLVERS

SCALABLE HYBRID SPARSE LINEAR SOLVERS The Pennsylvania State University The Graduate School Department of Computer Science and Engineering SCALABLE HYBRID SPARSE LINEAR SOLVERS A Thesis in Computer Science and Engineering by Keita Teranishi

More information

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element

More information

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Direct and Incomplete Cholesky Factorizations with Static Supernodes Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)

More information

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Amestoy, Patrick and Buttari, Alfredo and L Excellent, Jean-Yves and Mary, Theo 2018 MIMS EPrint: 2018.12

More information

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Peter Gerds and Lars Grasedyck Bericht Nr. 30 März 2014 Key words: automated multi-level substructuring,

More information

Preconditioning techniques to accelerate the convergence of the iterative solution methods

Preconditioning techniques to accelerate the convergence of the iterative solution methods Note Preconditioning techniques to accelerate the convergence of the iterative solution methods Many issues related to iterative solution of linear systems of equations are contradictory: numerical efficiency

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

V C V L T I 0 C V B 1 V T 0 I. l nk

V C V L T I 0 C V B 1 V T 0 I. l nk Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where

More information

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar

More information

ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS

ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS YOUSEF SAAD University of Minnesota PWS PUBLISHING COMPANY I(T)P An International Thomson Publishing Company BOSTON ALBANY BONN CINCINNATI DETROIT LONDON MADRID

More information

Fast multipole method

Fast multipole method 203/0/ page Chapter 4 Fast multipole method Originally designed for particle simulations One of the most well nown way to tae advantage of the low-ran properties Applications include Simulation of physical

More information

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,

More information

Lecture 2: The Fast Multipole Method

Lecture 2: The Fast Multipole Method CBMS Conference on Fast Direct Solvers Dartmouth College June 23 June 27, 2014 Lecture 2: The Fast Multipole Method Gunnar Martinsson The University of Colorado at Boulder Research support by: Recall:

More information

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION SIAM J. SCI. COMPUT. Vol. 39, No., pp. A303 A332 c 207 Society for Industrial and Applied Mathematics ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

More information

FAST SPARSE SELECTED INVERSION

FAST SPARSE SELECTED INVERSION SIAM J. MARIX ANAL. APPL. Vol. 36, No. 3, pp. 83 34 c 05 Society for Industrial and Applied Mathematics FAS SPARSE SELECED INVERSION JIANLIN XIA, YUANZHE XI, SEPHEN CAULEY, AND VENKAARAMANAN BALAKRISHNAN

More information

Indefinite and physics-based preconditioning

Indefinite and physics-based preconditioning Indefinite and physics-based preconditioning Jed Brown VAW, ETH Zürich 2009-01-29 Newton iteration Standard form of a nonlinear system F (u) 0 Iteration Solve: Update: J(ũ)u F (ũ) ũ + ũ + u Example (p-bratu)

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

Exploiting off-diagonal rank structures in the solution of linear matrix equations

Exploiting off-diagonal rank structures in the solution of linear matrix equations Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)

More information

Spectrum-Revealing Matrix Factorizations Theory and Algorithms

Spectrum-Revealing Matrix Factorizations Theory and Algorithms Spectrum-Revealing Matrix Factorizations Theory and Algorithms Ming Gu Department of Mathematics University of California, Berkeley April 5, 2016 Joint work with D. Anderson, J. Deursch, C. Melgaard, J.

More information

Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra

Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra Laura Grigori Abstract Modern, massively parallel computers play a fundamental role in a large and

More information

Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems

Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Jos M. Badía 1, Peter Benner 2, Rafael Mayo 1, Enrique S. Quintana-Ortí 1, Gregorio Quintana-Ortí 1, A. Remón 1 1 Depto.

More information

Fast numerical methods for solving linear PDEs

Fast numerical methods for solving linear PDEs Fast numerical methods for solving linear PDEs P.G. Martinsson, The University of Colorado at Boulder Acknowledgements: Some of the work presented is joint work with Vladimir Rokhlin and Mark Tygert at

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

Fast Structured Spectral Methods

Fast Structured Spectral Methods Spectral methods HSS structures Fast algorithms Conclusion Fast Structured Spectral Methods Yingwei Wang Department of Mathematics, Purdue University Joint work with Prof Jie Shen and Prof Jianlin Xia

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Math 671: Tensor Train decomposition methods

Math 671: Tensor Train decomposition methods Math 671: Eduardo Corona 1 1 University of Michigan at Ann Arbor December 8, 2016 Table of Contents 1 Preliminaries and goal 2 Unfolding matrices for tensorized arrays The Tensor Train decomposition 3

More information

A Fast Direct Solver for a Class of Elliptic Partial Differential Equations

A Fast Direct Solver for a Class of Elliptic Partial Differential Equations J Sci Comput (2009) 38: 316 330 DOI 101007/s10915-008-9240-6 A Fast Direct Solver for a Class of Elliptic Partial Differential Equations Per-Gunnar Martinsson Received: 20 September 2007 / Revised: 30

More information

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version 1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2

More information

Block-tridiagonal matrices

Block-tridiagonal matrices Block-tridiagonal matrices. p.1/31 Block-tridiagonal matrices - where do these arise? - as a result of a particular mesh-point ordering - as a part of a factorization procedure, for example when we compute

More information

Direct solution methods for sparse matrices. p. 1/49

Direct solution methods for sparse matrices. p. 1/49 Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve

More information

Iterative methods, preconditioning, and their application to CMB data analysis. Laura Grigori INRIA Saclay

Iterative methods, preconditioning, and their application to CMB data analysis. Laura Grigori INRIA Saclay Iterative methods, preconditioning, and their application to CMB data analysis Laura Grigori INRIA Saclay Plan Motivation Communication avoiding for numerical linear algebra Novel algorithms that minimize

More information

Scaling and Pivoting in an Out-of-Core Sparse Direct Solver

Scaling and Pivoting in an Out-of-Core Sparse Direct Solver Scaling and Pivoting in an Out-of-Core Sparse Direct Solver JENNIFER A. SCOTT Rutherford Appleton Laboratory Out-of-core sparse direct solvers reduce the amount of main memory needed to factorize and solve

More information

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Algebraic Multigrid as Solvers and as Preconditioner

Algebraic Multigrid as Solvers and as Preconditioner Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

Fine-grained Parallel Incomplete LU Factorization

Fine-grained Parallel Incomplete LU Factorization Fine-grained Parallel Incomplete LU Factorization Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Sparse Days Meeting at CERFACS June 5-6, 2014 Contribution

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

A Numerical Study of Some Parallel Algebraic Preconditioners

A Numerical Study of Some Parallel Algebraic Preconditioners A Numerical Study of Some Parallel Algebraic Preconditioners Xing Cai Simula Research Laboratory & University of Oslo PO Box 1080, Blindern, 0316 Oslo, Norway xingca@simulano Masha Sosonkina University

More information

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014 Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday

More information

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February

More information

Solving PDEs with Multigrid Methods p.1

Solving PDEs with Multigrid Methods p.1 Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration

More information