Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors


1 Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Department of Computer Science and Engineering, Univ. Jaume I (Spain) {aliaga,martina,quintana}@icc.uji.es 2 Institute of Computational Mathematics, TU Braunschweig (Germany) m.bollhoefer@tu-braunschweig.de September, 2007

2 The solution of SPARSE linear systems is ubiquitous:
- Matrix Market collection
- UF sparse matrix collection
Two main approaches to attack the problem:
- Direct methods (sparse LU, sparse Cholesky, etc.): may fail due to excessive fill-in of the factors
- Iterative methods (PCG, GMRES, etc.): may present slow convergence or even divergence

3 Convergence of iterative methods can be accelerated via preconditioning:
- Incomplete LU
- Sparse approximate inverse
- ...
The time spent computing the preconditioner can be reduced using high-performance computing and parallel platforms:
- Distributed-memory: difficult to program; the high cost of communication especially hurts sparse linear algebra
- Shared-memory: lack of scalability

4 Mid-term goal: computation of parallel preconditioners on shared-memory architectures
- These platforms provide enough computational power (16-64 processors) for many practical applications
- With the advent of manycore processors, scalability will improve
Focus: first results obtained with a parallel preconditioner that employs the same techniques as ILUPACK, using OpenMP on a CC-NUMA platform

5 Outline
1 Motivation and Introduction
2 Incomplete LU factorization. ILUPACK
3 Trees and sparse matrices. Parallel algorithm. Parallel implementation
4 Experimental Results
5 Conclusions and Future Work

6 Outline: Incomplete LU factorization. ILUPACK

7 Incomplete LU factorization
Given Ax = b, compute L and U^T (sparse) unit lower triangular and D diagonal so that M = LDU and A = M + R, while controlling the fill-in by means of dropping techniques. The application of the iterative method to the preconditioned system

M^{-1} A x = M^{-1} b

accelerates its convergence.
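
Once the factors are available, applying M^{-1} amounts to a forward substitution, a diagonal scaling, and a back substitution. A minimal dense sketch in C; the function name and dense row-major storage are illustrative assumptions (ILUPACK itself keeps the factors in sparse multilevel form):

```c
/* Compute z = M^{-1} r with M = L*D*U, where L (unit lower triangular),
   D (diagonal, stored as the vector d) and U (unit upper triangular)
   are held in dense row-major arrays. Illustrative sketch only. */
void apply_ldu(int n, const double *L, const double *d,
               const double *U, const double *r, double *z)
{
    /* forward substitution: solve L y = r (unit diagonal implicit) */
    for (int i = 0; i < n; i++) {
        double s = r[i];
        for (int j = 0; j < i; j++)
            s -= L[i*n + j] * z[j];
        z[i] = s;
    }
    /* diagonal scaling: y <- D^{-1} y */
    for (int i = 0; i < n; i++)
        z[i] /= d[i];
    /* back substitution: solve U z = y (unit diagonal implicit) */
    for (int i = n - 1; i >= 0; i--) {
        double s = z[i];
        for (int j = i + 1; j < n; j++)
            s -= U[i*n + j] * z[j];
        z[i] = s;
    }
}
```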

8 Many "successful" ILU factorization preconditioners:
- ILU(0), ILU(1), ..., ILU(p): level-of-fill ILU with parameter p
- ILUT: ILU with threshold
- ILUTP: ILUT with partial pivoting
- ILU with inverse-based dropping
- Multilevel ILU
- ...
The ILUPACK library implements multilevel inverse-based ILUs

9 ILUPACK. In each level:
- Initial static ordering (+ scaling) of the coefficient matrix, P^T A Q = Ã (MLND, RCM, MMD, ...)
- Incomplete LU factorization with controlled dropping
- Diagonal pivoting to control the growth of L^{-1} and U^{-1}
- The pivoted part forms the input matrix of the next level

10 Crout variant of the LU factorization, with two different criteria for dropping entries in the factors:
- Small entries
- Entries with a small influence on the growth of L^{-1}, U^{-1}
Diagonal pivoting is applied to control the growth of L^{-1}, U^{-1}: if pivoting is needed, the corresponding rows/columns are moved to the untouched (not yet factorized) trailing part of the matrix, as in the sketch below.
[Figure: successive stages of the factorization; the pivoted rows/columns accumulate in the untouched part]
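
A minimal dense sketch of the Crout variant with only the first dropping rule (small entries); the inverse-based rule and the diagonal pivoting used by ILUPACK are omitted here, and the in-place storage layout is an assumption for illustration:

```c
#include <math.h>

/* In-place Crout LDU with simple threshold dropping on a dense array:
   the strictly lower part holds L (unit diagonal implicit), the diagonal
   holds D, and the strictly upper part holds U (unit diagonal implicit).
   ILUPACK additionally drops by the entries' influence on the growth of
   L^{-1}, U^{-1}, and pivots (rather than divides) when d_k is bad. */
void crout_ildu(int n, double *A, double tau)
{
    for (int k = 0; k < n; k++) {
        /* row k of (D U): a_kj -= sum_{m<k} l_km * d_m * u_mj */
        for (int j = k; j < n; j++)
            for (int m = 0; m < k; m++)
                A[k*n + j] -= A[k*n + m] * A[m*n + m] * A[m*n + j];
        double dk = A[k*n + k];                       /* pivot d_k */
        /* column k of L: a_ik -= sum_{m<k} l_im * d_m * u_mk */
        for (int i = k + 1; i < n; i++) {
            for (int m = 0; m < k; m++)
                A[i*n + k] -= A[i*n + m] * A[m*n + m] * A[m*n + k];
            A[i*n + k] /= dk;                         /* l_ik */
            if (fabs(A[i*n + k]) < tau) A[i*n + k] = 0.0;  /* drop */
        }
        for (int j = k + 1; j < n; j++) {
            A[k*n + j] /= dk;                         /* u_kj */
            if (fabs(A[k*n + j]) < tau) A[k*n + j] = 0.0;  /* drop */
        }
    }
}
```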

11 If the untouched part only contains pivoted elements, the current level is finished:
[Figure: block factorization with factors L11, D11, U11, coupling blocks U12, L21, and the pivoted (untouched) block C]
- Obtain the approximate Schur complement of C, S = C - L21 D11 U12, applying certain dropping techniques
- Repeat the procedure to factorize S (multilevel recursion)
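
The Schur complement step spelled out as code; a minimal dense sketch where the dimensions and the simple threshold dropping are illustrative assumptions (ILUPACK works with sparse data and more elaborate dropping rules):

```c
#include <math.h>

/* Approximate Schur complement S = C - L21 * D11 * U12 with dropping.
   C is m x m, L21 is m x k, D11 is the k-vector of pivots, U12 is k x m;
   all dense, row-major. */
void approx_schur(int m, int k, const double *C, const double *L21,
                  const double *d11, const double *U12,
                  double tau, double *S)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++) {
            double s = C[i*m + j];
            for (int t = 0; t < k; t++)
                s -= L21[i*k + t] * d11[t] * U12[t*m + j];
            S[i*m + j] = (fabs(s) < tau) ? 0.0 : s;   /* drop small entries */
        }
}
```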

12 A three-level inverse-based multilevel ILU example:
[Figure: the first level produces L11, D11, U11 with coupling blocks U12, L21 and Schur complement S1; the second level factors S1 into L22, D22, U22 with coupling blocks U23, L32 and Schur complement S2; the third level factors S2 into L33, D33, U33]

13 Outline: Trees and sparse matrices. Parallel algorithm. Parallel implementation

14 Elimination tree
[Figure: matrix sparsity pattern and its elimination tree]
The elimination tree shows groups of rows/columns which can be factorized independently, e.g., {1,2,3} and {4,5,6}


16 Task (dependency) tree
[Figure: elimination tree and the derived task (dependency) tree, with tasks T1, ..., T7 and independent subtrees]
The task tree shows tasks which can be executed concurrently (i.e., in parallel)

17 The task tree defines a block partition of the matrix. Consider a task tree of height one, with leaves T2, T3 and root T1:

A = [A11, 0, A13; 0, A22, A23; A13^T, A23^T, A33]

To compute a preconditioner of A in parallel, we assign a submatrix to each leaf of the task tree:

[A11, A13; A13^T, A33^1] to T2 and [A22, A23; A23^T, A33^2] to T3, with A33^1 + A33^2 = A33

18 Parallel algorithm
Tasks T2, T3 use ILUPACK to factorize, respectively,

[A11, A13; A13^T, A33^1] -> L11, D11, U11, the coupling blocks U13, L31, and the local Schur complement S33^1
[A22, A23; A23^T, A33^2] -> L22, D22, U22, the coupling blocks U23, L32, and the local Schur complement S33^2

19 Parallel algorithm
Tasks T2, T3 interact with T1 to send, respectively, S33^1, S33^2

20 Parallel algorithm
Tasks T2, T3 interact with T1 to send, respectively, S33^1, S33^2. Task T1 then uses ILUPACK to factorize the submatrix S33^1 + S33^2 -> L33, D33, U33
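
Schematically, the root task merges the leaves' contributions and factorizes the result with the same kernel. A sketch reusing the dense crout_ildu from above; the function name and dense storage remain illustrative assumptions:

```c
/* Root task T1: merge the local Schur complements contributed by the
   leaves and factorize the result. s1 and s2 are the leaves' S33
   contributions of order n, row-major. */
void root_task(int n, const double *s1, const double *s2,
               double *s33, double tau)
{
    for (int i = 0; i < n * n; i++)
        s33[i] = s1[i] + s2[i];     /* S33 = S33^1 + S33^2 */
    crout_ildu(n, s33, tau);        /* -> L33, D33, U33    */
}
```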

21 One forced level appears in the parallel preconditioner:
[Figure: the parallel preconditioner of A, with L11, D11, U11 and L22, D22, U22 plus the coupling blocks U13, U23, L31, L32 in the first level, and the factorization of S33 into L33, D33, U33 in the second level]
Hence, the parallel preconditioner is not the same as the preconditioner computed by ILUPACK!

22 Task (dependency) tree wishes
In a good task tree (from the parallelization viewpoint):
- Parallelism: the leaves should concentrate the major part of the cost, and the number of leaves should enable a high degree of concurrency (taking into account that the overhead usually increases with their number)
- Load balancing: in general, the leaves should have homogeneous costs

23 Achieving the task tree wishes
Our approach combines:
- MLND (METIS): yields a best-effort balanced elimination tree with a high degree of concurrency
- Task splitting: divides the leaves with higher computational cost into finer-grain tasks to improve load balancing

24 Achieving the task tree wishes
[Figure: with 2 processors, a task tree with leaves T2, T3 exhibits bad load balancing; task splitting refines it into tasks T1, ..., T7, improving the load balance]
Static task generation (i.e., before execution)!

25 ... but the costs of the tasks are unknown a priori -> use a heuristic of the cost (weight): currently, the number of nonzeros in the submatrix to factorize. Moreover, the costs of the tasks can be heterogeneous -> use dynamic scheduling for load balancing

26 Parallel implementation
OpenMP implementation (see the sketch below):
- Maintain a queue of tasks ready for execution (dependencies fulfilled)
- Prioritize the execution of leaves with larger weights
- Idle threads extract tasks from this structure and execute them
- When a task is completed, the task tree is examined to find new ready tasks, which are enqueued
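
A simplified sketch of that scheduling loop in C/OpenMP. Task, TaskQueue and the queue_/factorize_ helpers are hypothetical placeholders, not ILUPACK's actual interface; the point is the shared ready-queue protected by a critical section and the parent becoming ready when its last child completes:

```c
#include <stddef.h>
#include <omp.h>

typedef struct Task Task;
struct Task { Task *parent; int pending_children; /* submatrix, factors... */ };
typedef struct TaskQueue TaskQueue;

Task *queue_pop_max_weight(TaskQueue *q);  /* hypothetical: NULL if empty  */
void  queue_push(TaskQueue *q, Task *t);   /* hypothetical                 */
void  factorize_task(Task *t);             /* ILUPACK on the submatrix     */

void run_task_tree(TaskQueue *ready, int num_tasks)
{
    int done = 0;
    #pragma omp parallel
    for (;;) {
        Task *t = NULL;
        int finished;
        #pragma omp critical (task_queue)
        {
            finished = (done == num_tasks);
            if (!finished)
                t = queue_pop_max_weight(ready);  /* largest weight first */
        }
        if (finished) break;
        if (t == NULL) continue;   /* no ready task yet: retry (busy wait) */

        factorize_task(t);         /* leaf: ILU; inner node: merge + ILU   */

        #pragma omp critical (task_queue)
        {
            done++;
            /* the parent becomes ready once all its children completed */
            if (t->parent && --t->parent->pending_children == 0)
                queue_push(ready, t->parent);
        }
    }
}
```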

27 Outline: Experimental Results

28 Miscellaneous
SGI Altix 350 CC-NUMA shared-memory multiprocessor with:
- 16 Intel processors
- 32 GBytes of RAM shared via an SGI NUMAlink interconnect
No attempt to exploit CC-NUMA data locality yet.
IEEE double precision; Intel compiler with -O3 optimization level; one thread scheduled per processor.
In both the serial and parallel algorithms: MLND as the initial static ordering and ILUPACK default values.

29 Benchmark Matrices
SPD matrices obtained from the UF sparse matrix collection:
M1: GHS_psdef/Apache
M2: Schmid/Thermal
M3: Schenk_AFE/Af_0_k
M4: GHS_psdef/Inline_
M5: GHS_psdef/Apache
M6: GHS_psdef/Audikw_

30 Task splitting
Guided by the leaf tasks' weights (heuristic cost). Splitting criteria: let w_i be the weight of leaf task T_i, W the overall matrix weight (i.e., the number of nonzero elements of the matrix), and p the number of threads. Split those leaves with ratio w_i/W ... (see the sketch below)
- larger than 1/p (option A) -> num_leaves >= p
- larger than 1/(2p) (option B) -> num_leaves >= 2p
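
The splitting test written out; a direct transcription of the two criteria, where the function name and integer weight type are illustrative assumptions:

```c
/* Decide whether leaf task T_i must be split: with w_i the leaf weight
   (nnz of its submatrix), W the overall matrix weight and p the number
   of threads, split when w_i/W > 1/p (option A) or > 1/(2p) (option B). */
int must_split(long wi, long W, int p, int option_b)
{
    double threshold = option_b ? 1.0 / (2.0 * p) : 1.0 / p;
    return (double) wi / (double) W > threshold;
}
```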

31 In order to measure how the w_i are balanced among the threads, we use

c_w = σ_w / w̄

with σ_w and w̄ the standard deviation and the average of the thread weights, assuming a static mapping (i.e., before execution) of the leaf tasks to the threads. The closer c_w is to 0, the better the weight balancing!
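
The metric as code; a minimal sketch where thread_w holds the aggregated weight assigned to each of the p threads (the function name and percentage scaling are assumptions):

```c
#include <math.h>

/* Weight-balance metric c_w = sigma_w / mean_w over per-thread weights,
   returned as a percentage; values near 0 indicate good balance. */
double weight_imbalance(const double *thread_w, int p)
{
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < p; i++) mean += thread_w[i];
    mean /= p;
    for (int i = 0; i < p; i++) {
        double d = thread_w[i] - mean;
        var += d * d;
    }
    var /= p;                          /* population variance */
    return 100.0 * sqrt(var) / mean;   /* c_w in percent      */
}
```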

32 [Table: for options A and B, the number of leaf tasks and c_w (%) per benchmark matrix and number of processors]
Halving the number of processors for the same task tree improves the weight balancing!


38 Speed-up vs. Load balancing
Speed-up computed with respect to the parallel algorithm using the same task tree on a single processor. In order to measure how the t_i (costs of the leaf tasks) are balanced among the threads, we use

c_t = σ_t / t̄

with σ_t and t̄ the standard deviation and the average of the thread costs, taking into account how the tasks have been (dynamically) mapped to the threads (i.e., after execution). The closer c_t is to 0, the better the load balancing in the computation of the leaves!

39 Speed-up vs. Load balancing
[Table: for options A and B, speed-up and c_t (%) per benchmark matrix and number of processors]
Halving the number of processors for the same task tree improves the load balancing!


45 Preconditioner execution time
[Table: for options A and B, execution time (secs.) of the serial algorithm and of the parallel algorithm per benchmark matrix and number of processors]


47 Speed-up vs. Load balancing
[Table: execution time, speed-up, c_t (%) and c_w (%) with 16 processors, for options A and B]
Option A (num_leaves >= p) with heterogeneous costs -> bad performance. Option B: good load balancing -> good performance.
Good news: the leaves concentrate the major part of the cost!

48 Outline: Conclusions and Future Work

49 Conclusions
- We have presented a parallel preconditioner for the iterative solution of sparse linear systems on shared-memory parallel architectures
- The preconditioner shares its principles with ILUPACK
- MLND ordering, task splitting, and dynamic scheduling are employed to improve the load balancing, raise the degree of concurrency, and reduce the execution time
- In most cases, the parallel performance is satisfactory

50 Future Work
- To compare and contrast the numerical properties of the preconditioner in ILUPACK and our parallel preconditioner
- To parallelize the application of the preconditioner to the system
- To exploit data locality on CC-NUMA architectures
- To design parallel preconditioners for non-SPD linear systems
- To develop MPI parallel preconditioners

51 Questions?
