Multilevel low-rank approximation preconditioners
Yousef Saad, Department of Computer Science and Engineering, University of Minnesota
SIAM CSE, Boston - March 1, 2013

First: Joint work with Ruipeng Li. Work supported by NSF.

Introduction
Preconditioned Krylov subspace methods offer a good alternative to direct solution methods, especially for 3-D problems: a compromise between performance and robustness. But there are challenges:
- Highly indefinite systems [Helmholtz, Maxwell, ...]
- Highly ill-conditioned systems [structures, ...]
- Problems with extremely irregular nonzero patterns
- Recent: impact of new architectures [many-core, GPUs]

Introduction (cont.)
Main issue in using GPUs for sparse computations: huge performance degradation due to irregular sparsity.

Matrices:
Matrix name        N         NNZ
FEM/Cantilever     62,451    4,007,383
Boeing/pwtk        217,918   11,634,424

Performance of Mat-Vecs on an NVIDIA Tesla C1060 (Gflops):
                   Single Precision       Double Precision
Matrix             CSR    JAD    DIA      CSR    JAD    DIA
FEM/Cantilever     9.4    10.8   25.7     7.5    5.0    13.4
Boeing/pwtk        8.9    16.6   29.5     7.2    10.4   14.5
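For reference, the CSR kernel behind these numbers is the standard row-wise sparse matrix-vector product sketched below (a minimal NumPy illustration, not the CUDA kernels actually benchmarked); the indirect access x[ja[j]] and the varying row lengths are what make the memory access pattern irregular on a GPU.

```python
import numpy as np

def csr_matvec(n, ia, ja, a, x):
    """y = A*x with A stored in CSR format: row pointers ia, column indices ja, values a."""
    y = np.zeros(n)
    for i in range(n):
        for j in range(ia[i], ia[i + 1]):
            y[i] += a[j] * x[ja[j]]   # gather: irregular, data-dependent access to x
        # rows have different numbers of nonzeros, so work is unbalanced across threads
    return y
```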

Sparse Forward/Backward Sweeps
Next major ingredient of preconditioned Krylov subspace methods. ILU preconditioning operations require L/U solves: x := U^{-1} L^{-1} x.

Sequential outer loop:

for i = 1:n
   for j = ia(i):ia(i+1)-1
      x(i) = x(i) - a(j)*x(ja(j))
   end
end

Parallelism can be achieved with level scheduling:
- Group unknowns into levels
- Unknowns x(i) of the same level can be computed simultaneously
- The number of levels nlev ranges between 1 and n
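To make the level-scheduling idea concrete, here is a small Python sketch (a hypothetical helper, not the code used in the experiments) that computes the levels for a sparse lower triangular solve: level(i) = 1 + max of level(j) over the dependencies j < i of row i, and all unknowns sharing a level can be solved in parallel.

```python
import numpy as np
from scipy.sparse import csr_matrix

def level_schedule(L):
    """Group the unknowns of a lower triangular CSR matrix L into levels.

    Unknowns in the same level have no dependencies among themselves, so each
    level of the forward sweep can be processed in parallel.
    """
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = deps[deps < i]                      # strictly lower part = dependencies
        level[i] = 1 + (level[deps].max() if deps.size else 0)
    return [np.flatnonzero(level == l).tolist() for l in range(1, level.max() + 1)]

# Worst case: a bidiagonal L forces n levels (no parallelism at all);
# a diagonal L would give a single level (fully parallel).
L = csr_matrix(np.array([[ 1., 0., 0., 0.],
                         [-1., 1., 0., 0.],
                         [ 0.,-1., 1., 0.],
                         [ 0., 0.,-1., 1.]]))
print(level_schedule(L))   # [[0], [1], [2], [3]]
```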

ILU: Sparse Forward/Backward Sweeps
Very poor performance [relative to CPU]:

Matrix             N         CPU Mflops    GPU-Lev #lev    GPU-Lev Mflops
Boeing/bcsstk36    23,052    627           4,457           43
FEM/Cantilever     62,451    653           2,397           168
COP/CASEYK         696,665   394           273             142
COP/CASEKU         208,340   373           272             115

Prec: miserable :-)
[Figure: GPU sparse triangular solve with level scheduling]
Very poor performance when the number of levels is large. A few things can be done to reduce the number of levels, but performance will remain poor.

So...

Either GPUs must go...

or ILUs must go...

Or perhaps: alternative preconditioners?
What would be a good alternative? Wish-list:
- A preconditioner requiring few irregular computations
- Something that trades volume of computation for speed
- If possible, something that is robust for the indefinite case
Good candidate: Multilevel Low-Rank (MLR) approximate inverse preconditioners

Related work:
- Work on HSS matrices [e.g., Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S. Li, Fast algorithms for hierarchically semiseparable matrices, Numerical Linear Algebra with Applications, 17 (2010), pp. 953-976]
- Work on H-matrices [Hackbusch, ...]
- Work on balanced incomplete factorizations (R. Bru et al.)
- Work on sweeping preconditioners by Engquist and Ying
- Work on computing the diagonal of a matrix inverse [Jok Tang and YS (2010)]

Low-rank Multilevel Approximations
Starting point: a symmetric matrix derived from a 5-point discretization of a 2-D problem on an n_x x n_y grid:

A = [ A_1     D_2                                        ]
    [ D_2     A_2     D_3                                ]
    [         ...     ...       ...                      ]
    [         D_α     A_α       D_{α+1}                  ]
    [                 D_{α+1}   A_{α+1}   ...            ]
    [                           ...       ...    D_{n_y} ]
    [                                    D_{n_y} A_{n_y} ]

A = [ A_11  A_12 ]  =  [ A_11    0  ]  +  [  0    A_12 ]
    [ A_21  A_22 ]     [  0    A_22 ]     [ A_21   0   ]

Corresponding splitting on the FD mesh:
[Figure: an 8 x 8 finite-difference grid split into two halves by a horizontal line]

A_11 ∈ R^{m x m}, A_22 ∈ R^{(n-m) x (n-m)}. In the simplest case the second (coupling) matrix has a single -I coupling block between the two interface grid lines, and it can be written as

[  0   -I ]     [ E_1 E_1^T      0     ]     [ E_1 ]
[ -I    0 ]  =  [     0      E_2 E_2^T ]  -  [ E_2 ] [ E_1^T   E_2^T ]

where E_1 selects the last grid line of the first subdomain and E_2 the first grid line of the second, so that E_1 E_2^T reproduces the coupling block.

The above splitting can then be rewritten as

A = (A + E E^T) - E E^T =: B - E E^T

with

B := [ B_1   0  ] ∈ R^{n x n},      E := [ E_1 ] ∈ R^{n x n_x}
     [  0   B_2 ]                        [ E_2 ]

Note: B_1 := A_11 + E_1 E_1^T,  B_2 := A_22 + E_2 E_2^T.

Sherman-Morrison formula:

A^{-1} = B^{-1} + B^{-1} E (I - E^T B^{-1} E)^{-1} E^T B^{-1}
       = B^{-1} + B^{-1} E X^{-1} E^T B^{-1},      X := I - E^T B^{-1} E

Note: E ∈ R^{n x n_x}, X ∈ R^{n_x x n_x}, where n_x = number of points in the separator [O(n^{1/2}) for a 2-D mesh, O(n^{2/3}) for a 3-D mesh].
Use in a recursive framework.
A similar idea was used for computing the diagonal of the inverse [J. Tang and YS '10].
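As a sanity check on the splitting and the formula above, the following NumPy sketch (illustrative only; the grid size and the half/half split are assumptions) builds the 5-point Laplacian on an n_x x n_y grid, forms E and B = A + E E^T, and verifies that A^{-1} = B^{-1} + B^{-1} E X^{-1} E^T B^{-1} with X = I - E^T B^{-1} E.

```python
import numpy as np
from scipy.sparse import diags, identity, kron

nx, ny = 8, 8                                   # small grid, illustrative only
T = diags([-1, 2, -1], [-1, 0, 1], shape=(nx, nx))
S = diags([-1, 2, -1], [-1, 0, 1], shape=(ny, ny))
A = (kron(identity(ny), T) + kron(S, identity(nx))).toarray()   # 5-point Laplacian, line-by-line ordering

m = (ny // 2) * nx                              # split after the first ny/2 grid lines
E = np.zeros((nx * ny, nx))                     # separator has nx points
E[m - nx:m, :] = np.eye(nx)                     # E_1: last grid line of the first half
E[m:m + nx, :] = np.eye(nx)                     # E_2: first grid line of the second half

B = A + E @ E.T                                 # B_1 = A_11 + E_1 E_1^T, B_2 = A_22 + E_2 E_2^T
assert np.allclose(B[:m, m:], 0)                # the coupling block has been cancelled

Binv = np.linalg.inv(B)
X = np.eye(nx) - E.T @ Binv @ E
Ainv = Binv + Binv @ E @ np.linalg.solve(X, E.T @ Binv)
print(np.allclose(Ainv, np.linalg.inv(A)))      # True
```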

Multilevel Low-Rank (MLR) algorithm
Method: use a low-rank approximation for B^{-1} E:

B^{-1} E ≈ U_k V_k^T,      U_k ∈ R^{n x k},   V_k ∈ R^{n_x x k}

Replace B^{-1} E by U_k V_k^T in X = I - (E^T B^{-1}) E:

X ≈ G_k = I - V_k U_k^T E      (∈ R^{n_x x n_x})

This leads to the preconditioner

M^{-1} = B^{-1} + U_k [V_k^T G_k^{-1} V_k] U_k^T

Use recursively.

Note: from A^{-1} = B^{-1} [I + E X^{-1} E^T B^{-1}] one could define

M_1^{-1} = B^{-1} [I + E G_k^{-1} V_k U_k^T]

[rationale: the approximation is made on one side only]. It turns out that M_1 and M are equal! We have

M^{-1} = B^{-1} + U_k H_k U_k^T,      with H_k = V_k^T G_k^{-1} V_k.

No need to store V_k: only keep U_k and H_k (k x k). We can show that H_k = (I - U_k^T E V_k)^{-1} and that H_k is symmetric.
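A minimal sketch of one level of this construction, with hypothetical helper names and dense NumPy linear algebra in place of the sparse kernels used in practice: build_mlr_level computes the rank-k factors U_k and the small matrix H_k from a truncated SVD of B^{-1}E, and apply_mlr applies M^{-1} x = B^{-1} x + U_k H_k U_k^T x.

```python
import numpy as np

def build_mlr_level(Bsolve, E, k):
    """One MLR level: rank-k approximation of B^{-1}E and the k x k matrix H_k.

    Bsolve(Y) must return B^{-1} Y; here it can be a plain dense solve, while in
    the actual algorithm it is provided by the next level of the recursion.
    """
    BinvE = Bsolve(E)
    U, s, Vt = np.linalg.svd(BinvE, full_matrices=False)
    Uk = U[:, :k] * s[:k]                        # B^{-1}E ~= Uk Vk^T (truncated SVD)
    Vk = Vt[:k, :].T
    Gk = np.eye(E.shape[1]) - Vk @ (Uk.T @ E)    # G_k = I - Vk Uk^T E
    Hk = Vk.T @ np.linalg.solve(Gk, Vk)          # H_k = Vk^T G_k^{-1} Vk
    return Uk, Hk

def apply_mlr(Bsolve, Uk, Hk, x):
    """Apply the preconditioner: M^{-1} x = B^{-1} x + Uk Hk Uk^T x."""
    return Bsolve(x) + Uk @ (Hk @ (Uk.T @ x))
```

With A, B, and E from the previous snippet, Uk, Hk = build_mlr_level(lambda Y: np.linalg.solve(B, Y), E, 3) followed by apply_mlr(lambda y: np.linalg.solve(B, y), Uk, Hk, b) gives an approximation to A^{-1} b.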

Recursive multilevel framework
At each level, write

A_i = B_i - E_i E_i^T   (i.e., B_i = A_i + E_i E_i^T),      B_i = [ B_{i1}    0    ]
                                                                  [   0     B_{i2} ]

Next level: set A_{i1} := B_{i1} and A_{i2} := B_{i2}, and repeat on A_{i1}, A_{i2}.
At the last level, factor A_i (IC, ILU).
Binary tree structure: [figure: binary tree of subdomains with nodes numbered 0 to 14]
A recursive sketch of this construction is given below.
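The recursion itself can be sketched as follows (purely illustrative: split is a hypothetical user-supplied function returning the coupling term E and the size m of the first diagonal block, the leaves use a dense solve instead of IC/ILU, and build_mlr_level / apply_mlr are the helpers from the previous sketch).

```python
import numpy as np

def mlr_solve_factory(A, split, k, nlev):
    """Return a function x -> M^{-1} x for an MLR hierarchy with nlev levels."""
    if nlev == 0:
        return lambda x: np.linalg.solve(A, x)   # last level: factor/solve directly
    E, m = split(A)                              # coupling term and size of first block
    B = A + E @ E.T                              # block-diagonal B = diag(B_1, B_2)
    solve1 = mlr_solve_factory(B[:m, :m], split, k, nlev - 1)
    solve2 = mlr_solve_factory(B[m:, m:], split, k, nlev - 1)
    Bsolve = lambda X: np.concatenate([solve1(X[:m]), solve2(X[m:])])
    Uk, Hk = build_mlr_level(Bsolve, E, k)       # low-rank correction for this level
    return lambda x: apply_mlr(Bsolve, Uk, Hk, x)
```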

Generalization: Domain Decomposition framework
The domain is partitioned into 2 subdomains Ω_1, Ω_2 with an edge separator. The matrix can be permuted to

P A P^T = [ B̂_1    F̂_1     0      0    ]
          [ F̂_1^T  C_1     0      X    ]
          [  0      0     B̂_2    F̂_2  ]
          [  0     X^T    F̂_2^T   C_2  ]

Interface nodes in each domain are listed last.

Each matrix B̂_i is of size n_i x n_i (interior variables) and each C_i is of size m_i x m_i (interface variables). Let

E_α = [ 0 ; αI ; 0 ; -(1/α) X^T ]

with α used for balancing. Then

P A P^T = B - E_α E_α^T,      B = [ B_1   0  ],      B_i = [ B̂_i      F̂_i     ]
                                  [  0   B_2 ]             [ F̂_i^T   C_i + D_i ]

with D_1 = α^2 I and D_2 = (1/α^2) X^T X.
Better results are obtained when using diagonals instead of αI.

Theory: 2-level analysis for the model problem
Interested in the eigenvalues γ_j of A^{-1} - B^{-1} = B^{-1} E X^{-1} E^T B^{-1} when A is the pure Laplacian. They are

γ_j = β_j / (1 - α_j),      j = 1, ..., n_x,

where β_j and α_j are explicit sums over k = 1, ..., n_y/2 of terms involving sin^2(kπ/(n_y+1)) and sin^2(jπ/(2(n_x+1))).

[Plot: decay of the γ_j's when n_x = n_y = 32; left panel shows β_j, sqrt(β_j), and α_j; right panel shows γ_j]
Note: the β_j are the singular values of B^{-1} E. In this particular case 3 eigenvectors capture 92% of the inverse, whereas 5 eigenvectors capture 97% of the inverse.

EXPERIMENTS

Experimental setting
Hardware: Intel Xeon X5675 processor (12 MB cache, 3.06 GHz, 6-core)
Software: C/C++ with the Intel Math Kernel Library (MKL, version 10.2)
Stopping criterion: ||r_i|| <= 10^{-8} ||r_0||
Maximum number of iterations: 500

2-D/3-D model problems (theory)
-Δu - cu = -(x^2 + y^2 + c) e^{xy}  in (0,1)^2,  with Dirichlet boundary conditions
FD discretization: n_x = n_y = 256
Eigenvalues of B_i^{-1} E_i X_i^{-1} E_i^T B_i^{-1}, i = 0, 1, 3: rapid decay.
[Plot: eigenvalue decay at levels 1, 2, 3]

3-D elliptic PDE
-Δu - cu = -6 - c (x^2 + y^2 + z^2)  in (0,1)^3,  with Dirichlet boundary conditions
FD discretization: n_x = n_y = 32, n_z = 64
Eigenvalues of B_i^{-1} E_i X_i^{-1} E_i^T B_i^{-1}, i = 0, 1, 3: rapid decay.
[Plot: eigenvalue decay at levels 1, 2, 3]

Tests: SPD cases
SPD cases: pure Laplacian (c = 0 in the previous equations).
MLR + PCG compared to IC + PCG.
2-D problems: #lev = 5, rank = 2
3-D problems: #lev = 5, 7, 10, rank = 2
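For orientation, a preconditioner produced along these lines plugs into PCG in the usual way; a small SciPy sketch (illustrative only, not the C/MKL code used for the timings below; note that rtol is named tol in older SciPy releases):

```python
from scipy.sparse.linalg import LinearOperator, cg

def pcg_with_mlr(A, Msolve, b, maxiter=500):
    """Run PCG on A x = b with a preconditioner given as a function x -> M^{-1} x."""
    M = LinearOperator(A.shape, matvec=Msolve, dtype=float)
    # stopping criterion matching the experiments: ||r_i|| <= 1e-8 ||r_0||
    return cg(A, b, rtol=1e-8, maxiter=maxiter, M=M)
```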

Grid      N         ICT-CG                         MLR-CG
                    fill   p-t    its   i-t        fill   p-t     its   i-t
256^2     65K       3.1    0.08   69    0.19       3.2    0.45    84    0.12
512^2     262K      3.2    0.32   133   1.61       3.5    1.57    132   1.06
1024^2    1,048K    3.4    1.40   238   15.11      3.5    4.66    215   9.77
32^2x64   65K       2.9    0.14   33    0.10       3.0    0.46    43    0.08
64^3      262K      3.0    0.66   47    0.71       3.1    3.03    69    0.63
128^3     2,097K    3.0    6.59   89    13.47      3.2    24.61   108   10.27

(fill = fill factor, p-t = preconditioner setup time in seconds, its = iterations, i-t = iteration time in seconds)

Set-up times for the MLR preconditioners are higher. Bear in mind the ultimate target architecture [SIMD...].

Symmetric indefinite cases
c > 0 in -Δu - cu, i.e., the operator is shifted by -sI. 2-D case: s = 0.01; 3-D case: s = 0.05.
MLR + GMRES(40) compared to ILDLT + GMRES(40).
2-D problems: #lev = 4, rank = 5, 7, 7
3-D problems: #lev = 5, rank = 5, 7, 7
ILDLT failed for most cases.
Difficulties for MLR: #lev cannot be large [no convergence]; inefficient factorization at the last level (memory, CPU time).

Grid      ILDLT-GMRES                    MLR-GMRES
          fill   p-t     its   i-t       fill   p-t     its   i-t
256^2     6.5    0.16    F     -         6.0    0.39    84    0.30
512^2     8.4    1.25    F     -         8.2    2.24    246   6.03
1024^2    10.3   10.09   F     -         9.0    15.05   F     -
32^2x64   5.6    0.25    61    0.38      5.4    0.98    62    0.22
64^3      7.0    1.33    F     -         6.6    6.43    224   5.43
128^3     8.8    15.35   F     -         6.5    28.08   F     -

(F = failed to converge)

General symmetric matrices - Test matrices

MATRIX              N          NNZ        SPD   DESCRIPTION
Andrews/Andrews     60,000     760,154    yes   computer graphics pb.
Williams/cant       62,451     4,007,383  yes   FEM cantilever
UTEP/Dubcova2       65,025     1,030,225  yes   2-D/3-D PDE pb.
Rothberg/cfd1       70,656     1,825,580  yes   CFD pb.
Schmid/thermal1     82,654     574,458    yes   thermal pb.
Rothberg/cfd2       123,440    3,085,406  yes   CFD pb.
Schmid/thermal2     1,228,045  8,580,313  yes   thermal pb.
Cote/vibrobox       12,328     301,700    no    vibroacoustic pb.
Cunningham/qa8fk    66,127     1,660,579  no    3-D acoustics pb.
Koutsovasilis/F2    71,505     5,294,285  no    structural pb.

Generalization of MLR via DD
- DD: PartGraphRecursive from METIS
- Balancing with diagonals
- Higher ranks used for two problems (cant and vibrobox)
- SPD cases shown first, then non-SPD

MATRIX     ICT/ILDLT                     MLR-CG/GMRES
           fill   p-t    its   i-t       k    lev   fill   p-t     its   i-t
Andrews    2.6    0.44   32    0.16      2    6     2.3    1.38    27    0.08
cant       4.3    2.47   F     19.01     10   5     4.3    7.89    253   5.30
Dubcova2   1.4    0.14   42    0.21      4    4     1.5    0.60    47    0.09
cfd1       2.8    0.56   314   3.42      5    5     2.3    3.61    244   1.45
thermal1   3.1    0.15   108   0.51      2    5     3.2    0.69    109   0.33
cfd2       3.6    1.14   F     12.27     5    4     3.1    4.70    312   4.70
thermal2   5.3    4.11   148   20.45     5    5     5.4    15.15   178   14.96

(k = rank of the low-rank corrections, lev = number of levels)

MATRIX     ICT/ILDLT                     MLR-CG/GMRES
           fill   p-t    its   i-t       k    lev   fill   p-t    its   i-t
vibrobox   3.3    0.19   F     1.06      10   4     3.0    0.45   183   0.22
qa8fk      1.8    0.58   56    0.60      2    8     1.6    2.33   75    0.36
F2         2.3    1.37   F     13.94     5    5     2.5    4.17   371   7.29

Conclusion
A promising approach. Many more avenues to explore: the nonsymmetric case, implementation on GPUs, storage for the 3-D case, ...