
Preconditioning Techniques for Large Linear Systems
Part III: General-Purpose Algebraic Preconditioners

Michele Benzi
Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia, USA

Scuola di Dottorato di Ricerca in Scienze Matematiche, Dipartimento di Matematica, Università degli Studi di Padova

Outline
1 Introduction
2 Generalities about preconditioning
3 Basic concepts of algebraic preconditioning
4 Incomplete factorizations
5 Sparse approximate inverses
6 IF via approximate inverses
7 Balanced Incomplete Factorization (BIF)
8 Conclusions


Preconditioned iterative methods

Solving large linear systems $Ax = b$ by Krylov-type methods.

Preconditioning may be viewed as a transformation: $M^{-1}Ax = M^{-1}b$, or $AM^{-1}y = b$, $x = M^{-1}y$.

Examples: matrix splittings (block Jacobi, Gauss-Seidel, SSOR); incomplete factorizations; sparse approximate inverses; AMG...

The preconditioner M (or $M^{-1}$) should be cheap and fast to compute, and should result in rapid convergence of the preconditioned iterative method; but it should also be sufficiently robust and sparse (i.e., have low storage requirements).

A further consideration is the case of sequences of linear systems $A^{(k)} x^{(k)} = b^{(k)}$, $k = 0, 1, 2, \dots$
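As a concrete illustration, here is a minimal SciPy sketch of this setup (the Poisson-type test matrix and the spilu parameters are placeholders, not taken from the lecture). Note that $M^{-1}$ is never formed explicitly; only its action on a vector is supplied to the Krylov solver.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Placeholder test problem: 2-D Poisson-type matrix.
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU factorization M = L*U ~ A; ilu.solve(v) applies M^{-1} v.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

# Preconditioned GMRES: the solver only needs mat-vecs with A and M^{-1}.
x, info = spla.gmres(A, b, M=M)  # info == 0 signals convergence
```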

Preconditioned iterative methods

Structure of this lecture:
1 Brief discussion of algebraic vs. problem-specific preconditioning
2 Description of the guiding principles behind algebraic preconditioning (IF and SAI); robustness problems of standard techniques
3 Some recent approaches which exploit information on the matrix inverse
4 An approach based on a novel decomposition of the input matrix
5 Other recent developments: hybrid and multilevel methods (briefly)


A quote

"In ending this book with the subject of preconditioners, we find ourselves at the philosophical center of the scientific computing of the future... Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning."

From L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, 1997.

Algebraic vs. Problem-Specific Preconditioning

Algebraic preconditioners only use information extracted from the input matrix A, usually supplemented by some user-provided tuning parameters, such as drop tolerances or limits on the amount of fill-in allowed.

Main examples include:
- preconditioners based on classical (block) splittings $A = M - N$
- incomplete factorizations: $M = \bar{L}\bar{U} \approx A$
- approximate inverse preconditioners: $G = M^{-1} \approx A^{-1}$
- Algebraic Multi-Grid (AMG)
- hybrids obtained by combining some of the above

Such preconditioners are good candidates for inclusion in general-purpose software packages. Although they are seldom optimal for any given problem, they are widely applicable and have proven to be reasonably robust in countless applications. Also, they are being continually improved.

Algebraic vs. Problem-Specific Preconditioning

Discretization of a continuous problem (a system of PDEs, an integral equation, etc.) leads to a sequence of linear systems $A_n x_n = b_n$ where $A_n$ is $n \times n$ and $n \to \infty$ as the discretization is refined (that is, as $h \to 0$).

Definition: A preconditioner is optimal if it results in a rate of convergence of the preconditioned iteration that is asymptotically constant as the problem size increases, and if the cost of each preconditioned iteration scales linearly with the size of the problem. (For integral equations, the scaling of each iteration may instead be $O(n \log n)$ or similar.)

For example, in the SPD case, if $\kappa_2(M_n^{-1}A_n) \le C$ where C is a constant independent of n, then $M_n$ is an optimal preconditioner provided the action of $M_n^{-1}A_n$ on a vector can be computed in $O(n)$ work.
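For the SPD case, the link between this condition number bound and an n-independent iteration count is the standard conjugate gradient error estimate:

$$\|x - x_k\|_A \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x - x_0\|_A, \qquad \kappa = \kappa_2(M_n^{-1}A_n) \le C,$$

so the number of iterations needed to reduce the A-norm of the error by a fixed factor is bounded independently of n.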

Algebraic vs. Problem-Specific Preconditioning

In contrast, problem-specific preconditioners, which are designed for a narrow class of problems, are often optimal. These methods make extensive use of the developer's knowledge of the application at hand, including information about the physics, the geometry, and the particular discretization technique used.

These preconditioners are usually not suitable for other types of problems, so their range of applicability is limited.

Many PDE-based (or physics-based) preconditioners belong to this class. An example is Diffusion Synthetic Acceleration (DSA) in radiation transport. Other examples of problem-specific preconditioners, especially for incompressible flow problems, will be discussed later in these lectures.

Algebraic vs. Problem-Specific Preconditioning

The two approaches, algebraic and problem-specific, are not necessarily mutually exclusive (much as with direct vs. iterative methods).

Most problem-specific preconditioners use algebraic ones as building blocks, e.g., to solve or to approximate subproblems arising within the overall preconditioning strategy. Some algebraic preconditioners are flexible enough that they can be tailored to specific applications.

Conversely, there has been a trend in recent years to build algebraic preconditioners that mimic the properties of specialized preconditioners; for instance, algebraic multilevel methods.


Implicit vs. explicit preconditioners

An implicit, or direct, preconditioner is an approximation of the input matrix: $M \approx A$.

An explicit, or inverse, preconditioner is an approximation of the inverse of the input matrix: $G = M^{-1} \approx A^{-1}$. This is motivated by the observation that even though $A^{-1}$ is a dense matrix, many of its entries are negligibly small.

Examples of implicit preconditioners include classical splittings, incomplete factorizations, and their block and multilevel variants. Examples of explicit preconditioners include polynomial preconditioners, sparse approximate inverses, and data-sparse approximate inverses. Both factored and non-factored forms are in use.

Implicit vs. explicit preconditioners

Application of an implicit preconditioner within a Krylov method (like CG or GMRES) requires solving one or more linear systems, often with triangular or block triangular matrices. In contrast, application of an explicit preconditioner requires one or more matrix-vector products.

Explicit preconditioners are easier to parallelize. Generally speaking, however, the construction of an explicit preconditioner tends to be more costly than that of an implicit one. This is to be expected, since A (or its action) is known but $A^{-1}$ is not.

Also, convergence rates are usually better with implicit preconditioners than with explicit ones. But there are exceptions!


Incomplete Factorization (IF) methods

When a sparse matrix is factored by Gaussian elimination, fill-in usually takes place: the triangular factors L and U of the coefficient matrix A are considerably less sparse than A.

Even though sparsity-preserving reordering techniques can be used to reduce fill-in, sparse direct methods are not considered viable for solving very large linear systems, such as those arising from the discretization of three-dimensional boundary value problems, due to time and space constraints.

However, by discarding part of the fill-in in the course of the factorization process, simple but powerful preconditioners can be obtained in the form $M = \bar{L}\bar{U}$, where $\bar{L}$ and $\bar{U}$ are the incomplete (approximate) LU factors.

Incomplete Factorization (IF) methods

Incomplete factorization algorithms differ in the rules that govern the dropping of fill-in in the incomplete factors. Fill-in can be discarded based on several different criteria, such as position, value, or a combination of the two.

Letting $\mathbf{n} = \{1, 2, \ldots, n\}$, one can fix a subset $S \subseteq \mathbf{n} \times \mathbf{n}$ of positions in the matrix, usually including the main diagonal and all $(i,j)$ such that $a_{ij} \ne 0$, and allow fill-in in the LU factors only in positions which are in S.

Formally, an incomplete factorization step can be described as
$$a_{ij} \leftarrow \begin{cases} a_{ij} - a_{ik}\, a_{kk}^{-1}\, a_{kj} & \text{if } (i,j) \in S,\\ a_{ij} & \text{otherwise,} \end{cases}$$
for each k and for $i, j > k$.
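A minimal dense-storage sketch of this update (illustrative only: it assumes S contains the diagonal, does no pivoting, and a real implementation would work on sparse data structures):

```python
import numpy as np

def ilu_pattern(A, S):
    """Incomplete LU restricted to a set S of (i, j) positions.

    A : (n, n) ndarray (copied); S : set of index pairs allowed to
    fill in (must contain the diagonal). Returns the factors compactly:
    the strict lower triangle holds L (unit diagonal implied), the
    upper triangle holds U.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            if (i, k) not in S:
                continue            # multiplier l_ik not kept
            A[i, k] /= A[k, k]      # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in S:
                    A[i, j] -= A[i, k] * A[k, j]  # update only on S
    return A
```

With S equal to the nonzero pattern of A, this is exactly the no-fill ILU(0) discussed below.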

Incomplete Factorization (IF) methods

Can very simple patterns give cheap, cache-efficient preconditioners? Example: a banded pattern. Matrix BCSSTK38, n = 8032, nnz = 181,746; SPD (small structural analysis problem from Boeing).

bandwidth (full)   PCG its
  1                426
  3                821
  5                648
  9                1638
 15                792
101                105
131                56
151                nc
311                35
411                18

(nc = no convergence)

Incomplete Factorization (IF) methods

Notice that the incomplete factorization may fail due to division by zero or near-zero (usually referred to as pivot breakdown), even if A admits an LU factorization without pivoting. Partial pivoting can help, but it is costly and does not always suffice in the incomplete case.

If S coincides with the set of positions which are nonzero in A, we obtain the no-fill ILU factorization, or ILU(0). For SPD matrices the same concept applies to the Cholesky factorization $A = LL^T$, resulting in the no-fill IC factorization, or IC(0). When used with the conjugate gradient algorithm, this preconditioner leads to the ICCG method (Meijerink & van der Vorst, 1977).

Incomplete Factorization (IF) methods

The no-fill ILU and IC preconditioners are very simple to implement, inexpensive to compute, and reasonably effective for significant classes of problems, such as low-order discretizations of scalar elliptic PDEs leading to M-matrices or to diagonally dominant matrices. No pivot breakdown can occur in these cases (Meijerink & van der Vorst, 1977; Manteuffel, 1980).

However, for more difficult and realistic problems the no-fill factorizations yield too crude an approximation of A, and more sophisticated preconditioners, which allow some fill-in in the incomplete factors, are needed. This is the case, for instance, for highly nonsymmetric and indefinite matrices such as those arising in many CFD applications.

Incomplete Factorization (IF) methods

A hierarchy of ILU preconditioners may be obtained based on the concept of levels of fill: a level of fill is attributed to each matrix entry that occurs in the incomplete factorization process, and fill-ins are dropped based on the value of the level of fill. The formal definition is as follows.

The initial level of fill of a matrix entry $a_{ij}$ is defined to be
$$\mathrm{lev}_{ij} = \begin{cases} 0 & \text{if } a_{ij} \ne 0 \text{ or } i = j,\\ \infty & \text{otherwise.} \end{cases}$$
Each time this entry is modified by the ILU process, its level of fill must be updated according to
$$\mathrm{lev}_{ij} = \min\{\mathrm{lev}_{ij},\ \mathrm{lev}_{ik} + \mathrm{lev}_{kj} + 1\}.$$

Let l be a nonnegative integer. With ILU(l), all fill-ins whose level of fill is greater than l are dropped. Note that for l = 0 we recover the no-fill ILU(0) preconditioner.
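The symbolic phase that this defines can be sketched in a few lines (dense level matrix for readability; efficient implementations work on the graph of A, cf. the Hysom & Pothen reference cited below):

```python
import numpy as np

def ilu_levels(A, l):
    """Compute the ILU(l) sparsity pattern via levels of fill.

    A : (n, n) ndarray. Returns a boolean mask, True where the entry
    is kept. Level inf marks positions not (yet) in the pattern.
    """
    n = A.shape[0]
    lev = np.where((A != 0) | np.eye(n, dtype=bool), 0.0, np.inf)
    for k in range(n):
        for i in range(k + 1, n):
            if lev[i, k] > l:
                continue  # multiplier l_ik itself is dropped
            for j in range(k + 1, n):
                lev[i, j] = min(lev[i, j], lev[i, k] + lev[k, j] + 1)
    return lev <= l
```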

Example: level-based incomplete LU factorizations ILU(l)

- Motivated by decay in the factors of diagonally dominant matrices
- Structure of the incomplete factors can be predicted using the matrix graph

[Spy plots for a small test matrix (n = 50), showing how the pattern of the incomplete factors fills in with the level: nz = 217 for ILU(0), then 289, 349, 457, 541, 601, 637, and 649 for ILU(1) through ILU(7).]

Numerical Example

The symbolic construction is fast (Hysom & Pothen, SISC 2001), but ILU(l) is typically expensive to apply even for a modest number of levels.

Example: matrix ENGINE, n = 143,571, nnz = 2,424,822; SPD.

levels   size of prec.   PCG its
0        2,424,822       523
1        4,458,588       300
2        7,595,466       199
3        12,128,289      115
4        18,078,603      87
5        25,474,380      54
6        34,153,746      45
7        43,861,328      46
8        54,276,063      36

Preprocessing incomplete factorizations

Preprocessing originally designed for direct solvers is often very useful for improving the robustness of ILU preconditioners:

- Symmetric reorderings (RCM, MD, ND, etc.)
- "Static pivoting": nonsymmetric permutations and scalings aimed at increasing diagonal dominance (Duff & Koster, SIMAX 1999, 2001; B., Haws & Tůma, SISC 2000; Saad, SISC 2005; Mayer, SISC 2008); see the sketch after this list
- Extensions to symmetric indefinite problems (Duff & Pralet, SIMAX 2005; Hagemann & Schenk, SISC 2006)
- Block variants (many authors)

But for very tough problems this is still not enough to guarantee convergence of the preconditioned iteration.
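To convey the flavor of static pivoting, here is a structural (unweighted) sketch that permutes rows so the diagonal becomes zero-free. It is only an analogue: MC64-style codes compute maximum weighted matchings and also scale the matrix, which SciPy's bipartite matching does not do.

```python
import scipy.sparse as sp
from scipy.sparse.csgraph import maximum_bipartite_matching

def permute_nonzero_diagonal(A):
    """Row-permute a sparse matrix so its diagonal is structurally nonzero.

    Unweighted analogue of MC64-style static pivoting: entry magnitudes
    are ignored, so this only guarantees a zero-free diagonal (when a
    perfect matching exists), not increased diagonal dominance.
    """
    A = sp.csr_matrix(A)
    # perm[j] = index of the row matched to column j
    perm = maximum_bipartite_matching(A, perm_type='row')
    if (perm < 0).any():
        raise ValueError("matrix is structurally singular")
    return A[perm, :]
```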

Example (cont.)

Preprocessing: the matrix is reordered with Multiple Minimum Degree (MMD), a fill-reducing ordering.

Matrix ENGINE, n = 143,571, nnz = 2,424,822, original vs. MMD ordering:

levels   size         its    size (MMD)   its (MMD)
0        2,424,822    523    2,424,822    439
1        4,458,588    300    4,394,040    214
2        7,595,466    199    6,509,826    159
3        12,128,289   115    8,859,522    96
4        18,078,603   87     11,292,927   66
5        25,474,380   54     13,664,157   49
6        34,153,746   45     15,891,321   34
7        43,861,328   46                  nc
8        54,276,063   36     19,590,303   18

Some improvement is observed, but the approach is not entirely robust.

The use of drop tolerances

In many cases, an efficient preconditioner can be obtained from an incomplete factorization where new fill-ins are accepted or discarded on the basis of their size. In this way, only fill-ins that contribute significantly to the quality of the preconditioner are stored and used.

A drop tolerance is a positive number $\tau$ used in a dropping criterion. An absolute dropping strategy can be used, whereby new fill-ins are accepted only if greater than $\tau$ in absolute value. This criterion may work poorly if the matrix is badly scaled, in which case it is better to use a relative drop tolerance. For example, when eliminating row i, a new fill-in is accepted only if it is greater in absolute value than $\tau \|a_i\|_2$, where $a_i$ denotes the ith row of A. Other criteria are also in use.

The use of drop tolerances

A drawback of this approach is that it is difficult to choose a good value of the drop tolerance: usually this is done by trial and error on a few sample matrices from a given application, until a satisfactory value of $\tau$ is found. In many cases good results are obtained for values of $\tau$ in the range $10^{-4}$ to $10^{-2}$, but the optimal value is strongly problem-dependent.

Another difficulty is that it is impossible to predict the amount of storage needed for the incomplete LU factors. An efficient, predictable algorithm is obtained by also limiting the number of nonzeros allowed in each row of the triangular factors. Saad (1994) proposed the following dual threshold strategy: fix a drop tolerance $\tau$ and a number p of fill-ins to be allowed in each row of the incomplete L/U factors; at each step of the elimination process, drop all fill-ins that are smaller than $\tau$ times the 2-norm of the current row; of all the remaining ones, keep (at most) the p largest in magnitude.
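The selection rule itself is compact in code. Here is a sketch of the per-row dual-threshold filter only (in actual ILUT the rule is applied separately to the L and U parts of the row, and the diagonal entry is always kept):

```python
import math

def dual_threshold_drop(row, tau, p):
    """ILUT-style dropping for one computed row of the incomplete factors.

    row : dict mapping column index -> value.
    Step 1: drop entries smaller than tau * ||row||_2.
    Step 2: of the survivors, keep at most the p largest in magnitude.
    """
    norm = math.sqrt(sum(v * v for v in row.values()))
    kept = {j: v for j, v in row.items() if abs(v) >= tau * norm}
    if len(kept) > p:
        largest = sorted(kept, key=lambda j: abs(kept[j]), reverse=True)[:p]
        kept = {j: kept[j] for j in largest}
    return kept
```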

The use of drop tolerances

A variant of this approach allows, in each row of the incomplete factors, p nonzeros in addition to the positions that were already nonzero in the original matrix A. This makes sense for irregular problems in which the nonzeros of A are not distributed uniformly.

The resulting preconditioner, denoted ILUT($\tau$, p), is quite powerful. If it fails on a problem for a given choice of the parameters $\tau$ and p, it will often succeed with a smaller value of $\tau$ and/or a larger value of p. The corresponding incomplete Cholesky preconditioner for SPD matrices, denoted ICT, can also be defined.

ILUT($\tau$, p) and its variant with partial pivoting, ILUTP($\tau$, p), are quite effective and widely used in many industrial applications. However, failures can still occur.

Example

IC(0)/ICT may fail while simple diagonal scaling works!

Matrix LDOOR (structural analysis of a car door), n = 952,203, nnz = 23,737,339.

preconditioner   size         PCG its
Jacobi           952,203      810
IC(0)            23,737,339   > 1000
ICT              23,838,704   > 1000
ICT              24,614,381   > 1000
ICT              26,167,321   > 1000
ICT              30,047,027   > 1000
ICT              37,809,756   > 1000
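The diagonal (Jacobi) scaling that wins here costs almost nothing to build or apply. A minimal SciPy sketch (the tridiagonal test matrix is a placeholder, since LDOOR itself is not reproduced here):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def jacobi_preconditioner(A):
    """Return M^{-1} = diag(A)^{-1} as a LinearOperator for use in CG."""
    d = A.diagonal()
    return spla.LinearOperator(A.shape, matvec=lambda v: v / d)

# Usage on a small SPD placeholder matrix:
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(1000, 1000)).tocsr()
b = np.ones(1000)
x, info = spla.cg(A, b, M=jacobi_preconditioner(A))  # info == 0: converged
```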

Stability considerations

ILU preconditioners attempt to make the residual matrix $R := A - M$ small in some norm. However, this does not always result in good preconditioners.

As observed by several authors (Elman, Saad, ...), a more meaningful approximation measure is based on the size of the error matrix $E := I - AM^{-1}$.

Approximate inverse preconditioners attempt to make E small, but this may require a huge number of nonzeros in the preconditioner (unless the entries of $A^{-1}$ exhibit fast off-diagonal decay).

Note that $E = RM^{-1}$, so $\|E\| \le \|R\|\,\|M^{-1}\|$. Hence, if M is very ill-conditioned ($\|M^{-1}\|$ is very large), then a very large error matrix may occur even if $A - M$ is small. This often results in failure to converge.
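For small test matrices this instability is easy to probe directly. A sketch (dense probe of the error matrix, so it is meant for modest n only; the drop tolerance is a placeholder):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ilu_error_norm(A, drop_tol=1e-2):
    """Frobenius norm of E = I - A M^{-1} for an ILU preconditioner M of A.

    Builds M^{-1} explicitly, one column per triangular solve, so the
    cost is n solves: a diagnostic for small matrices, not production code.
    """
    n = A.shape[0]
    ilu = spla.spilu(sp.csc_matrix(A), drop_tol=drop_tol)
    Minv = np.column_stack([ilu.solve(e) for e in np.eye(n)])
    return np.linalg.norm(np.eye(n) - A @ Minv, 'fro')
```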

Stability considerations

Example (B., Szyld & van Duin, SISC 1999): the system Ax = b is a discretization of a convection-dominated convection-diffusion equation. Solver: Bi-CGSTAB. Orderings: lexicographic and MMD.

Let $N_1 := \|A - \bar{L}\bar{U}\|_F$ and $N_2 := \|I - A(\bar{L}\bar{U})^{-1}\|_F$.

ILU(0)         Lexicogr.      MMD
N1             4.06 × 10^1    4.53 × 10^0
N2             3.26 × 10^6    2.00 × 10^2
Its            nc             59

ILUT(0.01,5)   Lexicogr.      MMD
N1             1.78 × 10^1    7.39 × 10^1
N2             2.79 × 10^1    5.81 × 10^6
Its            11             nc

Permuting large entries of A to the main diagonal

[Two spy plots, nz = 25,407 each: a Jacobian from the Navier-Stokes equations, original and permuted with MC64 + RCM.]

After preprocessing, ILUT with Bi-CGSTAB converges in 24 iterations. No convergence on the original system.


Sparse approximate inverses

Idea: directly approximate the inverse with a sparse matrix $G \approx A^{-1}$; the preconditioner is then applied with matrix-vector products involving G.

Mostly motivated by parallel processing; also less prone to instabilities than ILU, and easy to update when solving a sequence of linear systems. Also useful for constructing robust smoothers for multigrid, and for other purposes such as approximating Schur complements.

By now a large body of literature exists (hundreds of papers since the 1990s). Sparse approximate inverses have been used successfully in numerous applications, including:
- solution of dense linear systems from BEM in electromagnetics, acoustics, and elastodynamics problems
- solution of sparse linear systems from photon and neutron transport, CFD, Markov chains, eigenproblems, etc.
- quantum chemistry applications
- image processing (restoration, deblurring, inpainting)

Sparse approximate inverses

Main approaches: sparse approximate inverses (SAIs) can be factored or unfactored.

Factored forms are of the type $G = ZW$ where, for instance, $Z \approx U^{-1}$ and $W \approx L^{-1}$.

Factored forms are especially useful if A is SPD. In this case $W = Z^T$ and the approximate inverse $G = ZZ^T$ is guaranteed to be SPD, which allows the use of the conjugate gradient (CG) method.
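For the unfactored case, the classical construction minimizes $\|AG - I\|_F$, which decouples into n independent small least-squares problems, one per column of G (this is what makes the approach attractive in parallel). A minimal sketch with a fixed, user-supplied sparsity pattern (dense algebra for clarity; real SPAI codes restrict each problem to the touched rows and may grow the pattern adaptively):

```python
import numpy as np

def spai_fixed_pattern(A, pattern):
    """Frobenius-norm minimal sparse approximate inverse G of A.

    pattern[j] lists the row indices allowed to be nonzero in column j
    of G; each column solves min ||A[:, J] g - e_j||_2 independently.
    """
    n = A.shape[0]
    G = np.zeros((n, n))
    for j in range(n):
        J = pattern[j]
        e = np.zeros(n)
        e[j] = 1.0
        g, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
        G[J, j] = g
    return G

# A common pattern choice is that of A itself:
# pattern = [np.nonzero(A[:, j])[0] for j in range(A.shape[0])]
```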