
A short course on: Preconditioned Krylov subspace methods Yousef Saad University of Minnesota Dept. of Computer Science and Engineering Universite du Littoral, Jan 19-30, 2005

Outline
Part 1: Introduction, discretization of PDEs; sparse matrices and sparsity; basic iterative methods (relaxation)
Part 2: Projection methods; Krylov subspace methods
Part 3: Preconditioned iterations; preconditioning techniques; parallel implementations
Part 4: Eigenvalue problems; applications

INTRODUCTION TO SPARSE MATRICES

Typical Problem: Physical Model -> Nonlinear PDEs -> Discretization -> Linearization (Newton) -> Sequence of Sparse Linear Systems Ax = b

What are sparse matrices? Usual definition: "...matrices that allow special techniques to take advantage of the large number of zero elements and the structure." (J. Wilkinson) A few applications of sparse matrices: structural engineering, reservoir simulation, electrical networks, optimization problems, ... Goal: much less storage and work than dense computations. Observation: A^{-1} is usually dense, but L and U in the LU factorization may be reasonably sparse (if a good technique is used).

Nonzero patterns of a few sparse matrices. ARC130: unsymmetric matrix from a laser problem (A. R. Curtis, Oct. 1974). SHERMAN5: fully implicit black-oil simulator, 16 x 23 x 3 grid, 3 unknowns.

PORES3: unsymmetric matrix from the PORES problem. BP_1000: unsymmetric basis from the LP problem BP.

Two types of matrices: structured (e.g., SHERMAN5) and unstructured (e.g., BP_1000). Main goal of sparse matrix techniques: to perform standard matrix computations economically, i.e., without storing the zeros of the matrix. Example: adding two square dense matrices of size n requires O(n^2) operations, while adding two sparse matrices A and B requires only O(nnz(A) + nnz(B)) operations, where nnz(X) = number of nonzero elements of a matrix X. For typical finite element / finite difference matrices, the number of nonzero elements is O(n).

Sparse Matrices in Typical Applications. The matrices PORES3 and SHERMAN5 come from oil reservoir simulation. Typically: 3 unknowns per mesh point. Initially, oil reservoir simulators often used rectangular grids; oil companies were the first commercial buyers of (vector) supercomputers. New simulators are getting more sophisticated: larger and more difficult problems to solve, more unknowns per node (e.g., combustion models), and finer, more accurate 3-D models. A naive but representative problem: a 100 x 100 x 100 grid with 10 unknowns per node gives N ≈ 10^7 and nnz ≈ 7 x 10^8.

DISCRETIZATION OF PDES - INTRODUCTION

Discretization of PDEs
Common Partial Differential Equation (PDE):
    ∂²u/∂x₁² + ∂²u/∂x₂² = f,  for x = (x₁, x₂) in Ω,
where Ω is a bounded, open domain in R². Requires boundary conditions on the boundary Γ:
    Dirichlet:  u(x) = φ(x)
    Neumann:    ∂u/∂n(x) = 0
    Cauchy:     ∂u/∂n(x) + α(x) u(x) = γ(x)

Discretization of PDEs - Basic approximations
The formulas derive from the Taylor series expansion:
    u(x+h) = u(x) + h u'(x) + (h²/2) u''(x) + (h³/6) u'''(x) + (h⁴/24) u''''(ξ₊).
Simplest scheme, the forward difference:
    du/dx = [u(x+h) - u(x)]/h - (h/2) d²u(x)/dx² + O(h²).
Centered difference for the second derivative:
    d²u(x)/dx² = [u(x+h) - 2u(x) + u(x-h)]/h² - (h²/12) d⁴u(ξ)/dx⁴,
where ξ₋ ≤ ξ ≤ ξ₊.
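As a quick numerical illustration of these truncation orders (not part of the original slides), the following Python sketch compares the forward-difference error, which decreases like h, with the centered second-difference error, which decreases like h²; the test function u(x) = sin(x) is a hypothetical choice.

    import math

    def forward_diff(u, x, h):
        # forward difference for u'(x): first-order accurate, error ~ (h/2) u''(x)
        return (u(x + h) - u(x)) / h

    def centered_second_diff(u, x, h):
        # centered difference for u''(x): second-order accurate, error ~ (h^2/12) u''''(x)
        return (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2

    u = math.sin
    x = 1.0
    for h in (1e-1, 1e-2, 1e-3):
        err1 = abs(forward_diff(u, x, h) - math.cos(x))          # exact u'(x) = cos(x)
        err2 = abs(centered_second_diff(u, x, h) + math.sin(x))  # exact u''(x) = -sin(x)
        print(h, err1, err2)   # err1 shrinks like h, err2 like h^2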

Difference Schemes for the Laplacean
Using centered differences for both the ∂²/∂x₁² and the ∂²/∂x₂² terms, with mesh sizes h₁ and h₂:
    Δu(x) ≈ [u(x₁+h₁, x₂) - 2u(x₁, x₂) + u(x₁-h₁, x₂)]/h₁² + [u(x₁, x₂+h₂) - 2u(x₁, x₂) + u(x₁, x₂-h₂)]/h₂².
If h₁ = h₂ = h then
    Δu(x) ≈ (1/h²) [u(x₁+h, x₂) + u(x₁-h, x₂) + u(x₁, x₂+h) + u(x₁, x₂-h) - 4u(x₁, x₂)].   (1)

Another approximation: exploit the four points u(x₁ ± h, x₂ ± h) [part (b) of the figure].
Five-point stencils: (a) the standard stencil, (b) the skewed stencil:
    (a)   0  1  0        (b)   1  0  1
          1 -4  1              0 -4  0
          0  1  0              1  0  1
Two nine-point centered difference stencils:
    (c)   1  1  1        (d)   1   4  1
          1 -8  1              4 -20  4
          1  1  1              1   4  1

The error for the 5-point approximation (1) is
    (h²/12) [∂⁴u/∂x₁⁴ + ∂⁴u/∂x₂⁴] + O(h³).
The two other schemes (c) and (d), obtained by combining the standard and skewed stencils, are second-order accurate. But (d) is sixth order for harmonic functions, i.e., functions whose Laplacean is zero.

Finite Differences for 1-D Problems
Consider the one-dimensional equation
    -u''(x) = f(x)  for x in (0, 1),   (2)
    u(0) = u(1) = 0.   (3)
Discretize [0, 1] uniformly: x_i = i h, i = 0, ..., n+1, where h = 1/(n+1). The Dirichlet boundary conditions give u(x_0) = u(x_{n+1}) = 0. With centered differences, the equation at the point x_i becomes
    -u_{i-1} + 2u_i - u_{i+1} = h² f_i,
where the unknown u_j approximates u(x_j) and f_i = f(x_i).

This gives the linear system Ax = f where
    A = (1/h²) [  2 -1
                 -1  2 -1
                    -1  2 -1
                       -1  2 -1
                          -1  2 -1
                             -1  2 ],
the tridiagonal matrix with 2 on the diagonal and -1 on the sub- and super-diagonals.
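A minimal Python/NumPy sketch (assuming NumPy is available; the dense storage is for illustration only, not how one would store a sparse matrix) that builds this tridiagonal matrix and solves -u'' = f for the hypothetical test case f(x) = π² sin(πx), whose exact solution is u(x) = sin(πx):

    import numpy as np

    n = 50
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)           # interior grid points x_1, ..., x_n

    # A = (1/h^2) * tridiag(-1, 2, -1), stored densely only for illustration
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    f = np.pi**2 * np.sin(np.pi * x)          # right-hand side of -u'' = f
    u = np.linalg.solve(A, f)                 # discrete solution
    print(np.max(np.abs(u - np.sin(np.pi * x))))   # error is O(h^2)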

Finite Differences for 2-D Problems
Consider the simple problem
    -(∂²u/∂x₁² + ∂²u/∂x₂²) = f  in Ω,   (4)
    u = 0  on Γ,   (5)
where Ω is now the rectangle (0, l₁) x (0, l₂) and Γ its boundary. Discretize uniformly:
    x_{1,i} = i h₁, i = 0, ..., n₁+1;   x_{2,j} = j h₂, j = 0, ..., n₂+1,
where h₁ = l₁/(n₁+1) and h₂ = l₂/(n₂+1).

The resulting matrix has the following block structure:
    A = (1/h²) [  B -I
                 -I  B -I
                    -I  B ]
with
    B = [  4 -1
          -1  4 -1
             -1  4 -1
                -1  4 ].
[Figure: matrix for a 7 x 5 finite difference mesh.]
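Under the simplifying assumption h₁ = h₂ = h, the same block-tridiagonal matrix can be assembled from the 1-D operator with Kronecker products; the sketch below (not from the slides) again uses dense storage purely for illustration:

    import numpy as np

    def lap1d(n, h):
        # 1-D operator (1/h^2) * tridiag(-1, 2, -1)
        return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    n1, n2 = 7, 5
    h = 1.0 / (n1 + 1)                        # assume h1 = h2 = h for simplicity
    # the two Kronecker products reproduce the tridiag(B, -I) block pattern above,
    # with one row/column per grid point of the n1 x n2 mesh
    A = np.kron(np.eye(n2), lap1d(n1, h)) + np.kron(lap1d(n2, h), np.eye(n1))
    print(A.shape)                            # (35, 35) for the 7 x 5 mesh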

DATA STRUCTURES

Data Structures: General Observations
The use of proper data structures is critical to achieving good performance. There are many data structures, and sometimes unnecessary variants; these seem more useful for iterative methods than for direct methods. Basic linear algebra kernels (e.g., matrix-vector products) depend on the data structures used.

Some Common Data Structures
    DNS  Dense
    BND  Linpack Banded
    COO  Coordinate
    CSR  Compressed Sparse Row
    CSC  Compressed Sparse Column
    MSR  Modified CSR
    ELL  Ellpack-Itpack
    DIA  Diagonal
    BSR  Block Sparse Row
    SSK  Symmetric Skyline
    NSK  Nonsymmetric Skyline
    JAD  Jagged Diagonal

The coordinate format (COO)
    A = [  1.  0.  0.  2.  0.
           3.  4.  0.  5.  0.
           6.  0.  7.  8.  9.
           0.  0. 10. 11.  0.
           0.  0.  0.  0. 12. ]
    AA = 12.  9.  7.  5.  1.  2. 11.  3.  6.  4.  8. 10.
    JR =  5   3   3   2   1   1   4   2   3   2   3   4
    JC =  5   5   3   4   1   4   4   1   1   2   4   3

Compressed Sparse Row (CSR) format
    A = [  1.  0.  0.  2.  0.
           3.  4.  0.  5.  0.
           6.  0.  7.  8.  9.
           0.  0. 10. 11.  0.
           0.  0.  0.  0. 12. ]
    AA = 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.
    JA = 1   4   1   2   4   1   3   4   5   3   4   5
    IA = 1   3   6  10  12  13
Variants / related formats: Compressed Sparse Column (CSC) format, Modified Sparse Row (MSR) format.
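A small sketch (not from the slides) of a matrix-vector product working directly on the CSR arrays of the example above, with the indices shifted to 0-based as is natural in Python; this is the basic kernel whose cost is proportional to nnz(A):

    import numpy as np

    # CSR arrays of the 5 x 5 example above, shifted to 0-based indices
    AA = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
    JA = np.array([0, 3, 0, 1, 3, 0, 2, 3, 4, 2, 3, 4])
    IA = np.array([0, 2, 5, 9, 11, 12])

    def csr_matvec(AA, JA, IA, x):
        # y = A x, touching only the stored nonzeros
        n = len(IA) - 1
        y = np.zeros(n)
        for i in range(n):
            for k in range(IA[i], IA[i + 1]):
                y[i] += AA[k] * x[JA[k]]
        return y

    print(csr_matvec(AA, JA, IA, np.ones(5)))   # row sums: [ 3. 12. 30. 21. 12.]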

The Diagonal (DIA) format
    A = [  1.  0.  2.  0.  0.
           3.  4.  0.  5.  0.
           0.  6.  7.  0.  8.
           0.  0.  9. 10.  0.
           0.  0.  0. 11. 12. ]
    DA = [  *   1.  2.
            3.  4.  5.
            6.  7.  8.
            9. 10.  *
           11. 12.  * ]
    IOFF = -1  0  2

The Ellpack-Itpack (ELL) format
    A = [  1.  0.  2.  0.  0.
           3.  4.  0.  5.  0.
           0.  6.  7.  0.  8.
           0.  0.  9. 10.  0.
           0.  0.  0. 11. 12. ]
    AC = [  1.  2.  0.       JC = [ 1  3  1
            3.  4.  5.              1  2  4
            6.  7.  8.              2  3  5
            9. 10.  0.              3  4  4
           11. 12.  0. ]            4  5  5 ]

BLOCK MATRICES (Block Sparse Row, BSR)
    A = [  1.  2.  0.  0.  3.  4.
           5.  6.  0.  0.  7.  8.
           0.  0.  9. 10. 11. 12.
           0.  0. 13. 14. 15. 16.
          17. 18.  0.  0. 20. 21.
          22. 23.  0.  0. 24. 25. ]
    AA = [  1.  3.  9. 11. 17. 20.
            5.  7. 13. 15. 22. 24.
            2.  4. 10. 12. 18. 21.
            6.  8. 14. 16. 23. 25. ]
    JA = 1  5  3  5  1  5
    IA = 1  3  5  7

Each column of AA holds a 2 x 2 block. JA(k) = column index of the (1,1) entry of the k-th block. AA may be declared as AA(2,2,6). Block formats are important in many applications. Also valuable: block structures with variable block size.

Can also store the blocks row-wise in AA:
    AA = [  1.  5.  2.  6.
            3.  7.  4.  8.
            9. 13. 10. 14.
           11. 15. 12. 16.
           17. 22. 18. 23.
           20. 24. 21. 25. ]
    JA = 1  5  3  5  1  5
    IA = 1  3  5  7
Each row of AA holds a 2 x 2 block (drawback); JA and IA are the same as before, and AA can be declared as AA(6,2,2). If the elements of one block are accessed at the same time, scheme 1 is better. If elements in similar positions of different blocks are accessed at the same time, scheme 2 is better.

Relation between Performance and Data Structures
Performance can vary significantly for the same operation.
    Storage Scheme / Kernel      CRAY-2   Alliant FX-8
    Compressed Sparse Row          8.5      7.96
    Compressed Sparse Column      14.5      0.6
    Ellpack-Itpack                31        6.72
    Diagonal                      66.5      9.53
    Diagonal Unrolled             99       13.92
MFLOPS for matrix-vector operations on the CRAY-2 and Alliant FX-8 [from the sparse matrix benchmark (Wijshoff & Saad, 1989)]. The best algorithm for one machine may be the worst for another.

BASIC LINEAR ALGEBRA KERNELS
Philosophy: select a number of common linear algebra kernels (e.g., dot products) and develop highly tuned libraries for them on each machine.
(A) BLAS1: vector operations.
(B) BLAS2: matrix-vector type operations.
(C) BLAS3: matrix-matrix type operations (blocking).

Graph Representations of Sparse Matrices
Graph theory is a fundamental tool in sparse matrix techniques. The graph G = (V, E) of an n x n matrix A is defined by:
    Vertices V = {1, 2, ..., n};
    Edges E = {(i, j) : a_ij ≠ 0}.
The graph is undirected if the matrix has a symmetric structure: a_ij ≠ 0 iff a_ji ≠ 0.
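A minimal sketch (not from the slides; function name is hypothetical) of building this adjacency structure from an array, stored densely only to keep the example short:

    import numpy as np

    def adjacency(A):
        # graph of A: edge (i, j) whenever a_ij != 0 (the diagonal/self-loops are ignored)
        n = A.shape[0]
        return {i: [j for j in range(n) if j != i and A[i, j] != 0]
                for i in range(n)}

    A = np.array([[1., 0., 2.],
                  [0., 3., 4.],
                  [2., 4., 5.]])
    print(adjacency(A))   # symmetric pattern -> undirected graph: {0: [2], 1: [2], 2: [0, 1]}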

[Figure: nonzero patterns of two small matrices and their adjacency graphs on vertices 1, 2, 3, 4.]

Example: the adjacency graph of the matrix A shown in the figure. Example: for any matrix A, what is the graph of A²? [Interpret in terms of paths in the graph of A.]

DIRECT VERSUS ITERATIVE METHODS
Current consensus: for two-dimensional problems, direct solvers are often preferred; for three-dimensional problems, iterative methods are advantageous. For problems with a large number of degrees of freedom per grid point (say 10), the situation is similar to the 3-D case. Difficulty: there are no robust black-box iterative solvers.

Reorderings and graphs
Let π = {i₁, ..., iₙ} be a permutation.
    A_{π,*} = {a_{π(i),j}}_{i,j=1,...,n} = matrix A with its i-th row replaced by row number π(i);
    A_{*,π} = matrix A with its j-th column replaced by column π(j).
Define P_π = I_{π,*} = permutation matrix. Then:
(1) Each row (column) of P_π consists of zeros and exactly one 1.
(2) A_{π,*} = P_π A.
(3) P_π P_πᵀ = I.
(4) A_{*,π} = A P_πᵀ.

Consider now A' = A_{π,π} = P_π A P_πᵀ. The element in position (i, j) of A' is exactly the element in position (π(i), π(j)) of A:
    a'_{ij} = a_{π(i),π(j)},   so   (i, j) ∈ E_{A'}  iff  (π(i), π(j)) ∈ E_A.
General picture: (i, j) are the new labels, (π(i), π(j)) the old labels.

Example: a 9 x 9 arrow matrix and its adjacency graph [figure].

Graph and matrix after permuting the nodes in reverse order [figure].

Cuthill-McKee and reverse Cuthill-McKee orderings
A class of reordering techniques proceeds by levels in the graph, related to the Breadth First Search (BFS) traversal in graph theory. The idea of BFS is to visit the nodes by levels; level 0 is the level of the starting node. Start with a node, visit its neighbors, then the (unmarked) neighbors of its neighbors, etc.

Example: BFS from node A on the graph shown in the figure:
    Level 0: A
    Level 1: B, C
    Level 2: E, D, H
    Level 3: I, K, F, G

Implementation using levels
Algorithm BFS(G, v), by level sets:
    Initialize S = {v}, seen = 1; mark v.
    While seen < n do:
        S_new = empty set
        For each node u in S do:
            For each unmarked w in adj(u) do:
                add w to S_new; mark w; seen = seen + 1
        S := S_new
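A direct Python transcription of this level-set BFS (the adjacency-list representation and the example graph are illustrative, not from the slides):

    def bfs_levels(adj, v):
        # adj: dict node -> list of neighbours; returns the list of level sets from v
        marked = {v}
        levels = [[v]]
        while len(marked) < len(adj):
            new = []
            for u in levels[-1]:
                for w in adj[u]:
                    if w not in marked:
                        marked.add(w)
                        new.append(w)
            if not new:               # graph not connected: stop
                break
            levels.append(new)
        return levels

    adj = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
    print(bfs_levels(adj, 'A'))       # [['A'], ['B', 'C'], ['D']]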

Cuthill-McKee ordering
The algorithm proceeds by levels and is identical to BFS, except that within each level the nodes are ordered by increasing degree.
Example (graph in the figure):
    Level   Nodes     Deg.      Order
    0       A         2         A
    1       B, C      4, 3      C, B
    2       D, E, F   3, 4, 2   F, D, E
    3       G         2         G

Reverse Cuthill-McKee ordering
Observation: the Cuthill-McKee ordering has a tendency to create small arrow matrices ordered upward (the wrong way). Idea: take the reverse ordering, the Reverse Cuthill-McKee (RCM) ordering.
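A compact sketch of the resulting reverse Cuthill-McKee procedure (not from the slides), assuming the graph is given as an adjacency-list dictionary; the example graph is hypothetical:

    def reverse_cuthill_mckee(adj, start):
        # adj: dict node -> list of neighbours
        deg = {v: len(adj[v]) for v in adj}
        order, marked, level = [start], {start}, [start]
        while level:
            new = []
            for u in sorted(level, key=lambda v: deg[v]):       # traverse level by degree
                for w in sorted(adj[u], key=lambda v: deg[v]):  # visit neighbours by degree
                    if w not in marked:
                        marked.add(w)
                        new.append(w)
            order.extend(sorted(new, key=lambda v: deg[v]))
            level = new
        return order[::-1]                                      # reverse the CM ordering

    adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 5], 4: [2], 5: [2, 3]}
    print(reverse_cuthill_mckee(adj, 1))   # e.g. [5, 4, 2, 3, 1]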

Orderings used in direct solution methods
Two broad types of orderings are used: minimal degree ordering (plus many variations) and nested dissection ordering (plus many variations). Minimal degree ordering is the easiest to describe: at each step of Gaussian elimination, select as the next node to eliminate the node v of smallest degree; after eliminating node v, update the degrees and repeat.

Nested Dissection
Easily described using recursion and separators. Main step: separate the graph into three parts, two of which have no coupling with each other. The third set has couplings with vertices from both of the first two sets and is referred to as a separator. Key idea: dissect the graph, then take the subgraphs and dissect them recursively.

Nested dissection ordering and the corresponding reordered matrix [figure].

For regular simple q x q meshes, it can be shown that the fill-in is of order q² log q and the computational cost of the factorization is O(q³).
A useful result [Rose-Tarjan fill-path theorem]: a fill-path is a path between two vertices i and j in the graph of A such that all vertices in the path, except the end points i and j, are numbered less than i and j.
THEOREM. There is a fill-in in entry (i, j) at the completion of the Gaussian elimination process if and only if there exists a fill-path between i and j.
Separating a graph = finding 3 sets of vertices V₁, V₂, S such that V = V₁ ∪ V₂ ∪ S and V₁ and V₂ have no couplings. Labeling the nodes of S last prevents fill-ins between nodes of V₁ and V₂.

Multicoloring
A general technique that can be exploited in many different ways to introduce parallelism, generally of order N. It constitutes one of the most successful techniques (when applicable) on (vector) supercomputers. Simple example: red-black ordering.

[Figure: red-black ordering of the unknowns on a small rectangular grid; the red nodes are numbered first (1-10), then the black nodes (11-20).]

Corresponding matrix [figure]. Observe: L-U solves (or SOR sweeps) will require only diagonal scalings plus matrix-vector products with matrices of size N/2.

Solution of red-black systems
Red-black ordering leads to equations of the form
    [ D₁  E ] [x₁]   [b₁]
    [ F   D₂] [x₂] = [b₂]
where D₁ and D₂ are diagonal.
Question: how to solve such a system?
Method 1: use ILU(0) on this block system. O(N) parallelism. Often the number of iterations is higher than with the natural ordering, but it is still competitive for easy problems. One could use a more accurate preconditioner [e.g., ILUT]; however, the O(N) parallelism is then lost.

Method 2: SOR/SSOR(k) preconditioner. No such difficulty with SOR/SSOR. A more accurate preconditioner means more SOR/SSOR steps [SOR(k) = k steps of SOR]. Can perform quite well.
[Figure: CPU time versus the number of SOR steps k for SOR(k)-preconditioned GMRES(10) and GMRES(20).]

Method 3: eliminate the red unknowns from the system.
Reduced system:
    (D₂ - F D₁⁻¹ E) x₂ = b₂ - F D₁⁻¹ b₁.
Again a sparse linear system, with half as many unknowns; it can often be solved efficiently with only diagonal preconditioning.
Question: can we recursively color the reduced system and get a second-level reduced system? Answer: yes, but we need to generalize the red-black ordering.
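A sketch (not from the slides) of forming this reduced (Schur complement) system, exploiting the fact that D₁ is diagonal so that applying D₁⁻¹ is a componentwise division; dense blocks are used only to keep the illustration short:

    import numpy as np

    def reduced_system(d1, D2, E, F, b1, b2):
        # d1: diagonal entries of D1 (1-D array); D2, E, F: the other blocks (dense here)
        # Schur complement system: (D2 - F D1^{-1} E) x2 = b2 - F D1^{-1} b1
        S = D2 - F @ (E / d1[:, None])      # D1^{-1} E = divide the rows of E by d1
        rhs = b2 - F @ (b1 / d1)
        return S, rhs

    d1 = np.array([2.0, 2.0])
    D2 = 2.0 * np.eye(2)
    E = np.array([[-1., 0.], [0., -1.]])
    F = E.copy()
    S, rhs = reduced_system(d1, D2, E, F, np.ones(2), np.ones(2))
    print(S, rhs)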

How to generalize the red-black ordering? Answer: multicoloring and independent sets.
A greedy multicoloring technique:
    Initially assign color number zero (uncolored) to every node.
    Choose an order in which to traverse the nodes.
    Scan all nodes in the chosen order and at every node i set
        Color(i) = min{k > 0 : k ≠ Color(j) for all j ∈ Adj(i)},
    where Adj(i) = set of nearest neighbors of i = {k : a_ik ≠ 0}.
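The greedy rule above translates almost directly into code; this sketch (hypothetical names and example graph, adjacency-list input) assigns to each node the smallest admissible color:

    def greedy_multicolor(adj, order=None):
        # adj: dict node -> list of neighbours; returns a color >= 1 for every node
        color = {v: 0 for v in adj}                    # 0 means "uncolored"
        for i in (order or list(adj)):
            used = {color[j] for j in adj[i]}          # colors already taken by neighbours
            k = 1
            while k in used:
                k += 1
            color[i] = k                               # smallest admissible color
        return color

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(greedy_multicolor(adj))                      # e.g. {0: 1, 1: 2, 2: 3, 3: 1}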


Independent Sets
An independent set (IS) is a set of nodes that are not coupled by an equation. The set is maximal if every other node in the graph is coupled to a node of the IS. If the unknowns of the IS are labeled first, then the matrix will have the form
    [ B  F ]
    [ E  C ]
in which B is a diagonal matrix, and E, F, and C are sparse.
Greedy algorithm: scan all nodes in a certain order and at every node i: if i is not colored, color it red and color all its neighbors black. The independent set is the set of red nodes. Complexity: O(|E| + |V|).
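A sketch (not from the slides) of the greedy independent-set algorithm just described; the red nodes returned form the independent set:

    def greedy_independent_set(adj, order=None):
        # color an uncolored node red and all of its neighbours black;
        # the red nodes form the independent set (cost O(|V| + |E|))
        color = {}
        for v in (order or list(adj)):
            if v not in color:
                color[v] = 'red'
                for w in adj[v]:
                    color.setdefault(w, 'black')
        return [v for v in adj if color[v] == 'red']

    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(greedy_independent_set(adj))   # [0, 2]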

BASIC RELAXATION METHODS

BASIC RELAXATION SCHEMES
Relaxation schemes are based on the decomposition A = D - E - F, where D = diag(A), -E is the strict lower part of A, and -F its strict upper part.
Gauss-Seidel iteration for solving Ax = b:
    (D - E) x^(k+1) = F x^(k) + b.
Idea: correct the j-th component of the current approximate solution, j = 1, 2, ..., n, so as to zero out the j-th component of the residual.

Can also define a backward Gauss-Seidel iteration:
    (D - F) x^(k+1) = E x^(k) + b,
and a symmetric Gauss-Seidel iteration: a forward sweep followed by a backward sweep.
Over-relaxation is based on the decomposition
    ωA = (D - ωE) - (ωF + (1 - ω)D),
which gives successive over-relaxation (SOR):
    (D - ωE) x^(k+1) = [ωF + (1 - ω)D] x^(k) + ωb.
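For illustration (not from the slides), a componentwise implementation of the forward SOR sweep applied repeatedly to solve Ax = b; the small SPD test matrix is a hypothetical example:

    import numpy as np

    def sor(A, b, omega=1.2, x0=None, tol=1e-8, maxit=1000):
        # componentwise form of (D - omega*E) x_new = [omega*F + (1-omega)*D] x_old + omega*b
        n = len(b)
        x = np.zeros(n) if x0 is None else x0.astype(float).copy()
        for _ in range(maxit):
            for i in range(n):
                sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
                x[i] += omega * ((b[i] - sigma) / A[i, i] - x[i])
            if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
                break
        return x

    A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
    b = np.array([1., 2., 3.])
    print(sor(A, b), np.linalg.solve(A, b))   # the two should agree to about tol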

Iteration matrices
The Jacobi, Gauss-Seidel, SOR, and SSOR iterations are all of the form x^(k+1) = M x^(k) + f, with
    M_Jac  = D⁻¹(E + F) = I - D⁻¹A
    M_GS   = (D - E)⁻¹F = I - (D - E)⁻¹A
    M_SOR  = (D - ωE)⁻¹(ωF + (1 - ω)D) = I - (ω⁻¹D - E)⁻¹A
    M_SSOR = I - (2ω⁻¹ - 1)(ω⁻¹D - F)⁻¹ D (ω⁻¹D - E)⁻¹ A = I - ω(2 - ω)(D - ωF)⁻¹ D (D - ωE)⁻¹ A

General convergence result
Consider the iteration x^(k+1) = G x^(k) + f.
(1) Assume that ρ(G) < 1. Then I - G is non-singular, the iteration has a unique fixed point, and the iteration converges to it for any f and x^(0).
(2) If the iteration converges for any f and x^(0), then ρ(G) < 1.
Example: Richardson's iteration x^(k+1) = x^(k) + α(b - A x^(k)). Assume Λ(A) ⊂ R (real spectrum); when does the iteration converge?
Jacobi and Gauss-Seidel converge for diagonally dominant A. SOR converges for 0 < ω < 2 for SPD matrices.

An observation. Introduction to Preconditioning The iteration x (k+1) = Mx (k) + f is attempting to solve (I M)x = f. Since M is of the form M = I P 1 A this system can be rewritten as P 1 Ax = P 1 b where for SSOR, we have P SSOR = (D ωe)d 1 (D ωf ) referred to as the SSOR preconditioning matrix. In other words: Relaxation Scheme Preconditioned Fixed Point Iteration Calais - February 7, 2005 63