Parallel Numerics, WT 2016/2017
5. Iterative Methods for Sparse Linear Systems of Equations

Contents
1 Introduction
  1.1 Computer Science Aspects
  1.2 Numerical Problems
  1.3 Graphs
  1.4 Loop Manipulations
2 Elementary Linear Algebra Problems
  2.1 BLAS: Basic Linear Algebra Subroutines
  2.2 Matrix-Vector Operations
  2.3 Matrix-Matrix-Product
3 Linear Systems of Equations with Dense Matrices
  3.1 Gaussian Elimination
  3.2 Parallelization
  3.3 QR-Decomposition with Householder matrices
4 Sparse Matrices
  4.1 General Properties, Storage
  4.2 Sparse Matrices and Graphs
  4.3 Reordering
  4.4 Gaussian Elimination for Sparse Matrices
5 Iterative Methods for Sparse Linear Systems of Equations
  5.1 Stationary Methods
  5.2 Nonstationary Methods
  5.3 Preconditioning
6 Domain Decomposition
  6.1 Overlapping Domain Decomposition
  6.2 Non-overlapping Domain Decomposition
  6.3 Schur Complements

Disadvantages of direct methods (in parallel):
- strongly sequential
- may lead to dense matrices: the sparsity pattern changes, additional entries become necessary
- indirect addressing
- storage and computational effort

Iterative solver:
- choose an initial guess = starting vector x^(0), e.g., x^(0) = 0
- iteration function x^(k+1) := Φ(x^(k))

Applied to solving a linear system:
- the main part of Φ should be a matrix-vector multiplication Ax (matrix-free!?)
- easy to parallelize, no change in the sparsity pattern of A
- x^(k) → x = A^{-1} b for k → ∞
- main problem: fast convergence!

5.1 Stationary Methods
5.1.1 Richardson Iteration

Construct from Ax = b an iteration process:
b = Ax = (A - I + I)x = x - (I - A)x
x = b + (I - A)x = b + Nx   (artificial splitting of A, N := I - A)

This leads to a fixed-point equation x = Φ(x) with Φ(x) := b + Nx:
start: x^(0);   x^(k+1) := Φ(x^(k)) = b + N x^(k) = b + (I - A) x^(k)

Richardson Iteration (cont.)
start: x^(0);   x^(k+1) := Φ(x^(k)) = b + N x^(k) = b + (I - A) x^(k)

If x^(k) is convergent, x^(k) → x̄, then
x̄ = Φ(x̄) = b + N x̄ = b + (I - A) x̄   ⟺   A x̄ = b,
and therefore x^(k) → x̄ = x := A^{-1} b.

Residual-based formulation:
x^(k+1) = Φ(x^(k)) = b + (I - A) x^(k) = b + x^(k) - A x^(k) = x^(k) + (b - A x^(k)) = x^(k) + r(x^(k)),
with r(x) := b - Ax the residual.
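
A minimal NumPy sketch of this residual-based Richardson iteration (my illustration, not part of the slides; the test matrix, right-hand side, and tolerance are made-up example values):

    import numpy as np

    def richardson(A, b, x0=None, tol=1e-10, max_iter=1000):
        # x^(k+1) = x^(k) + r(x^(k)),  r(x) = b - A x
        x = np.zeros_like(b) if x0 is None else x0.copy()
        for k in range(max_iter):
            r = b - A @ x                  # residual
            if np.linalg.norm(r) < tol:
                return x, k
            x = x + r                      # Richardson update (N = I - A)
        return x, max_iter

    # toy example: A close to I, so rho(I - A) < 1 and the iteration converges
    rng = np.random.default_rng(0)
    A = np.eye(5) + 0.1 * rng.random((5, 5))
    b = np.ones(5)
    x, iters = richardson(A, b)
    print(iters, np.linalg.norm(b - A @ x))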

Convergence Analysis via Neumann Series
x^(k) = b + N x^(k-1) = b + N(b + N x^(k-2)) = b + Nb + N^2 x^(k-2) = ...
      = b + Nb + N^2 b + ... + N^{k-1} b + N^k x^(0)
      = (sum_{j=0}^{k-1} N^j) b + N^k x^(0)

Special case x^(0) = 0:
x^(k) = (sum_{j=0}^{k-1} N^j) b
x^(k) ∈ span{b, Nb, N^2 b, ..., N^{k-1} b} = span{b, Ab, A^2 b, ..., A^{k-1} b} =: K_k(A, b),
which is called the Krylov space of A and b.

For ||N|| < 1 it holds:
sum_{j=0}^{k-1} N^j → sum_{j=0}^{∞} N^j = (I - N)^{-1} = (I - (I - A))^{-1} = A^{-1}

Convergence Analysis via Neumann Series (cont.)
x^(k) → (sum_{j=0}^{∞} N^j) b = (I - N)^{-1} b = A^{-1} b = x

The Richardson iteration is convergent for ||N|| < 1, i.e., A ≈ I.

Error analysis for e^(k) := x^(k) - x̄:
e^(k+1) = x^(k+1) - x̄ = Φ(x^(k)) - Φ(x̄) = (b + N x^(k)) - (b + N x̄) = N(x^(k) - x̄) = N e^(k)
||e^(k)|| ≤ ||N|| ||e^(k-1)|| ≤ ||N||^2 ||e^(k-2)|| ≤ ... ≤ ||N||^k ||e^(0)||
||N|| < 1  ⟹  ||N||^k → 0  ⟹  ||e^(k)|| → 0 for k → ∞

Convergence if ρ(N) = ρ(I - A) < 1, where ρ is the spectral radius
ρ(N) = |λ_max| = max_i |λ_i|   (λ_i eigenvalue of N).
The eigenvalues of A all have to lie in the circle around 1 with radius 1.

Splittings of A
Convergence of Richardson only in very special cases! Try to improve the iteration for better convergence.
Write A in the form A := M - N:
b = Ax = (M - N)x = Mx - Nx
x = M^{-1} b + M^{-1} N x =: Φ(x)
Φ(x) = M^{-1} b + M^{-1} N x = M^{-1} b + M^{-1}(M - A) x = M^{-1}(b - Ax) + x = x + M^{-1} r(x)

N should be such that Ny can be evaluated efficiently.
M should be such that M^{-1} y can be evaluated efficiently.
x^(k+1) = x^(k) + M^{-1} r^(k)
The iteration with splitting M - N is equivalent to Richardson applied to M^{-1} A x = M^{-1} b.

Convergence
The iteration with splitting A = M - N is convergent if
ρ(M^{-1} N) = ρ(I - M^{-1} A) < 1
For fast convergence it should hold M^{-1} A ≈ I:
M^{-1} A should be better conditioned than A itself.

Such a matrix M is called a preconditioner for A.
It is also used in other iterative methods to accelerate convergence.
Condition number: κ(A) = ||A^{-1}|| ||A||,  λ_max/λ_min,  or  σ_max/σ_min.

5.1.2 Jacobi (Diagonal) Splitting
Choose A = M - N = D - (L + U) with
D = diag(A), and L and U the (negated) strict lower and upper triangular parts of A, so that A = D - L - U.
x^(k+1) = D^{-1} b + D^{-1}(L + U) x^(k) = D^{-1} b + D^{-1}(D - A) x^(k) = x^(k) + D^{-1} r^(k)

Convergent for A ≈ diag(A), e.g., for diagonally dominant matrices:
ρ(D^{-1} N) = ρ(I - D^{-1} A) < 1

Jacobi (Diagonal) Splitting (cont.)
Iteration process written elementwise: x^(k+1) = D^{-1}(b - (A - D) x^(k)), i.e.,
a_jj x_j^(k+1) = b_j - sum_{m=1}^{j-1} a_{j,m} x_m^(k) - sum_{m=j+1}^{n} a_{j,m} x_m^(k)
x_j^(k+1) = (1 / a_jj) (b_j - sum_{m=1, m≠j}^{n} a_{j,m} x_m^(k))

Damping or relaxation for improving convergence.
Idea: view the iterative method as a correction of the last iterate in a search direction,
and introduce a step length for this correction step:
x^(k+1) = x^(k) + D^{-1} r^(k)   →   x^(k+1) = x^(k) + ω D^{-1} r^(k)
with an additional damping parameter ω.
Damped Jacobi iteration:
x^(k+1)_damped = x^(k) + ω D^{-1} r^(k) = ω x^(k+1) + (1 - ω) x^(k)

Damped Jacobi Iteration
x^(k+1) = x^(k) + ω D^{-1} r^(k) = x^(k) + ω D^{-1}(b - A x^(k)) = ... = ω D^{-1} b + [(1 - ω) I + ω D^{-1}(L + U)] x^(k)
is convergent for
ρ((1 - ω) I + ω D^{-1}(L + U)) < 1;
note that the iteration matrix tends to I for ω → 0.
Look for the optimal ω with the best convergence (additional degree of freedom).
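
As a small illustration of the damped Jacobi update x^(k+1) = x^(k) + ω D^{-1} r^(k) (my sketch, with an arbitrary diagonally dominant test matrix; ω = 1 gives plain Jacobi):

    import numpy as np

    def damped_jacobi(A, b, omega=1.0, tol=1e-10, max_iter=5000):
        # x^(k+1) = x^(k) + omega * D^{-1} r^(k);  omega = 1 is plain Jacobi
        D = np.diag(A)                       # diagonal of A as a vector
        x = np.zeros_like(b)
        for k in range(max_iter):
            r = b - A @ x                    # residual
            if np.linalg.norm(r) < tol:
                return x, k
            x = x + omega * r / D            # componentwise D^{-1} r
        return x, max_iter

    # diagonally dominant test matrix, so the Jacobi iteration converges
    A = np.array([[4.0, -1.0, 0.0],
                  [-1.0, 4.0, -1.0],
                  [0.0, -1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    for omega in (1.0, 0.8):
        x, iters = damped_jacobi(A, b, omega)
        print(omega, iters, np.linalg.norm(b - A @ x))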

Parallelism in the Jacobi Iteration
The Jacobi method is easy to parallelize: it needs only the products Ax and D^{-1}x.
But convergence is often too slow!
Improvement: block Jacobi iteration.

5.1.3 Gauss-Seidel Iteration
Always use the newest information available!
Jacobi iteration:
a_jj x_j^(k+1) = b_j - sum_{m=1}^{j-1} a_{j,m} x_m^(k) - sum_{m=j+1}^{n} a_{j,m} x_m^(k)
(the values x_m^(k+1) for m < j are already computed, but not used)
Gauss-Seidel iteration:
a_jj x_j^(k+1) = b_j - sum_{m=1}^{j-1} a_{j,m} x_m^(k+1) - sum_{m=j+1}^{n} a_{j,m} x_m^(k)
(uses the already computed values x_m^(k+1), m < j)

Gauss-Seidel Iteration (cont.)
Compare the dependency graphs for general iterative algorithms.
Jacobi: x = f(x) = D^{-1}(b + (D - A)x) = D^{-1}(b + (L + U)x).
Gauss-Seidel corresponds to the splitting A = (D - L) - U = M - N:
x^(k+1) = (D - L)^{-1} b + (D - L)^{-1} U x^(k)
        = (D - L)^{-1} b + (D - L)^{-1} (D - L - A) x^(k)
        = x^(k) + (D - L)^{-1} r^(k)
Convergence depends on the spectral radius: ρ(I - (D - L)^{-1} A) < 1

Parallelism in the Gauss-Seidel Iteration
The linear system with D - L is easy to solve because D - L is lower triangular,
but this triangular solve is strongly sequential!
Use red-black ordering or graph colouring as a compromise between parallelism and convergence.

Successive Over-Relaxation (SOR)
Damping or relaxation:
x^(k+1) = x^(k) + ω (D - L)^{-1} r^(k) = ω (D - L)^{-1} b + [(1 - ω) I + ω (D - L)^{-1} U] x^(k)
Convergence depends on the spectral radius of the iteration matrix
(1 - ω) I + ω (D - L)^{-1} U.
Parallelization of SOR == parallelization of Gauss-Seidel.
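
A sketch of this relaxed Gauss-Seidel update x^(k+1) = x^(k) + ω (D - L)^{-1} r^(k) (my illustration; ω = 1 gives plain Gauss-Seidel, and the triangular solve is written with a generic solver for brevity rather than an explicit forward substitution):

    import numpy as np

    def sor(A, b, omega=1.0, tol=1e-10, max_iter=5000):
        # x^(k+1) = x^(k) + omega * (D - L)^{-1} r^(k);  omega = 1 is Gauss-Seidel
        M = np.tril(A)                       # D - L = lower triangle of A incl. diagonal
        x = np.zeros_like(b)
        for k in range(max_iter):
            r = b - A @ x
            if np.linalg.norm(r) < tol:
                return x, k
            x = x + omega * np.linalg.solve(M, r)   # in practice: forward substitution
        return x, max_iter

    A = np.array([[4.0, -1.0, 0.0],
                  [-1.0, 4.0, -1.0],
                  [0.0, -1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(sor(A, b, omega=1.0)[1], sor(A, b, omega=1.1)[1])   # iteration counts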

Stationary Methods (in General)
They can always be written in the two normal forms
x^(k+1) = c + B x^(k)
with convergence depending on the spectral radius ρ(B) of the iteration matrix B, and
x^(k+1) = x^(k) + F r^(k)
with preconditioner F and B = I - F A.

For x^(0) = 0: x^(k) ∈ K_k(B, c), the Krylov space with respect to the matrix B and the vector c.
Slow convergence (but good smoothing properties! → multigrid)

MATLAB Example
clear; n=100; k=10; omega=1; stationary
For the tridiagonal test matrix tridiag(.5, 1, .5):
Jacobi iteration matrix: norm ≈ cos(π/n),  Gauss-Seidel: ≈ cos(π/n)^2.
Both are < 1, so the iterations converge, but only slowly.
To improve convergence: nonstationary methods (or multigrid).
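
As a rough cross-check of these numbers (my own NumPy sketch, not the 'stationary' script used on the slide), one can compute the spectral radii of the Jacobi and Gauss-Seidel iteration matrices for this tridiagonal matrix directly:

    import numpy as np

    n = 100
    # tridiag(.5, 1, .5) as in the example
    A = np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
    D = np.diag(np.diag(A))
    DL = np.tril(A)                          # D - L

    rho = lambda B: np.max(np.abs(np.linalg.eigvals(B)))
    rho_jacobi = rho(np.eye(n) - np.linalg.solve(D, A))    # I - D^{-1} A
    rho_gs = rho(np.eye(n) - np.linalg.solve(DL, A))       # I - (D - L)^{-1} A

    print(rho_jacobi, np.cos(np.pi / n))     # close to cos(pi/n)
    print(rho_gs, np.cos(np.pi / n) ** 2)    # close to cos(pi/n)^2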

Chair of Informatics V (SCCS): Efficient Numerical Algorithms, Parallel & HPC
- High-dimensional numerics (sparse grids)
- Fast iterative solvers (multi-level methods, preconditioners, eigenvalue solvers)
- Uncertainty Quantification
- Space-filling curves
- Numerical linear algebra
- Numerical algorithms for image processing
- HW-aware numerical programming
Fields of application in simulation:
- CFD (incl. fluid-structure interaction)
- Plasma physics
- Molecular dynamics
- Quantum chemistry
Further info: www5.in.tum.de. Feel free to come around and ask for thesis topics!

5.2 Nonstationary Methods
5.2.1 Gradient Method
Consider A = A^T > 0 (A symmetric positive definite, SPD).
The function Φ(x) = (1/2) x^T A x - b^T x is an n-dimensional paraboloid R^n → R.
Gradient: ∇Φ(x) = Ax - b.
The position with ∇Φ(x) = 0 is exactly the minimum of the paraboloid.

Instead of solving Ax = b, consider min_x Φ(x).
Local descent direction at x: ∇Φ(x)^T y is minimal for the direction y = -∇Φ(x).

Gradient Method (cont.)
Optimization: start with x^(0) and iterate
x^(k+1) := x^(k) + α_k d^(k)
with search direction d^(k) and step size α_k.
In view of the previous results, the locally optimal search direction is d^(k) := -∇Φ(x^(k)).

To define α_k, minimize along this direction:
min_α g(α) := min_α Φ(x^(k) + α (b - A x^(k)))
  = min_α [ (1/2) (x^(k) + α d^(k))^T A (x^(k) + α d^(k)) - b^T (x^(k) + α d^(k)) ]
  = min_α [ (1/2) α^2 d^(k)T A d^(k) - α d^(k)T d^(k) + (1/2) x^(k)T A x^(k) - x^(k)T b ]
⟹  α_k = (d^(k)T d^(k)) / (d^(k)T A d^(k)),   d^(k) = -∇Φ(x^(k)) = b - A x^(k)

Gradient Method (cont. 2)
x^(k+1) = x^(k) + [ ||b - A x^(k)||_2^2 / ((b - A x^(k))^T A (b - A x^(k))) ] (b - A x^(k))
Method of steepest descent.

Disadvantages:
- distorted contour lines
- slow convergence (zig-zag path)
- the local descent direction is not globally optimal
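
A compact NumPy sketch of this steepest-descent update (illustration only; the small SPD test system is made up):

    import numpy as np

    def steepest_descent(A, b, tol=1e-10, max_iter=10000):
        # x^(k+1) = x^(k) + alpha_k d^(k)  with  d^(k) = b - A x^(k)
        x = np.zeros_like(b)
        for k in range(max_iter):
            d = b - A @ x                       # negative gradient = residual
            if np.linalg.norm(d) < tol:
                return x, k
            alpha = (d @ d) / (d @ (A @ d))     # exact 1-dimensional line search
            x = x + alpha * d
        return x, max_iter

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                  # SPD test matrix
    b = np.array([1.0, 1.0])
    x, iters = steepest_descent(A, b)
    print(iters, x, np.linalg.solve(A, b))      # compare with the direct solution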

Analysis of the Gradient Method
Definition of the A-norm: ||x||_A := sqrt(x^T A x)
Consider the error:
||x - x̄||_A^2 = ||x - A^{-1} b||_A^2 = (x^T - b^T A^{-1}) A (x - A^{-1} b)
             = x^T A x - 2 b^T x + b^T A^{-1} b = 2 Φ(x) + b^T A^{-1} b

Therefore, minimizing Φ is equivalent to minimizing the error in the A-norm!
A more detailed analysis reveals:
||x^(k+1) - x̄||_A^2 ≤ (1 - 1/κ(A)) ||x^(k) - x̄||_A^2
Therefore, for κ(A) ≫ 1 the convergence is very slow!

5.2.2 The Conjugate Gradient Method
Improve the descent direction so that it is globally optimal:
x^(k+1) := x^(k) + α_k p^(k)
with a search direction that is not the negative gradient, but the projection of the gradient that is A-conjugate to all previous search directions:
p^(k) ⊥ A p^(j),  i.e.  p^(k) ⊥_A p^(j)  or  p^(k)T A p^(j) = 0   for all j < k.

We choose the new search direction as the component of the last residual that is A-conjugate to all previous search directions.
α_k is again obtained by 1-dimensional minimization as before (for the chosen p^(k)).

The Conjugate Gradient Algorithm
p^(0) = r^(0) = b - A x^(0)
for k = 0, 1, 2, ... do
    α_k = <r^(k), r^(k)> / <p^(k), A p^(k)>
    x^(k+1) = x^(k) + α_k p^(k)
    r^(k+1) = r^(k) - α_k A p^(k)
    if ||r^(k+1)||_2^2 ≤ ε then break
    β_k = <r^(k+1), r^(k+1)> / <r^(k), r^(k)>
    p^(k+1) = r^(k+1) + β_k p^(k)
end for
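
A direct transcription of this algorithm into NumPy (my sketch; the stopping tolerance and the SPD test problem are arbitrary choices):

    import numpy as np

    def cg(A, b, x0=None, tol=1e-12, max_iter=None):
        # plain conjugate gradients for SPD A, following the notation above
        n = len(b)
        max_iter = n if max_iter is None else max_iter
        x = np.zeros_like(b) if x0 is None else x0.copy()
        r = b - A @ x
        p = r.copy()
        rr = r @ r
        for k in range(max_iter):
            Ap = A @ p
            alpha = rr / (p @ Ap)              # alpha_k = <r,r> / <p,Ap>
            x = x + alpha * p
            r = r - alpha * Ap
            rr_new = r @ r
            if rr_new <= tol:                  # ||r^(k+1)||_2^2 <= eps
                return x, k + 1
            beta = rr_new / rr                 # beta_k = <r_new,r_new> / <r,r>
            p = r + beta * p
            rr = rr_new
        return x, max_iter

    # SPD test problem: 1D Laplacian-like tridiagonal matrix
    n = 50
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x, iters = cg(A, b)
    print(iters, np.linalg.norm(b - A @ x))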

Properties of Conjugate Gradients
It holds p^(j)T A p^(k) = 0 = r^(j)T r^(k) for j ≠ k, and
span(p^(1), ..., p^(j)) = span(r^(0), ..., r^(j-1)) = span(r^(0), A r^(0), ..., A^{j-1} r^(0)) = K_j(A, r^(0))
Especially for x^(0) = 0 it holds K_j(A, r^(0)) = span(b, Ab, ..., A^{j-1} b).

x^(k) is the best approximate solution to Ax = b in the subspace K_k(A, r^(0)).
For x^(0) = 0:  x^(k) ∈ K_k(A, b), with error
||x^(k) - x̄||_A = min_{x ∈ K_k(A,b)} ||x - x̄||_A
The cheap 1-dimensional minimization gives the optimal k-dimensional solution for free!

Properties of Conjugate Gradients (cont.)
Consequence: After n steps, K_n(A, b) = R^n, and therefore x^(n) = A^{-1} b is the exact solution in exact arithmetic.
Unfortunately, this is not true in floating-point arithmetic.
Furthermore, O(n) iteration steps would be too costly:
costs ≈ (#iterations) × (cost of one matrix-vector product)

The matrix-vector product is easy to parallelize.
But how do we get fast convergence and reduce the number of iterations?

Error Estimation (x^(0) = 0)
||e^(k)||_A = ||x^(k) - x̄||_A = min_{x ∈ K_k(A,b)} ||x - x̄||_A
  = min_{α_0,...,α_{k-1}} || sum_{j=0}^{k-1} α_j A^j b - x̄ ||_A
  = min_{P^(k-1)} || P^(k-1)(A) b - x̄ ||_A
  = min_{P^(k-1)} || P^(k-1)(A) A x̄ - x̄ ||_A
  = min_{P^(k-1)} || (P^(k-1)(A) A - I)(x̄ - x^(0)) ||_A
  = min_{Q^(k), Q^(k)(0)=1} || Q^(k)(A) e^(0) ||_A
for polynomials P^(k-1) of degree k - 1 and Q^(k) of degree k with Q^(k)(0) = 1.

Error Estimation (cont.)
The matrix A has an orthonormal basis of eigenvectors u_j, j = 1,...,n, with eigenvalues λ_j.
It holds A u_j = λ_j u_j, j = 1,...,n, and u_j^T u_k = 0 for j ≠ k and 1 for j = k.
Start error in this ONB: e^(0) = sum_{j=1}^{n} ρ_j u_j

||e^(k)||_A = min_{Q^(k)(0)=1} || Q^(k)(A) sum_{j=1}^{n} ρ_j u_j ||_A
  = min_{Q^(k)(0)=1} || sum_{j=1}^{n} ρ_j Q^(k)(A) u_j ||_A
  = min_{Q^(k)(0)=1} || sum_{j=1}^{n} ρ_j Q^(k)(λ_j) u_j ||_A
  ≤ min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } || sum_{j=1}^{n} ρ_j u_j ||_A
  = min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } ||e^(0)||_A

Error Estimates
By choosing particular polynomials with Q^(k)(0) = 1 we can derive error estimates for the error after the k-th step. Choose, e.g.,
Q^(k)(x) := (1 - 2x / (λ_max + λ_min))^k

||e^(k)||_A ≤ max_{j=1,...,n} |Q^(k)(λ_j)| ||e^(0)||_A
  = |1 - 2 λ_max / (λ_max + λ_min)|^k ||e^(0)||_A
  = ((λ_max - λ_min) / (λ_max + λ_min))^k ||e^(0)||_A
  = ((κ(A) - 1) / (κ(A) + 1))^k ||e^(0)||_A

Better Estimates
Choose normalized Chebyshev polynomials T_n(x) = cos(n arccos(x)):
||e^(k)||_A ≤ (1 / T_k((κ(A) + 1) / (κ(A) - 1))) ||e^(0)||_A ≤ 2 ((sqrt(κ(A)) - 1) / (sqrt(κ(A)) + 1))^k ||e^(0)||_A

For clustered eigenvalues choose a special polynomial, e.g., assume that A has only two distinct eigenvalues λ_1 and λ_2:
Q^(2)(x) := (λ_1 - x)(λ_2 - x) / (λ_1 λ_2)
||e^(2)||_A ≤ max_{j=1,2} |Q^(2)(λ_j)| ||e^(0)||_A = 0
Convergence of CG after two steps!

Outliers/Cluster
Assume the matrix has one eigenvalue λ_1 > 1 and all other eigenvalues are contained in an ε-neighborhood of 1: |λ - 1| < ε for λ ≠ λ_1.
Q^(2)(x) := (λ_1 - x)(1 - x) / λ_1
||e^(2)||_A ≤ max_{|λ-1|<ε} (|λ_1 - λ| |1 - λ| / λ_1) ||e^(0)||_A ≤ ((λ_1 - 1 + ε) ε / λ_1) ||e^(0)||_A = O(ε) ||e^(0)||_A
Very good approximation by CG after only two steps!
Important: a small number of outliers combined with a cluster.
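
This behaviour is easy to observe numerically. The following sketch (my illustration; the outlier value, cluster width, and problem size are arbitrary) runs plain CG on a diagonal SPD matrix with one outlier eigenvalue and the rest clustered around 1, printing the residual norm per step; it drops to roughly the ε level after two steps.

    import numpy as np

    rng = np.random.default_rng(0)
    n, eps = 200, 1e-3
    # one outlier eigenvalue 10, the remaining eigenvalues in (1 - eps, 1 + eps)
    eigs = np.concatenate(([10.0], 1.0 + eps * (2 * rng.random(n - 1) - 1)))
    A = np.diag(eigs)                          # diagonal SPD matrix with these eigenvalues
    b = rng.random(n)

    # plain CG, recording the residual norm after each step
    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    for k in range(5):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        print(k + 1, np.linalg.norm(r))        # ~O(eps) already after two steps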

Conjugate Gradients Summary
To get fast convergence and reduce the number of iterations: find a preconditioner M such that M^{-1} A x = M^{-1} b has clustered eigenvalues.
Conjugate gradients (CG) is in general the method of choice for symmetric positive definite A.
To improve convergence, include preconditioning (PCG).
CG has two important properties: it is optimal and cheap.
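
As a sketch of the preconditioning idea (my illustration, using the simplest possible choice M = diag(A), i.e., Jacobi scaling; practical PCG variants differ in the choice of M and in implementation details):

    import numpy as np

    def pcg_jacobi(A, b, tol=1e-10, max_iter=None):
        # preconditioned CG with M = diag(A); applying M^{-1} is a cheap division
        n = len(b)
        max_iter = n if max_iter is None else max_iter
        d = np.diag(A)                     # Jacobi preconditioner
        x = np.zeros_like(b)
        r = b - A @ x
        z = r / d
        p = z.copy()
        rz = r @ z
        for k in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            if np.linalg.norm(r) < tol:
                return x, k + 1
            z = r / d                      # apply M^{-1}
            rz_new = r @ z
            beta = rz_new / rz
            p = z + beta * p
            rz = rz_new
        return x, max_iter

    # badly scaled SPD test matrix: diagonal scaling clusters the eigenvalues
    n = 100
    A = np.diag(np.logspace(0, 4, n)) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
    b = np.ones(n)
    x, iters = pcg_jacobi(A, b)
    print(iters, np.linalg.norm(b - A @ x))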

Parallel Conjugate Gradients Algorithm

Non-Blocking Collective Operations
Used to hide communication: they allow the overlap of numerical computation and communication.
In MPI-1/MPI-2 this is only possible for point-to-point communication: MPI_Isend and MPI_Irecv.
Additional libraries are necessary for non-blocking collective operations, for example LibNBC (non-blocking collectives).
They are included in the new MPI-3 standard.

Example: Pseudocode for a Non-Blocking Reduction

    MPI_Request req;
    MPI_Status stat;
    int sbuf1[SIZE], rbuf1[SIZE], buf2[SIZE];

    // compute sbuf1
    compute(sbuf1, SIZE);

    // start non-blocking allreduce of sbuf1;
    // computation and communication overlap
    MPI_Iallreduce(sbuf1, rbuf1, SIZE, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);

    // compute buf2 (independent of sbuf1)
    compute(buf2, SIZE);

    // synchronization
    MPI_Wait(&req, &stat);

    // use data in rbuf1; final computation
    evaluate(rbuf1, buf2, SIZE);

Iterative Methods for general (nonsymmetric) A: BiCGSTAB
- Applicable if little memory is at hand and A^T is not available.
- Computational costs per iteration are similar to BiCG and CGS.
- An alternative to CGS that avoids the irregular convergence patterns of CGS while maintaining a similar convergence speed.
- Less loss of accuracy in the updated residual.

Iterative Methods for general (nonsymmetric) A: GMRES
- Leads to the smallest residual for a fixed number of iteration steps, but these steps become increasingly expensive.
- To limit the increasing storage requirements and work per iteration step, restarting is necessary. When to do so depends on A and b; it requires skill and experience.
- Requires only matrix-vector products with the coefficient matrix.
- The number of inner products grows linearly with the iteration number (up to the restart point).
- In an implementation based on classical Gram-Schmidt the inner products are independent: only one synchronization point. Using modified Gram-Schmidt there is one synchronization point per inner product.
We consider GMRES in the following.

5.2.3 GMRES
Iterative solution method for general A.
Consider a small subspace U_m and determine the optimal approximate solution of Ax = b in U_m. Hence x is of the form x := U_m y:
min_{x ∈ U_m} ||Ax - b||_2 = min_y ||A U_m y - b||_2
This can be solved via the normal equations:
(U_m^T A^T A U_m) y = U_m^T A^T b

Try to find a sequence of good subspaces U_1 ⊂ U_2 ⊂ U_3 ⊂ ... such that the optimal solutions x_1, x_2, x_3, ... → A^{-1} b can be updated iteratively, using mainly matrix-vector products.

GMRES: Subspace
What subspace gives fast convergence and easy computations?
U_m := K_m(A, b) = span(b, Ab, ..., A^{m-1} b)   (Krylov space)
Problem: b, Ab, A^2 b, ... is a bad basis for this subspace!
So we need a first step to compute a good (orthonormal) basis for U_m:
Start with u_1 := b / ||b||_2 and do for j = 2 : m:
ũ_j := A u_{j-1} - sum_{k=1}^{j-1} (u_k^T A u_{j-1}) u_k = A u_{j-1} - sum_{k=1}^{j-1} h_{k,j-1} u_k
u_j := ũ_j / ||ũ_j||_2 = ũ_j / h_{j,j-1}
This is the standard orthogonalization method (Arnoldi method).
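
A compact sketch of this Arnoldi orthogonalization (my illustration; it uses the modified Gram-Schmidt variant, which is what one would typically use in practice):

    import numpy as np

    def arnoldi(A, b, m):
        # orthonormal basis U of K_m(A, b) and the (m+1) x m Hessenberg matrix H
        n = len(b)
        U = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        U[:, 0] = b / np.linalg.norm(b)
        for j in range(m):
            w = A @ U[:, j]
            for k in range(j + 1):             # orthogonalize against u_1, ..., u_{j+1}
                H[k, j] = U[:, k] @ w
                w = w - H[k, j] * U[:, k]
            H[j + 1, j] = np.linalg.norm(w)
            U[:, j + 1] = w / H[j + 1, j]      # assumes no breakdown (H[j+1, j] != 0)
        return U, H

    # quick check of the relation A U_m = U_{m+1} H (up to round-off)
    rng = np.random.default_rng(0)
    A = rng.random((30, 30))
    b = rng.random(30)
    U, H = arnoldi(A, b, 10)
    print(np.linalg.norm(A @ U[:, :10] - U @ H))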

Matrix Form of the Arnoldi ONB
U_m = span(b, Ab, ..., A^{m-1} b) = span(u_1, u_2, ..., u_m)   (ONB)
Write this orthogonalization method in matrix form:
A u_{j-1} = sum_{k=1}^{j-1} h_{k,j-1} u_k + ũ_j = sum_{k=1}^{j} h_{k,j-1} u_k
A U_m = A (u_1 ... u_m) = (u_1 ... u_{m+1}) H̄_{m+1,m} = U_{m+1} H̄_{m+1,m}
with the (m+1) × m upper Hessenberg matrix

H̄_{m+1,m} =
  [ h_11  h_12  ...  h_1m    ]
  [ h_21  h_22  ...  h_2m    ]
  [  0    h_32  ...  h_3m    ]
  [  ...       ...   ...     ]
  [  0    ...   0   h_m+1,m  ]

i.e., h_{k,j} = 0 for k > j + 1.

GMRES: Minimization
This leads to the minimization problem
min_{x ∈ U_m} ||Ax - b||_2 = min_y ||A U_m y - b||_2
  = min_y || U_{m+1} H̄_{m+1,m} y - ||b||_2 u_1 ||_2
  = min_y || U_{m+1} (H̄_{m+1,m} y - ||b||_2 e_1) ||_2
  = min_y || H̄_{m+1,m} y - ||b||_2 e_1 ||_2
because U_{m+1} is part of an orthogonal matrix (its columns are orthonormal).
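
Combining the Arnoldi basis with this small least-squares problem gives a bare-bones (unrestarted, unpreconditioned) GMRES sketch; this is my illustration under those simplifying assumptions, not a production implementation:

    import numpy as np

    def gmres_simple(A, b, m):
        # m Arnoldi steps, then solve min_y || H y - ||b|| e_1 ||_2 and set x = U_m y
        n = len(b)
        U = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        beta = np.linalg.norm(b)
        U[:, 0] = b / beta
        for j in range(m):
            w = A @ U[:, j]
            for k in range(j + 1):
                H[k, j] = U[:, k] @ w
                w = w - H[k, j] * U[:, k]
            H[j + 1, j] = np.linalg.norm(w)
            U[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H, e1, rcond=None)     # small (m+1) x m least-squares problem
        return U[:, :m] @ y

    rng = np.random.default_rng(0)
    A = np.eye(50) + 0.1 * rng.random((50, 50))        # nonsymmetric test matrix
    b = rng.random(50)
    x = gmres_simple(A, b, m=20)
    print(np.linalg.norm(b - A @ x))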