
Introduction to Iterative Solvers of Linear Systems
SFB Training Event, January 2012
Prof. Dr. Andreas Frommer
Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl

1 Classes of Matrices and their Properties

A common problem in both mathematics and physics is the solution of linear systems

$$Ax = b, \qquad (1)$$

where $A \in \mathbb{C}^{n\times n}$ and $b \in \mathbb{C}^n$ are given and $x \in \mathbb{C}^n$ is sought. The matrix $A$ is assumed to be nonsingular. In these notes the solution of the system will be referred to as $x = A^{-1}b$. Special classes of matrices lead to simplified problems, so let us first introduce some special classes of matrices, which have increasingly less nice properties.

A matrix $A$ is called hermitian if $A^* = A$. If $A$ is hermitian then there exists a unitary matrix $U \in \mathbb{C}^{n\times n}$ ($U^*U = I$) which transforms $A$ to a diagonal matrix $\Lambda \in \mathbb{R}^{n\times n}$,

$$A = U \Lambda U^*. \qquad (2)$$

A closer look at (2) reveals that the diagonal entries of $\Lambda$ are the real eigenvalues $\lambda_i \in \mathbb{R}$, $i = 1, \dots, n$, of $A$ and that the columns of $U$ are the corresponding eigenvectors. This is seen more easily by writing $AU = U\Lambda$ instead of (2). Note that the $\lambda_i$ are not necessarily different for different $i$.

A matrix $A$ is called normal if $A^*A = AA^*$. Like hermitian matrices, normal matrices are unitarily diagonalizable, i.e., $A = U\Lambda U^*$ with $U^*U = I$ for some diagonal matrix $\Lambda$. Obviously every hermitian matrix is normal, but the eigenvalues of a normal matrix are not necessarily real numbers.

A matrix $A$ is called diagonalizable if there exists an invertible matrix $V \in \mathbb{C}^{n\times n}$ which mediates the transformation of $A$ into diagonal form, $A = V \Lambda V^{-1}$, where $\Lambda$ is a possibly complex-valued diagonal matrix. Once more, the columns of $V$ are eigenvectors of $A$, but they are not necessarily orthonormal.

In the general case $A$ can be brought to the form $A = V J V^{-1}$, where $J \in \mathbb{C}^{n\times n}$ is in Jordan canonical form, i.e., it is block diagonal, $J = \mathrm{diag}(J_i)$, with blocks of the form

$$J_i = \begin{pmatrix} \lambda_i & 0 & \cdots & \cdots & 0 \\ 1 & \lambda_i & \ddots & & \vdots \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda_i \end{pmatrix}.$$

Again, the $\lambda_i$ are not necessarily different for different $i$. Note that a small perturbation of $A$ would make it diagonalizable, meaning that the perturbed matrix has only Jordan blocks of size 1. In other words, a very tiny perturbation of $A$ can lead to a change in the lower diagonal from 1 to 0. This suggests that we should not seek a numerical algorithm to compute Jordan canonical forms.
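As a numerical illustration of the two extreme cases above, here is a minimal numpy sketch (the matrix sizes, random seed and perturbation size are arbitrary choices, not part of the notes). It checks the unitary diagonalization (2) of a hermitian matrix and shows how a tiny perturbation of a single Jordan block already produces well-separated eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# hermitian test matrix: A = B + B^*
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B + B.conj().T

lam, U = np.linalg.eigh(A)                              # eigendecomposition for hermitian matrices
print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))    # A = U Lambda U*
print(np.allclose(U.conj().T @ U, np.eye(5)))           # U is unitary
print(lam)                                              # eigenvalues are real

# 4x4 Jordan block (1's on the lower diagonal, as above) with eigenvalue 2
J = 2.0 * np.eye(4) + np.diag(np.ones(3), -1)
Jp = J.copy()
Jp[0, 3] = 1e-12                                        # tiny perturbation in the corner
print(np.linalg.eigvals(Jp))                            # four distinct eigenvalues, spread ~ (1e-12)**(1/4)
```

The perturbed block is diagonalizable, but its eigenvalues have moved by roughly $10^{-3}$, which is why computing Jordan forms numerically is ill-advised.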

2 Grandpa's Methods

Some iterative methods have been known for a long time already. Examples are the Jacobi method and the Gauß-Seidel method. Both methods rely on splitting the matrix in (1) into its diagonal, lower triangular and upper triangular parts. Writing (1) row by row gives

$$a_{ii} x_i + \sum_{j \ne i} a_{ij} x_j = b_i, \qquad i = 1, \dots, n.$$

This leads to the Jacobi method, developed by Jacobi in 1850, where the $(k+1)$-th iterate is computed as

$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Big), \qquad i = 1, \dots, n, \qquad (3)$$

for $k = 0, 1, \dots$. Computing (3) for each $i$ in the order $1, 2, \dots, n$ suggests using the new iterates that have already been computed, i.e., $x_j^{(k+1)}$ for $j < i$, which leads directly to the Gauß-Seidel method

$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big). \qquad (4)$$

In the following we show how the iterations (3) and (4) can be written in a more compact form. To this end, let $D$ denote the diagonal part of the matrix $A$, $-L$ its strictly lower triangular part and $-U$ its strictly upper triangular part, so that $A = D - L - U$. Additionally, $x^{(k)}$, $k \in \mathbb{N}$, denotes the approximation to the solution $x$ after $k$ steps of the iteration.
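A literal transcription of (3) and (4) into numpy could look as follows (a minimal sketch; the diagonally dominant test matrix, right-hand side and number of sweeps are arbitrary choices). The only difference between the two sweeps is that Gauss-Seidel overwrites the iterate in place and therefore already uses the new entries $x_j^{(k+1)}$ for $j < i$.

```python
import numpy as np

def jacobi_step(A, b, x):
    """One Jacobi sweep, formula (3): uses only entries of the old iterate x."""
    x_new = np.empty_like(x)
    for i in range(len(b)):
        s = A[i, :] @ x - A[i, i] * x[i]       # sum_{j != i} a_ij x_j^(k)
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep, formula (4): entries j < i are already updated."""
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :] @ x - A[i, i] * x[i]       # mixes new (j < i) and old (j > i) entries
        x[i] = (b[i] - s) / A[i, i]
    return x

# small diagonally dominant test problem
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
for _ in range(50):
    x = gauss_seidel_step(A, b, x)
print(np.linalg.norm(b - A @ x))               # residual is tiny: the sweeps converged
```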

Jacobi: The defining equation for the Jacobi iteration is, in matrix notation,

$$D x^{(k+1)} = b + (L + U) x^{(k)}.$$

Gauß-Seidel: The defining equation for the Gauß-Seidel iteration is, in matrix notation,

$$(D - L) x^{(k+1)} = b + U x^{(k)}.$$

General splitting methods. Both methods belong to the more general class of iterative splitting methods,

$$A = M - N, \qquad M x^{(k+1)} = b + N x^{(k)}.$$

If $x$ solves the problem then $Mx = b + Nx$, and we find for the error $e^{(k)} = x - x^{(k)}$ after the $k$-th iteration

$$M e^{(k+1)} = N e^{(k)}, \qquad e^{(k+1)} = M^{-1} N e^{(k)}.$$

Theorem 1. The iteration $M x^{(k+1)} = N x^{(k)} + b$ converges for any $x^{(0)} \in \mathbb{C}^n$ if and only if $\rho(M^{-1}N) < 1$, where $\rho(A) = \max\{ |\lambda| : \lambda \text{ is an eigenvalue of } A \}$ is the spectral radius of $A$.

Proof (following "Numerik linearer Gleichungssysteme" by Christian Kanzow, Springer Verlag, 2005). We have $e^{(k+1)} = (M^{-1}N)^{k+1} e^{(0)}$, meaning we have to show that

$$(M^{-1}N)^{k+1} \to 0 \; (k \to \infty) \iff \rho(M^{-1}N) < 1. \qquad (5)$$

Let us start with "$\Leftarrow$". Choose $\varepsilon > 0$ such that $\rho(M^{-1}N) + \varepsilon < 1$. Then it can be shown that there exists a norm on $\mathbb{C}^n$ such that the induced operator norm fulfills $\| M^{-1} N \| \le \rho(M^{-1}N) + \varepsilon < 1$. This shows that

$$\| (M^{-1}N)^{k+1} \| \le \| M^{-1}N \|^{k+1} \le \big( \rho(M^{-1}N) + \varepsilon \big)^{k+1},$$

where the last expression tends to zero as $k \to \infty$. It follows that $(M^{-1}N)^{k+1}$ and hence $e^{(k)}$ converge to zero.

To show "$\Rightarrow$" of (5), let $\lambda$ be an arbitrary eigenvalue of $M^{-1}N$ with corresponding eigenvector $x$. Then $\lambda^k$ is an eigenvalue of $(M^{-1}N)^k$. It follows that

$$|\lambda|^k \, \|x\| = \| \lambda^k x \| = \| (M^{-1}N)^k x \| \le \| (M^{-1}N)^k \| \, \|x\|.$$

Since $\| (M^{-1}N)^k \| \to 0$ we have $|\lambda|^k \to 0$ and hence $|\lambda| < 1$. Since this is true for every eigenvalue, it follows that the spectral radius is less than 1.
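Theorem 1 can be observed numerically. The sketch below (an illustration with an arbitrarily chosen test matrix, not part of the notes) builds $M$ and $N$ for the Jacobi and Gauß-Seidel splittings, computes $\rho(M^{-1}N)$ and runs the iteration $M x^{(k+1)} = b + N x^{(k)}$.

```python
import numpy as np

A = np.array([[ 4.0, -1.0,  1.0],
              [-1.0,  4.0, -2.0],
              [ 1.0, -2.0,  4.0]])
b = np.ones(3)

D = np.diag(np.diag(A))
L = -np.tril(A, -1)                 # A = D - L - U
U = -np.triu(A, 1)

x_exact = np.linalg.solve(A, b)

for name, M, N in [("Jacobi", D, L + U), ("Gauss-Seidel", D - L, U)]:
    rho = max(abs(np.linalg.eigvals(np.linalg.solve(M, N))))   # spectral radius of M^{-1} N
    x = np.zeros(3)
    for _ in range(100):
        x = np.linalg.solve(M, b + N @ x)                      # M x^(k+1) = b + N x^(k)
    print(name, rho, np.linalg.norm(x - x_exact))              # rho < 1 and the error is tiny
```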

Example:

1. Let $A$ be hermitian and positive definite (hpd) and $M = D - L$ (Gauß-Seidel). Then $\rho(M^{-1}N) < 1$.

2. Let $A$ and $D + L + U$ be hpd. Then $\rho(D^{-1}(L+U)) < 1$ (Jacobi).

Remark: In every induced operator norm $\|A\| = \max_{\|x\|=1} \|Ax\|$ we have $\|A\| < 1 \Rightarrow \rho(A) < 1$. For instance,

$$\|A\|_\infty = \max_i \sum_j |a_{ij}|, \qquad \|A\|_1 = \max_j \sum_i |a_{ij}|, \qquad \|A\|_2 = \sqrt{\rho(A^*A)} \;\; (= \rho(A) \text{ if } A \text{ is hermitian}).$$

3 Krylov Subspaces

Recall the Cayley-Hamilton theorem: there is $p \in \Pi_n$ such that $p(A) = 0$, i.e., $\sum_i \alpha_i A^i = 0$, where $\Pi_n$ denotes the space of polynomials of degree at most $n$. Take $p$ of minimal degree $\le n$; then $\alpha_0 \ne 0$ (if $\alpha_0$ were zero, one could factor out one $A$, in contradiction to the minimality of the degree). We then have

$$0 = \sum_{i=0}^{n} \alpha_i A^i \quad \Longrightarrow \quad A^{-1} = -\frac{1}{\alpha_0} \sum_{i=1}^{n} \alpha_i A^{i-1} =: q_{n-1}(A),$$

where $q_{n-1} \in \Pi_{n-1}$. We have thus established that the inverse of $A$ is a polynomial in $A$.

Idea: Define

$$K_m(A, r^{(0)}) = \mathrm{span}\{ r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \dots, A^{m-1} r^{(0)} \}$$

as the $m$-th Krylov subspace, which is equivalent to

$$K_m(A, r^{(0)}) = \{ p_{m-1}(A) \, r^{(0)} : p_{m-1} \in \Pi_{m-1} \}.$$

Take $r^{(0)} = b - A x^{(0)}$; then $A(x - x^{(0)}) = r^{(0)}$, so $x = x^{(0)} + A^{-1} r^{(0)}$. It follows that $x \in x^{(0)} + K_n(A, r^{(0)})$ for $n$ large enough.
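The statement that $A^{-1} = q_{n-1}(A)$ is a polynomial in $A$ can be checked directly. In the following minimal sketch (the $4 \times 4$ test matrix is an arbitrary choice, not from the notes) the coefficients of the characteristic polynomial are computed with numpy's poly and used to reconstruct the inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)     # nonsingular test matrix

# characteristic polynomial p(t) = c[0] t^n + c[1] t^(n-1) + ... + c[n], with p(A) = 0
c = np.poly(A)

# Cayley-Hamilton: A^{-1} = -(1/c[n]) * (c[0] A^{n-1} + c[1] A^{n-2} + ... + c[n-1] I)
Q = sum(c[k] * np.linalg.matrix_power(A, n - 1 - k) for k in range(n))
A_inv_poly = -Q / c[n]

print(np.allclose(A_inv_poly, np.linalg.inv(A)))    # True: the inverse is q_{n-1}(A)
```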

Krylov subspace framework. The following is a general framework for constructing an iterative method for the solution of $Ax = b$ using Krylov subspaces.

for m = 1, 2, ...
    extend a basis of $K_m(A, r^{(0)})$ to one of $K_{m+1}(A, r^{(0)})$
    extract an appropriate iterate $x^{(m+1)}$ from $x^{(0)} + K_{m+1}(A, r^{(0)})$
end for

4 The Lanczos Process

The Lanczos process is an implementation of the Krylov subspace framework for hermitian matrices. Let $A$ be hermitian. Our purpose is to construct a nested orthonormal basis $v_1, \dots, v_m$ of $K_m(A, r^{(0)})$:

$$K_j(A, r^{(0)}) = \mathrm{span}\{ v_1, \dots, v_j \}, \qquad j = 1, 2, \dots$$

$K_1(A, r^{(0)})$: $\; v_1 = \frac{1}{\|r^{(0)}\|} r^{(0)}$

$K_2(A, r^{(0)})$: $\; \tilde v_2 = A v_1 - \langle A v_1, v_1 \rangle v_1, \qquad v_2 = \frac{1}{\|\tilde v_2\|} \tilde v_2$

$\;\vdots$

$K_m(A, r^{(0)})$: given $v_1, \dots, v_{m-1}$,

$$\tilde v_m = A v_{m-1} - \sum_{j=1}^{m-1} \langle A v_{m-1}, v_j \rangle v_j, \qquad v_m = \frac{1}{\|\tilde v_m\|} \tilde v_m. \qquad (6)$$

As $A$ is hermitian, equation (6) can be simplified using $\langle A v_{m-1}, v_j \rangle = \langle v_{m-1}, A v_j \rangle$ and $A v_j \in K_{j+1}(A, r^{(0)}) \perp v_{m-1}$ for all $j < m-2$, giving

$$\tilde v_m = A v_{m-1} - \sum_{j=m-2}^{m-1} \langle A v_{m-1}, v_j \rangle v_j.$$

Additionally, when computing $\tilde v_m$ one can save computations by reusing the value $\langle v_{m-1}, A v_{m-2} \rangle$ from the calculation of $v_{m-1}$.
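The collapse of the full orthogonalization (6) to a three-term recurrence can be seen in a short experiment (an illustrative sketch with an arbitrary symmetric test matrix, not from the notes): for each new vector, all Gram-Schmidt coefficients $\langle A v_{m-1}, v_j \rangle$ with $j < m-2$ are zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 8
B = rng.standard_normal((n, n))
A = B + B.T                                   # hermitian (real symmetric) test matrix
r0 = rng.standard_normal(n)

V = [r0 / np.linalg.norm(r0)]                 # v_1
for k in range(2, m + 1):
    w = A @ V[-1]                             # A v_{k-1}
    coeffs = [w @ v for v in V]               # <A v_{k-1}, v_j>, j = 1, ..., k-1
    for c, v in zip(coeffs, V):
        w = w - c * v                         # full Gram-Schmidt as in (6)
    V.append(w / np.linalg.norm(w))
    print([f"{c:.1e}" for c in coeffs])       # only the last two coefficients are nonzero
```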

Lanczos Process Pseudocode: Putting everything together we obtain the following pseudocode.

given: $r^{(0)}$
compute: $\beta_1 = \|r^{(0)}\|$, $\; v_1 = \frac{1}{\beta_1} r^{(0)}$, $\; v_0 = 0$
for m = 2, 3, ...
    $\tilde v_m = A v_{m-1}$
    $\alpha_m = \langle \tilde v_m, v_{m-1} \rangle$
    $w_m = \tilde v_m - \alpha_m v_{m-1} - \beta_{m-1} v_{m-2}$
    $\beta_m = \|w_m\|$
    $v_m = \frac{1}{\beta_m} w_m$
end for

Summary: We have

$$\beta_m v_m = A v_{m-1} - \alpha_m v_{m-1} - \beta_{m-1} v_{m-2}, \qquad m = 2, 3, \dots,$$

or equivalently

$$A v_m = \beta_{m+1} v_{m+1} + \alpha_{m+1} v_m + \beta_m v_{m-1} = [\, v_{m-1} \;\; v_m \;\; v_{m+1} \,] \begin{pmatrix} \beta_m \\ \alpha_{m+1} \\ \beta_{m+1} \end{pmatrix},$$

or in full matrix notation

$$A V_m = V_{m+1} T_{m+1,m}, \qquad V_m = [\, v_1 \;\; \dots \;\; v_m \,], \qquad T_{m+1,m} = \begin{pmatrix} T_m \\ 0 \;\cdots\; 0 \;\; \beta_{m+1} \end{pmatrix},$$

where

$$T_m = \begin{pmatrix} \alpha_2 & \beta_2 & 0 & \cdots & 0 \\ \beta_2 & \alpha_3 & \beta_3 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \beta_m \\ 0 & \cdots & 0 & \beta_m & \alpha_{m+1} \end{pmatrix}.$$
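The pseudocode translates directly into numpy. The sketch below (not part of the notes; the symmetric test matrix and the number of steps are arbitrary choices) runs the recurrence, assembles $T_{m+1,m}$ and checks the relation $A V_m = V_{m+1} T_{m+1,m}$.

```python
import numpy as np

def lanczos(A, r0, m):
    """m steps of the Lanczos process; returns V_{m+1} (n x (m+1)) and T_{m+1,m}."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m + 2)                 # alpha[2], ..., alpha[m+1] as in the notes
    beta = np.zeros(m + 2)                  # beta[1], ..., beta[m+1]
    beta[1] = np.linalg.norm(r0)
    V[:, 0] = r0 / beta[1]                  # v_1
    for k in range(2, m + 2):               # k plays the role of m in the pseudocode
        w = A @ V[:, k - 2]                 # A v_{k-1}
        alpha[k] = w @ V[:, k - 2]
        w = w - alpha[k] * V[:, k - 2]
        if k > 2:
            w = w - beta[k - 1] * V[:, k - 3]
        beta[k] = np.linalg.norm(w)
        V[:, k - 1] = w / beta[k]           # v_k
    T = np.zeros((m + 1, m))                # tridiagonal T_{m+1,m}
    for j in range(m):                      # column j expresses A v_{j+1} in the basis
        T[j, j] = alpha[j + 2]
        if j > 0:
            T[j - 1, j] = T[j, j - 1] = beta[j + 1]
    T[m, m - 1] = beta[m + 1]
    return V, T

rng = np.random.default_rng(3)
B = rng.standard_normal((30, 30))
A = B + B.T
r0 = rng.standard_normal(30)
V, T = lanczos(A, r0, 6)
print(np.allclose(A @ V[:, :6], V @ T))     # A V_m = V_{m+1} T_{m+1,m}
print(np.allclose(V.T @ V, np.eye(7)))      # the Lanczos vectors are orthonormal
```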

5 The cg-method

The method of conjugate gradients (cg) by Hestenes and Stiefel, published in 1952, uses the Lanczos process to compute approximations to the solution of $Ax = b$. It is motivated as follows. Choose $x_m \in x_0 + K_m(A, r^{(0)})$ such that

$$b - A x_m \perp K_m(A, r^{(0)}) \qquad (\text{"Galerkin property"}).$$

This Galerkin property is the defining property of the cg iterates. Computationally, we can get $x_m = x_0 + V_m \xi_m$, $\xi_m \in \mathbb{C}^m$, at least in principle, as follows:

$$b - A(x_0 + V_m \xi_m) \perp K_m(A, r^{(0)}) \iff V_m^* \big( r^{(0)} - A V_m \xi_m \big) = 0 \iff T_m \xi_m = \|r^{(0)}\| \, e_1, \qquad (7)$$

where $e_1 = (1, 0, \dots, 0)^T \in \mathbb{C}^m$, using

$$V_m^* r^{(0)} = \|r^{(0)}\| \, e_1 \qquad \text{and} \qquad V_m^* A V_m = V_m^* V_{m+1} T_{m+1,m} = [\, I_m \;\; 0 \,] \, T_{m+1,m} = T_m.$$

In each step of the cg method we have to solve the system (7), which is a tridiagonal system of size $m \ll n$, hence much easier to solve than the original system.
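In principle one can therefore compute the cg iterate by solving (7) explicitly. The sketch below does exactly that (an illustration, not part of the notes; it reuses the lanczos sketch given above, and the hpd test matrix and the subspace dimension are arbitrary choices).

```python
import numpy as np
# assumes the lanczos(A, r0, m) sketch from above is defined in the same file

rng = np.random.default_rng(4)
n, m = 200, 25
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)               # hermitian positive definite test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

V, T_ext = lanczos(A, r0, m)              # V = V_{m+1}, T_ext = T_{m+1,m}
T_m = T_ext[:m, :]                        # leading m x m tridiagonal block
rhs = np.zeros(m)
rhs[0] = np.linalg.norm(r0)               # right-hand side ||r^(0)|| e_1 of (7)
xi = np.linalg.solve(T_m, rhs)            # solve the small tridiagonal system (7)
x_m = x0 + V[:, :m] @ xi                  # Galerkin iterate x_m = x_0 + V_m xi_m

print(np.linalg.norm(b - A @ x_m) / np.linalg.norm(b))   # relative residual after m steps
```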

Some technical transformations show that it is possible to cheaply update the iterate $x_{m+1}$ from $x_m$ using a search vector $p_m$, and that the residual $r_m = b - A x_m$ is a multiple of the Lanczos vector $v_{m+1}$. The full method can be written as follows (see Golub and Van Loan, Matrix Computations, 3rd ed.):

k = 0
$r_0 = b - A x_0$
while $r_k \ne 0$
    k = k + 1
    if k = 1
        $p_1 = r_0$
    else
        $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
        $p_k = r_{k-1} + \beta_k p_{k-1}$
    end
    $\alpha_k = r_{k-1}^T r_{k-1} / p_k^T A p_k$
    $x_k = x_{k-1} + \alpha_k p_k$
    $r_k = r_{k-1} - \alpha_k A p_k$
end
$x = x_k$

Remark: The Galerkin property $b - A x_m \perp K_m(A, r^{(0)})$ is equivalent to

$$\langle x - x_m, A(x - x_m) \rangle = \min\{ \langle x - \tilde x, A(x - \tilde x) \rangle : \tilde x \in x_0 + K_m(A, r^{(0)}) \}.$$

The number $\sqrt{\langle x - x_m, A(x - x_m) \rangle}$ is called the A-norm of the error. In fact, the map $x \mapsto \sqrt{\langle x, Ax \rangle}$ is a norm, as one can easily check. In other words, cg is optimal in the sense that its $m$-th iterate is the vector $x_m \in x_0 + K_m(A, r^{(0)})$ for which the A-norm of the error is minimal among all vectors from $x_0 + K_m(A, r^{(0)})$.

The convergence behaviour of cg depends on the condition number $\kappa_2(A) := \|A\|_2 \, \|A^{-1}\|_2$. Fast convergence can be expected if $\kappa_2(A) \approx 1$.
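The pseudocode above translates almost line by line into numpy. In the minimal sketch below (the hpd test matrix, tolerance and stopping rule are arbitrary choices) the exact test $r_k \ne 0$ is replaced by a tolerance on $\|r_k\|$.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=None):
    """Conjugate gradient method, following the pseudocode above."""
    x = x0.copy()
    r = b - A @ x
    maxit = len(b) if maxit is None else maxit
    p, r_old, k = None, None, 0
    while np.linalg.norm(r) > tol and k < maxit:
        k += 1
        if k == 1:
            p = r.copy()
        else:
            beta = (r @ r) / (r_old @ r_old)   # beta_k
            p = r + beta * p                   # p_k
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)             # alpha_k
        x = x + alpha * p                      # x_k
        r_old = r
        r = r - alpha * Ap                     # r_k
    return x, k

rng = np.random.default_rng(5)
n = 300
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # hermitian positive definite, moderate kappa_2(A)
b = rng.standard_normal(n)

x, k = cg(A, b, np.zeros(n))
print(k, np.linalg.norm(b - A @ x))   # converges in far fewer than n iterations
```

Because $\kappa_2(A)$ is small for this test matrix, only a few dozen iterations are needed.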