Chapter 7 Iterative Techniques in Matrix Algebra


Chapter 7: Iterative Techniques in Matrix Algebra
Per-Olof Persson (persson@berkeley.edu)
Department of Mathematics, University of California, Berkeley
Math 128B Numerical Analysis

Vector Norms

Definition. A vector norm on $\mathbb{R}^n$ is a function $\|\cdot\|$ from $\mathbb{R}^n$ into $\mathbb{R}$ with the properties:
(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$
(ii) $\|x\| = 0$ if and only if $x = 0$
(iii) $\|\alpha x\| = |\alpha| \, \|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$
(iv) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$

Definition. The Euclidean norm $l_2$ and the infinity norm $l_\infty$ for the vector $x = (x_1, x_2, \ldots, x_n)^t$ are defined by
$$\|x\|_2 = \left\{ \sum_{i=1}^n x_i^2 \right\}^{1/2} \quad \text{and} \quad \|x\|_\infty = \max_{1 \le i \le n} |x_i|$$

Cauchy-Bunyakovsky-Schwarz Inequality for Sums

Theorem. For each $x = (x_1, x_2, \ldots, x_n)^t$ and $y = (y_1, y_2, \ldots, y_n)^t$ in $\mathbb{R}^n$,
$$x^t y = \sum_{i=1}^n x_i y_i \le \left\{ \sum_{i=1}^n x_i^2 \right\}^{1/2} \left\{ \sum_{i=1}^n y_i^2 \right\}^{1/2} = \|x\|_2 \, \|y\|_2$$

Distances

Definition. The distance between two vectors $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$ is the norm of the difference of the vectors. The $l_2$ and $l_\infty$ distances are
$$\|x - y\|_2 = \left\{ \sum_{i=1}^n (x_i - y_i)^2 \right\}^{1/2} \quad \text{and} \quad \|x - y\|_\infty = \max_{1 \le i \le n} |x_i - y_i|$$

Convergence

Definition. A sequence $\{x^{(k)}\}_{k=1}^\infty$ of vectors in $\mathbb{R}^n$ is said to converge to $x$ with respect to the norm $\|\cdot\|$ if, given any $\varepsilon > 0$, there exists an integer $N(\varepsilon)$ such that
$$\|x^{(k)} - x\| < \varepsilon, \quad \text{for all } k \ge N(\varepsilon)$$

Theorem. The sequence of vectors $\{x^{(k)}\}$ converges to $x$ in $\mathbb{R}^n$ with respect to $\|\cdot\|_\infty$ if and only if $\lim_{k \to \infty} x_i^{(k)} = x_i$ for each $i = 1, 2, \ldots, n$.

Theorem. For each $x \in \mathbb{R}^n$,
$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n} \, \|x\|_\infty$$

Matrix Norms

Definition. A matrix norm on $n \times n$ matrices is a real-valued function $\|\cdot\|$ satisfying
(i) $\|A\| \ge 0$
(ii) $\|A\| = 0$ if and only if $A = 0$
(iii) $\|\alpha A\| = |\alpha| \, \|A\|$
(iv) $\|A + B\| \le \|A\| + \|B\|$
(v) $\|AB\| \le \|A\| \, \|B\|$

Natural Matrix Norms

Theorem. If $\|\cdot\|$ is a vector norm, the natural (or induced) matrix norm is given by
$$\|A\| = \max_{\|x\| = 1} \|Ax\|$$

Corollary. For any vector $z \ne 0$, matrix $A$, and natural norm $\|\cdot\|$,
$$\|Az\| \le \|A\| \, \|z\|$$

Theorem. If $A = (a_{ij})$ is an $n \times n$ matrix, then
$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$$

Eigenvalues and Eigenvectors

Definition. The characteristic polynomial of a square matrix $A$ is $p(\lambda) = \det(A - \lambda I)$.

Definition. The zeros $\lambda$ of the characteristic polynomial are the eigenvalues of $A$; any $x \ne 0$ satisfying $(A - \lambda I)x = 0$ is a corresponding eigenvector.

Definition. The spectral radius $\rho(A)$ of a matrix $A$ is $\rho(A) = \max |\lambda|$, taken over the eigenvalues $\lambda$ of $A$.

Theorem. If $A$ is an $n \times n$ matrix, then
(i) $\|A\|_2 = [\rho(A^t A)]^{1/2}$
(ii) $\rho(A) \le \|A\|$, for any natural norm $\|\cdot\|$
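As a quick numerical illustration of the last two facts, here is a short NumPy sketch (the test matrix is an arbitrary choice, not from the slides) comparing $[\rho(A^t A)]^{1/2}$ with the built-in 2-norm and checking $\rho(A) \le \|A\|_\infty$:

```python
import numpy as np

# Arbitrary symmetric test matrix (illustrative only)
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

rho = max(abs(np.linalg.eigvals(A)))                       # spectral radius rho(A)
two_norm = np.sqrt(max(abs(np.linalg.eigvals(A.T @ A))))   # [rho(A^t A)]^(1/2)

print(two_norm, np.linalg.norm(A, 2))     # both give ||A||_2
print(rho <= np.linalg.norm(A, np.inf))   # rho(A) <= ||A|| for the natural inf-norm
```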

Convergent Matrices

Definition. An $n \times n$ matrix $A$ is convergent if
$$\lim_{k \to \infty} (A^k)_{ij} = 0, \quad \text{for each } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n$$

Theorem. The following statements are equivalent:
(i) $A$ is a convergent matrix
(ii) $\lim_{n \to \infty} \|A^n\| = 0$, for some natural norm
(iii) $\lim_{n \to \infty} \|A^n\| = 0$, for all natural norms
(iv) $\rho(A) < 1$
(v) $\lim_{n \to \infty} A^n x = 0$, for every $x$

Iterative Methods for Linear Systems

Direct methods for solving $Ax = b$, e.g. Gaussian elimination, compute an exact solution after a finite number of steps (in exact arithmetic). Iterative algorithms produce a sequence of approximations $x^{(1)}, x^{(2)}, \ldots$ which hopefully converges to the solution, and
- may require less memory than direct methods
- may be faster than direct methods
- may handle special structures (such as sparsity) in a simpler way

[Figure: residual $\|b - Ax\|$ versus iteration number, comparing a direct solver with an iterative method.]

Two Classes of Iterative Methods

Stationary methods (or classical iterative methods) find a splitting $A = M - K$ and iterate
$$x^{(k)} = M^{-1}(K x^{(k-1)} + b) = T x^{(k-1)} + c$$
Examples: Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR).

Krylov subspace methods use only multiplication by $A$ (and possibly by $A^T$) and find solutions in the Krylov subspace
$$\operatorname{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\}$$
Examples: Conjugate Gradient (CG), Generalized Minimal Residual (GMRES), BiConjugate Gradient (BiCG), etc.

Jacobi's Method

An iterative technique to solve $Ax = b$ starts with an initial approximation $x^{(0)}$ and generates a sequence of vectors $\{x^{(k)}\}_{k=0}^\infty$ that converges to $x$.

Jacobi's method: solve for $x_i$ in the $i$th equation of $Ax = b$:
$$x_i = \sum_{\substack{j=1 \\ j \ne i}}^n \left( -\frac{a_{ij} x_j}{a_{ii}} \right) + \frac{b_i}{a_{ii}}, \quad \text{for } i = 1, 2, \ldots, n$$
This leads to the iteration
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ \sum_{\substack{j=1 \\ j \ne i}}^n \left( -a_{ij} x_j^{(k-1)} \right) + b_i \right], \quad \text{for } i = 1, 2, \ldots, n$$

Matrix Form of Jacobi's Method

Convert $Ax = b$ into an equivalent system $x = Tx + c$, select an initial vector $x^{(0)}$, and iterate $x^{(k)} = T x^{(k-1)} + c$.

For Jacobi's method, split $A$ into its diagonal and off-diagonal parts,
$$A = D - L - U,$$
where $D$ is the diagonal of $A$, $-L$ is the strictly lower-triangular part of $A$, and $-U$ is the strictly upper-triangular part of $A$.

This transforms $Ax = (D - L - U)x = b$ into $Dx = (L + U)x + b$, and if $D^{-1}$ exists, this leads to the Jacobi iteration
$$x^{(k)} = D^{-1}(L + U) x^{(k-1)} + D^{-1} b = T_j x^{(k-1)} + c_j$$
where $T_j = D^{-1}(L + U)$ and $c_j = D^{-1} b$.
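A minimal NumPy sketch of this iteration (the function name, stopping test, and the strictly diagonally dominant test system are illustrative choices, not part of the lecture):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, maxiter=500):
    """Jacobi iteration x^(k) = D^-1 (L+U) x^(k-1) + D^-1 b."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    D = np.diag(A)                      # diagonal entries a_ii
    R = A - np.diagflat(D)              # off-diagonal part, i.e. -(L + U)
    for k in range(maxiter):
        x_new = (b - R @ x) / D         # componentwise Jacobi update
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxiter

# Strictly diagonally dominant system, so Jacobi converges for any x^(0)
A = np.array([[10.0, -1.0, 2.0], [-1.0, 11.0, -1.0], [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
x, its = jacobi(A, b)
print(x, its, np.allclose(A @ x, b))
```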

The Gauss-Seidel Method

Improve Jacobi's method by, for $i > 1$, using the already updated components $x_1^{(k)}, \ldots, x_{i-1}^{(k)}$ when computing $x_i^{(k)}$:
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ -\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} + b_i \right]$$

In matrix form, the method can be written $(D - L) x^{(k)} = U x^{(k-1)} + b$, and if $(D - L)^{-1}$ exists, this leads to the Gauss-Seidel iteration
$$x^{(k)} = (D - L)^{-1} U x^{(k-1)} + (D - L)^{-1} b = T_g x^{(k-1)} + c_g$$
where $T_g = (D - L)^{-1} U$ and $c_g = (D - L)^{-1} b$.
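A corresponding component-wise sketch of Gauss-Seidel, under the same assumptions as the Jacobi example above (illustrative function and test system):

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, maxiter=500):
    """Gauss-Seidel sweep: use already-updated components x_j^(k) for j < i."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]           # new values x_j^(k), j < i
            s2 = A[i, i+1:] @ x_old[i+1:]   # old values x_j^(k-1), j > i
            x[i] = (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, k + 1
    return x, maxiter

A = np.array([[10.0, -1.0, 2.0], [-1.0, 11.0, -1.0], [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
x, its = gauss_seidel(A, b)
print(x, its)   # typically fewer iterations than Jacobi on this system
```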

General Iteration Methods

Lemma. If the spectral radius satisfies $\rho(T) < 1$, then $(I - T)^{-1}$ exists, and
$$(I - T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^\infty T^j$$

Theorem. For any $x^{(0)} \in \mathbb{R}^n$, the sequence
$$x^{(k)} = T x^{(k-1)} + c$$
converges to the unique solution of $x = Tx + c$ if and only if $\rho(T) < 1$.

General Iteration Methods

Corollary. If $\|T\| < 1$ for any natural matrix norm, then $x^{(k)} = T x^{(k-1)} + c$ converges for any $x^{(0)} \in \mathbb{R}^n$ to a vector $x \in \mathbb{R}^n$ such that $x = Tx + c$. The following error estimates hold:
1. $\|x - x^{(k)}\| \le \|T\|^k \, \|x^{(0)} - x\|$
2. $\|x - x^{(k)}\| \le \dfrac{\|T\|^k}{1 - \|T\|} \, \|x^{(1)} - x^{(0)}\|$

Theorem. If $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel converge for any $x^{(0)}$.

Theorem (Stein-Rosenberg). If $a_{ii} > 0$ for each $i$ and $a_{ij} \le 0$ for $i \ne j$, then one and only one of the following holds:
(i) $0 \le \rho(T_g) < \rho(T_j) < 1$
(ii) $1 < \rho(T_j) < \rho(T_g)$
(iii) $\rho(T_j) = \rho(T_g) = 0$
(iv) $\rho(T_j) = \rho(T_g) = 1$

The Residual Vector

Definition. The residual vector for $\tilde{x} \in \mathbb{R}^n$ with respect to the linear system $Ax = b$ is $r = b - A\tilde{x}$.

Consider the approximate solution vector used in Gauss-Seidel when computing the $i$th component at step $k$,
$$x_i^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)}, x_i^{(k-1)}, \ldots, x_n^{(k-1)})^t,$$
with residual vector $r_i^{(k)} = (r_{1i}^{(k)}, r_{2i}^{(k)}, \ldots, r_{ni}^{(k)})^t$.

The Gauss-Seidel method,
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^n a_{ij} x_j^{(k-1)} \right],$$
can then be written as
$$x_i^{(k)} = x_i^{(k-1)} + \frac{r_{ii}^{(k)}}{a_{ii}}$$

Successive Over-Relaxation

Relaxation methods use an iteration of the form
$$x_i^{(k)} = x_i^{(k-1)} + \omega \, \frac{r_{ii}^{(k)}}{a_{ii}}$$
for some positive $\omega$. With $\omega > 1$, they can accelerate the convergence of the Gauss-Seidel method and are called successive over-relaxation (SOR) methods.

Write the SOR method as
$$x_i^{(k)} = (1 - \omega) x_i^{(k-1)} + \frac{\omega}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^n a_{ij} x_j^{(k-1)} \right],$$
which can be written in the matrix form
$$x^{(k)} = T_\omega x^{(k-1)} + c_\omega$$
where $T_\omega = (D - \omega L)^{-1} [(1 - \omega) D + \omega U]$ and $c_\omega = \omega (D - \omega L)^{-1} b$.
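A sketch of one possible SOR implementation of the sweep above; the test system and the choice $\omega = 1.25$ are illustrative (with $\omega = 1$ the sweep reduces to Gauss-Seidel):

```python
import numpy as np

def sor(A, b, omega, x0=None, tol=1e-10, maxiter=500):
    """SOR sweep: x_i^(k) = (1-omega) x_i^(k-1) + omega * (Gauss-Seidel update)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x_old[i+1:]) / A[i, i]
            x[i] = (1 - omega) * x_old[i] + omega * gs
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, k + 1
    return x, maxiter

# Positive definite test system; solution is (3, 4, -5)
A = np.array([[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([24.0, 30.0, -24.0])
print(sor(A, b, omega=1.25))
```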

Convergence of the SOR Method

Theorem (Kahan). If $a_{ii} \ne 0$ for all $i$, then $\rho(T_\omega) \ge |\omega - 1|$, so the SOR method can converge only if $0 < \omega < 2$.

Theorem (Ostrowski-Reich). If $A$ is positive definite and $0 < \omega < 2$, then SOR converges for any $x^{(0)}$.

Theorem. If $A$ is positive definite and tridiagonal, then $\rho(T_g) = [\rho(T_j)]^2 < 1$, and the optimal $\omega$ for SOR is
$$\omega = \frac{2}{1 + \sqrt{1 - [\rho(T_j)]^2}},$$
which gives $\rho(T_\omega) = \omega - 1$.

Error Bounds

Theorem. Suppose $Ax = b$, $A$ is nonsingular, $\tilde{x} \approx x$, and $r = b - A\tilde{x}$. Then for any natural norm,
$$\|x - \tilde{x}\| \le \|r\| \, \|A^{-1}\|,$$
and if $x \ne 0$ and $b \ne 0$,
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|r\|}{\|b\|}$$

Definition. The condition number of a nonsingular matrix $A$ in the norm $\|\cdot\|$ is
$$K(A) = \|A\| \, \|A^{-1}\|$$

In terms of $K(A)$, the error bounds can be written
$$\|x - \tilde{x}\| \le K(A) \, \frac{\|r\|}{\|A\|}, \qquad \frac{\|x - \tilde{x}\|}{\|x\|} \le K(A) \, \frac{\|r\|}{\|b\|}$$
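The residual bound can be seen at work on a classic ill-conditioned $2 \times 2$ system (the numbers below are an illustrative example, not from the slides): the residual of a poor approximation is tiny, yet the error is large, and $K(A)\|r\|/\|A\|$ accounts for the gap.

```python
import numpy as np

# Ill-conditioned 2x2 example: small residual, large error
A = np.array([[1.0, 2.0], [1.0001, 2.0]])
b = np.array([3.0, 3.0001])
x = np.linalg.solve(A, b)            # exact solution is (1, 1)
x_tilde = np.array([3.0, 0.0])       # poor approximation with a tiny residual
r = b - A @ x_tilde

K = np.linalg.cond(A, np.inf)        # K(A) = ||A||_inf * ||A^-1||_inf
err = np.linalg.norm(x - x_tilde, np.inf)
bound = K * np.linalg.norm(r, np.inf) / np.linalg.norm(A, np.inf)
print(err, bound, err <= bound)      # error 2.0 sits within the bound ~4.0
```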

Iterative Refinement

Algorithm: Iterative Refinement
  Solve $Ax^{(1)} = b$
  for $k = 1, 2, 3, \ldots$
    $r^{(k)} = b - Ax^{(k)}$          (residual -- compute accurately!)
    Solve $Ay^{(k)} = r^{(k)}$        (solve for the correction)
    $x^{(k+1)} = x^{(k)} + y^{(k)}$   (improve the solution)

This allows for errors in the solution of the linear systems, provided the residual $r$ is computed accurately.
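A minimal sketch of iterative refinement, assuming SciPy's LU routines are available so the factorization of $A$ is reused for every correction solve (function name and fixed step count are illustrative choices):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement(A, b, steps=3):
    """Factor A once, then repeatedly solve for a correction from the residual."""
    lu, piv = lu_factor(A)                 # one LU factorization, reused below
    x = lu_solve((lu, piv), b)             # solve A x^(1) = b
    for k in range(steps):
        r = b - A @ x                      # residual -- ideally computed in higher precision
        y = lu_solve((lu, piv), r)         # solve A y^(k) = r^(k) for the correction
        x = x + y                          # x^(k+1) = x^(k) + y^(k)
    return x

A = np.array([[10.0, -1.0, 2.0], [-1.0, 11.0, -1.0], [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
print(iterative_refinement(A, b))
```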

Errors in Both the Matrix and the Right-Hand Side

Theorem. Suppose $A$ is nonsingular and
$$\|\delta A\| < \frac{1}{\|A^{-1}\|}$$
Then the solution $\tilde{x}$ of
$$(A + \delta A)\tilde{x} = b + \delta b$$
approximates the solution $x$ of $Ax = b$ with the error estimate
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \frac{K(A) \, \|A\|}{\|A\| - K(A) \, \|\delta A\|} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right)$$

Inner Products

Definition. The inner product of $n$-dimensional vectors $x, y$ is $\langle x, y \rangle = x^t y$.

Theorem. For any vectors $x, y, z$ and real number $\alpha$:
(a) $\langle x, y \rangle = \langle y, x \rangle$
(b) $\langle \alpha x, y \rangle = \langle x, \alpha y \rangle = \alpha \langle x, y \rangle$
(c) $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$
(d) $\langle x, x \rangle \ge 0$
(e) $\langle x, x \rangle = 0 \iff x = 0$

Krylov Subspace Algorithms

Create a sequence of Krylov subspaces for $Ax = b$,
$$\mathcal{K}_k = \operatorname{span}\{b, Ab, \ldots, A^{k-1} b\},$$
and find approximate solutions $x_k$ in $\mathcal{K}_k$.
- Only matrix-vector products are involved.
- For SPD matrices, the most popular algorithm is the Conjugate Gradients method [Hestenes/Stiefel, 1952].
- It finds the best solution $x_k \in \mathcal{K}_k$ in the norm $\|x\|_A = \sqrt{x^t A x}$.
- It only requires storage of 4 vectors (not all the $k$ vectors spanning $\mathcal{K}_k$).
- Remarkably simple, with excellent convergence properties.
- Originally invented as a direct algorithm! (It converges after $n$ steps in exact arithmetic.)

The Conjugate Gradients Method

Algorithm: Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (p_{k-1}^t A p_{k-1})$   (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                        (residual)
    $\beta_k = (r_k^t r_k) / (r_{k-1}^t r_{k-1})$               (improvement this step)
    $p_k = r_k + \beta_k p_{k-1}$                               (search direction)

Only one matrix-vector product $A p_{k-1}$ per iteration; the remaining operation count is $O(n)$ per iteration (excluding the matrix-vector product).
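A direct transcription of these recurrences into NumPy (the SPD test system and the tolerance-based stopping test are illustrative choices):

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, maxiter=None):
    """Plain CG for symmetric positive definite A, following the slide's recurrences."""
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()                         # r_0 = b, since x_0 = 0
    p = r.copy()                         # p_0 = r_0
    rs_old = r @ r
    for k in range(maxiter):
        Ap = A @ p                       # the only matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)        # step length
        x = x + alpha * p                # approximate solution
        r = r - alpha * Ap               # residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old           # improvement this step
        p = r + beta * p                 # next search direction
        rs_old = rs_new
    return x

# SPD test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradients(A, b))   # approximately [0.0909, 0.6364]
```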

Properties of the Conjugate Gradients Vectors

Theorem. The spaces spanned by the solutions, the search directions, and the residuals are all equal to the Krylov subspaces:
$$\mathcal{K}_k = \operatorname{span}\{x_1, x_2, \ldots, x_k\} = \operatorname{span}\{p_0, p_1, \ldots, p_{k-1}\} = \operatorname{span}\{r_0, r_1, \ldots, r_{k-1}\} = \operatorname{span}\{b, Ab, \ldots, A^{k-1} b\}$$
The residuals are orthogonal: $r_k^t r_j = 0$ for $j < k$. The search directions are $A$-conjugate: $p_k^t A p_j = 0$ for $j < k$.

Optimality of Conjugate Gradients

Theorem. The errors $e_k = x - x_k$ are minimized in the $A$-norm.

Proof. For any other point $\tilde{x} = x_k - \Delta x \in \mathcal{K}_k$, the error satisfies
$$\|\tilde{e}\|_A^2 = (e_k + \Delta x)^t A (e_k + \Delta x) = e_k^t A e_k + (\Delta x)^t A (\Delta x) + 2\, e_k^t A (\Delta x).$$
But $e_k^t A (\Delta x) = r_k^t (\Delta x) = 0$, since $r_k$ is orthogonal to $\mathcal{K}_k$, so $\Delta x = 0$ minimizes $\|e\|_A$.

Theorem. The convergence is monotonic, $\|e_k\|_A \le \|e_{k-1}\|_A$, and $e_k = 0$ is reached in at most $m$ steps.

Proof. Follows from $\mathcal{K}_k \subseteq \mathcal{K}_{k+1}$, and the fact that the subspaces $\mathcal{K}_k$ keep growing toward all of $\mathbb{R}^m$ unless the iteration has already converged.

Optimization in CG

CG can be interpreted as a minimization algorithm. We know it minimizes $\|e\|_A$, but this cannot be evaluated directly. CG also minimizes the quadratic function $\varphi(x) = \frac{1}{2} x^t A x - x^t b$:
$$\|e_k\|_A^2 = e_k^t A e_k = (x - x_k)^t A (x - x_k) = x_k^t A x_k - 2 x_k^t A x + x^t A x = x_k^t A x_k - 2 x_k^t b + x^t b = 2\varphi(x_k) + \text{constant}$$
At each step, $\alpha_k$ is chosen to minimize $\varphi$ along $x_k = x_{k-1} + \alpha_k p_{k-1}$. The conjugated search directions $p_k$ give minimization over all of $\mathcal{K}_k$.

Optimization by Conjugate Gradients

We know that solving $Ax = b$ is equivalent to minimizing the quadratic function $\varphi(x) = \frac{1}{2} x^t A x - x^t b$. The minimization can be done by line searches, where $\varphi$ is minimized along a search direction $p_k$ starting from $x_k$.

Theorem. The $\alpha_{k+1}$ that minimizes $\varphi(x_k + \alpha_{k+1} p_k)$ is
$$\alpha_{k+1} = \frac{p_k^t r_k}{p_k^t A p_k},$$
with the residual $r_k = b - A x_k$.

The residual is also minus the gradient of $\varphi$ at $x_k$:
$$\nabla \varphi(x_k) = A x_k - b = -r_k$$

The Method of Steepest Descent

A very simple approach: set the search direction $p_k$ to the negative gradient, which is the residual $r_k$. This corresponds to moving in the direction in which $\varphi(x)$ changes the most.

Algorithm: Steepest Descent
  $x_0 = 0$, $r_0 = b$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (r_{k-1}^t A r_{k-1})$   (step length)
    $x_k = x_{k-1} + \alpha_k r_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A r_{k-1}$                        (residual)

Convergence is often poor; the method tends to move repeatedly along previous search directions.
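For comparison with CG, here is a sketch of steepest descent under the same assumptions (SPD matrix, $x_0 = 0$; the test system is the same illustrative one used above):

```python
import numpy as np

def steepest_descent(A, b, tol=1e-10, maxiter=10000):
    """Steepest descent: search along the negative gradient, i.e. the residual."""
    x = np.zeros(len(b))
    r = b.copy()                          # r_0 = b for x_0 = 0
    for k in range(maxiter):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)        # step length minimizing phi along r
        x = x + alpha * r                 # approximate solution
        r = r - alpha * Ar                # updated residual
        if np.linalg.norm(r) < tol:
            return x, k + 1
    return x, maxiter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, its = steepest_descent(A, b)
print(x, its)   # same solution as CG, but usually many more iterations
```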

The Method of Conjugate Directions

The optimization can be improved by better search directions. Let the search directions be $A$-conjugate, i.e. $p_i^t A p_k = 0$ for $i \ne k$. Then the algorithm converges in at most $n$ steps, since the initial error can be decomposed along the $p$'s:
$$e_0 = \sum_{k=0}^{n-1} \delta_k p_k, \quad \text{with } \delta_k = \frac{p_k^t A e_0}{p_k^t A p_k}$$
But this is exactly the $\alpha$ we choose at step $k$:
$$\alpha_{k+1} = \frac{p_k^t r_k}{p_k^t A p_k} = \frac{p_k^t A e_k}{p_k^t A p_k} = \frac{p_k^t A e_0}{p_k^t A p_k},$$
since the error $e_k$ is the initial error $e_0$ plus a combination of $p_0, \ldots, p_{k-1}$, which are all $A$-conjugate to $p_k$. Each component $\delta_k$ is therefore subtracted out at step $k$, and the method converges after $n$ steps.

Choosing A-conjugate Search Directions

One way to choose $p_k$ $A$-conjugate to the previous search vectors is by Gram-Schmidt:
$$p_k = p_k^0 - \sum_{j=0}^{k-1} \beta_{kj} p_j, \quad \text{with } \beta_{kj} = \frac{(p_k^0)^t A p_j}{p_j^t A p_j}$$
The initial vectors $p_k^0$ should be linearly independent, for example column $k+1$ of the identity matrix. Drawback: all previous search vectors $p_j$ must be stored.

Conjugate Gradients is simply Conjugate Directions with a particular choice of initial vector in Gram-Schmidt: $p_k^0 = r_k$. This gives orthogonal residuals, $r_k^t r_j = 0$ for $j < k$, and $\beta_{kj} = 0$ for $k > j + 1$.

Preconditioners for Linear Systems

Main idea: instead of solving $Ax = b$, solve, using a nonsingular $n \times n$ preconditioner $M$, the system
$$M^{-1} A x = M^{-1} b,$$
which has the same solution $x$. Convergence properties are then based on $M^{-1} A$ instead of $A$.

There is a trade-off between the cost of applying $M^{-1}$ and the improvement of the convergence properties. Extreme cases:
- $M = A$: perfect conditioning, $M^{-1} A = I$, but applying $M^{-1}$ is expensive.
- $M = I$: applying $M^{-1} = I$ does nothing and costs nothing, but there is no improvement, $M^{-1} A = A$.

Preconditioned Conjugate Gradients

To keep symmetry, solve $(C^{-1} A C^{-t})(C^t x) = C^{-1} b$ with $C C^t = M$. The iteration can be written in terms of $M^{-1}$ only, without reference to $C$:

Algorithm: Preconditioned Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = M^{-1} r_0$, $z_0 = p_0$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^T z_{k-1}) / (p_{k-1}^T A p_{k-1})$   (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                        (residual)
    $z_k = M^{-1} r_k$                                          (preconditioning)
    $\beta_k = (r_k^T z_k) / (r_{k-1}^T z_{k-1})$               (improvement this step)
    $p_k = z_k + \beta_k p_{k-1}$                               (search direction)
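A sketch of PCG where the preconditioner is passed as a function applying $M^{-1}$; the diagonal (Jacobi) preconditioner used below is one of the options listed on the next slide, and the test matrix is an illustrative choice:

```python
import numpy as np

def preconditioned_cg(A, b, M_inv, tol=1e-10, maxiter=None):
    """PCG written in terms of z = M^-1 r only; M_inv applies the preconditioner."""
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()
    z = M_inv(r)
    p = z.copy()
    rz_old = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz_old / (p @ Ap)      # step length
        x = x + alpha * p              # approximate solution
        r = r - alpha * Ap             # residual
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)                   # preconditioning step
        rz_new = r @ z
        beta = rz_new / rz_old         # improvement this step
        p = z + beta * p               # search direction
        rz_old = rz_new
    return x

# Jacobi (diagonal) preconditioner: M = diag(A), so applying M^-1 is a cheap divide
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
d = np.diag(A)
print(preconditioned_cg(A, b, lambda r: r / d))
```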

Commonly Used Preconditioners

A preconditioner should approximately solve the problem $Ax = b$.
- Jacobi preconditioning: $M = \operatorname{diag}(A)$. Very simple and cheap; it might improve certain problems but is usually insufficient.
- Block-Jacobi preconditioning: use a block-diagonal $M$ instead of a diagonal one. Another variant uses several diagonals (e.g. tridiagonal).
- Classical iterative methods: precondition by applying one step of Jacobi, Gauss-Seidel, SOR, or SSOR.
- Incomplete factorizations: perform Gaussian elimination but ignore fill, resulting in approximate factors $A \approx LU$ or $A \approx R^T R$ (more later).
- Coarse-grid approximations: for a PDE discretized on a grid, a preconditioner can be formed by transferring the solution to a coarser grid, solving a smaller problem, then transferring back (multigrid).