Introduction. Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods. Example: First Order Richardson. Strategy

Math 1080: Numerical Linear Algebra
Chapter 4, Iterative Methods
M. M. Sussman
sussmanm@math.pitt.edu
Office Hours: MW 1:45PM-2:45PM, Thack 622
March 2015

Introduction

Solve system $Ax = b$ by repeatedly computing residuals $r = b - Ax$
Manageable computer memory requirements
Store only nonzero entries of $A$
Can be implemented on large-scale parallel computers
Many different methods available
Methods of choice for many problems

Strategy

Computing the residual $r = b - A\hat{x}$ for $\hat{x}$ an approximate solution is cheap in both operations and storage.

Algorithm (Basic Iterative Method)
Given an approximate solution $\hat{x}$ and a maximum number of steps itmax:

    Compute residual: r = b - A x
    for i = 1:itmax
        Use r to improve x
        Compute residual using improved x: r = b - A x
        Use residual and update to estimate accuracy
        if accuracy is acceptable, exit with converged solution
    end
    Signal failure if accuracy is not acceptable.

Example: First Order Richardson

Pick the number $\rho > 0$
Rewrite $Ax = b$ as $\rho(x - x) = b - Ax$
Write $\rho(x^{n+1} - x^n) = b - Ax^n$
Solve for $x^{n+1}$: $x^{n+1} = [I - \frac{1}{\rho}A]x^n + \frac{1}{\rho}b$
Guess $x^0$ and iterate using $x^{n+1} = [I - \frac{1}{\rho}A]x^n + \frac{1}{\rho}b$

Algorithm (FOR = First Order Richardson)

Given $\rho > 0$, target accuracy tol, maximum number of steps itmax and initial guess $x^0$:

    Compute residual: r^0 = b - A x^0
    for n = 1:itmax
        Compute update Delta^n = (1/rho) r^n
        Compute next approximation x^{n+1} = x^n + Delta^n
        Compute residual r^{n+1} = b - A x^{n+1}
        Estimate residual accuracy criterion ||r^{n+1}|| / ||b|| < tol
        Estimate update accuracy criterion ||Delta^n|| / ||x^{n+1}|| < tol
        if both residual and update are acceptable, exit with converged solution
    end
    Signal failure if accuracy is not acceptable.

Recall the MPP in 3D

$u_{ijk} = u(x_i, y_j, z_k)$ and $f_{ijk} = f(x_i, y_j, z_k)$
For a typical point $(x_i, y_j, z_k) \in \Omega$, the equation becomes
$6u_{ijk} - u_{i+1,j,k} - u_{i-1,j,k} - u_{i,j+1,k} - u_{i,j-1,k} - u_{i,j,k+1} - u_{i,j,k-1} = h^2 f_{ijk}$
If $(x_i, y_j, z_k)$ lies on the boundary, $u_{ijk} = 0$
Take $i, j, k = 1, \ldots, N+1$
Boundary values at $i = 1$ or $i = N+1$, $j = 1$ or $j = N+1$, $k = 1$ or $k = N+1$
Choose $\rho = 6$ and FOR turns into the Jacobi method.

Jacobi iteration in 3d

Given:
1. A tolerance tol,
2. A maximum number of iterations itmax
3. Arrays uold, unew and f, each of size (N+1,N+1,N+1)
4. Boundary values of uold and unew filled with zeros for homogeneous Dirichlet b.c.

    h=1/N
    for it=1:itmax
      % initialize solution, delta, residual and rhs norms
      delta=0
      unorm=0
      bnorm=0

Jacobi iteration, cont'd

      for i=2:N
        for j=2:N
          for k=2:N
            % compute increment
            au=-( uold(i+1,j,k) + uold(i,j+1,k) ...
                + uold(i,j,k+1) + uold(i-1,j,k) ...
                + uold(i,j-1,k) + uold(i,j,k-1) )
            unew(i,j,k)=(h^2*f(i,j,k) - au)/6
            % add next term to norms
            delta=delta + (unew(i,j,k) - uold(i,j,k))^2
            unorm=unorm + (unew(i,j,k))^2
            bnorm=bnorm + (h^2*f(i,j,k))^2
          end
        end
      end
      uold=unew   % set uold for next iteration
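The FOR algorithm above can be sketched for a general small matrix as follows. This is a minimal illustration, not the course code: the 2 by 2 matrix `A`, the right side `b`, and the values of `rho`, `tol` and `itmax` are assumptions chosen only so that the iteration visibly converges.

```matlab
% First Order Richardson (FOR) sketch.
% A, b, rho, tol and itmax are illustrative assumptions, not from the slides.
A = [2 -1; -1 2];      % small SPD test matrix, eigenvalues 1 and 3
b = [1; 1];            % exact solution is x = [1; 1]
rho = 2;               % satisfies rho > lambda_max(A)/2 = 1.5
tol = 1e-10;
itmax = 200;

x = zeros(2,1);        % initial guess x^0
r = b - A*x;           % initial residual r^0
converged = false;
for n = 1:itmax
  delta = r/rho;       % update Delta^n = (1/rho) r^n
  x = x + delta;       % next approximation x^{n+1}
  r = b - A*x;         % residual r^{n+1}
  % stop only when both the residual and the update tests pass
  if norm(r) <= tol*norm(b) && norm(delta) <= tol*norm(x)
    converged = true;
    break
  end
end
if ~converged
  error('FOR did not converge in itmax steps');
end
```

Both stopping tests are relative, so the same `tol` can be reused when `b` or the solution is rescaled.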

Jacobi iteration norms and residual

      % complete norm calculation
      delta=sqrt(delta)
      unorm=sqrt(unorm)
      bnorm=sqrt(bnorm)
      % compute residual
      resid=0;
      for i=2:N
        for j=2:N
          for k=2:N
            au= - ( unew(i+1,j,k) + unew(i,j+1,k) ...
                  + unew(i,j,k+1) + unew(i-1,j,k) ...
                  + unew(i,j-1,k) + unew(i,j,k-1) );
            resid=resid + (h^2*f(i,j,k) - au ...
                  - 6*unew(i,j,k))^2;
          end
        end
      end
      resid=sqrt(resid)

Jacobi iteration convergence test

      % test for convergence
      if resid <= tol*bnorm & delta <= tol*unorm
        disp('solution converged')
        return
      end
    end
    error('convergence failed')

Remarks

The code is in the file jacobi3d.m among the supplemental files on my web site
If this algorithm were written to be executed on a computer, the calculation of bnorm would be done once, before the loop began.
Two computations of au are really the same and should be combined.
Requires 3 arrays of size $(N+1) \times (N+1) \times (N+1)$
Matrix is not stored
Not fast enough for prime time

Example: FOR for 1d MPP

$-u'' = f(x) = x/5$, $0 < x < 1$, $u(0) = 0 = u(1)$
with $h = 1/5$ leads to the $4 \times 4$ tridiagonal linear system

    2u_1 -  u_2               = 1
    -u_1 + 2u_2 -  u_3        = 2
           -u_2 + 2u_3 -  u_4 = 3
                  -u_3 + 2u_4 = 4

whose true solution is $u_1 = 4$, $u_2 = 7$, $u_3 = 8$, $u_4 = 6$.
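The 4 by 4 system of the 1d example can be checked directly. The sketch below assembles the tridiagonal matrix with `diag` (one possible construction, assumed here for brevity) and confirms that the stated true solution has zero residual.

```matlab
% Assemble the 4x4 system of the 1d MPP example and check the true solution.
% Variable names are illustrative; only the numbers come from the slides.
N = 5;                               % h = 1/N gives N-1 = 4 interior unknowns
e = ones(N-1,1);
A = diag(2*e) - diag(e(1:end-1),1) - diag(e(1:end-1),-1);  % tridiag(-1,2,-1)
b = (1:N-1)';                        % right side 1, 2, 3, 4
utrue = [4; 7; 8; 6];                % true solution quoted on the slide
resid = norm(b - A*utrue);           % residual of the quoted solution
udirect = A\b;                       % direct solve for comparison
```

For a system this small the direct solve is the sensible method; the example is useful because the exact answer is known when testing iterative methods.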

FOR/Jacobi iteration

Taking $\rho = 2$

    u_1^{NEW} = 1/2 + (1/2) u_2^{OLD}
    u_2^{NEW} = 2/2 + (u_1^{OLD} + u_3^{OLD})/2
    u_3^{NEW} = 3/2 + (u_2^{OLD} + u_4^{OLD})/2
    u_4^{NEW} = 4/2 + (1/2) u_3^{OLD}

Starting from $u^{OLD} = u^0 = (0, 0, 0, 0)^t$:

    u^1  = (1/2,  1,    3/2,  2   )^t
    u^10 = (3.44, 6.07, 7.09, 5.42)^t
    u^20 = (3.93, 6.89, 7.89, 5.93)^t
    u^35 = (4.00, 7.00, 8.00, 6.00)^t

35 steps for 3-digit accuracy for a $4 \times 4$ system

Matlab example

    % Jacobi iteration for -u''=x/5, u(0)=u(1)=0, h=1/5
    utrue=[4; 7; 8; 6];
    unew =[0; 0; 0; 0];
    for k=1:35
      uold = unew; % column vector
      % equations from slides
      unew(1) = 0.5 + .5 * uold(2);
      unew(2) = 1.0 + .5 * (uold(1) + uold(3));
      unew(3) = 1.5 + .5 * (uold(2) + uold(4));
      unew(4) = 2.0 + .5 * uold(3);
      [unew, utrue] % print iterates and true solution
      n=input(''); % wait for keypress
    end
    [unew, utrue]

Homework

Text, Exercises 197, 199, 200.
Do 197 and 199 by hand; do not try to write a program to do them.
In Exercise 200, you should modify jacobi3d.m to handle the 2D case. Do not forget to change $\rho$ to 4 from 6.

Preconditioning FOR

Given:
1. An $N \times N$ matrix $A$
2. An $N \times N$ matrix $M$ called a Preconditioner
3. A right side vector $b$
4. An initial guess $x^0$

    n=0
    while convergence is not satisfied
        Obtain x^{n+1} as the solution of M(x^{n+1} - x^n) = b - A x^n
        n=n+1
    end
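The preconditioned loop above can be sketched on the same 4 by 4 example. The choice `M = diag(diag(A))` is only one illustrative preconditioner (it reproduces the Jacobi iteration), and the residual-based stopping test, `tol` and `itmax` are assumptions of this sketch.

```matlab
% Preconditioned FOR sketch: solve M (x^{n+1} - x^n) = b - A x^n each step.
% M, tol and itmax are illustrative assumptions.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
b = [1; 2; 3; 4];
M = diag(diag(A));     % diagonal preconditioner; here M = 2*I
tol = 1e-8;
itmax = 1000;

x = zeros(4,1);        % initial guess x^0
r = b - A*x;
n = 0;
while norm(r) > tol*norm(b) && n < itmax
  x = x + M\r;         % x^{n+1} = x^n + M^{-1} (b - A x^n)
  r = b - A*x;
  n = n + 1;
end
```

Replacing `M` by another matrix that is cheap to solve with changes the method without changing the loop.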

Preconditioner

Pure FOR has $M = \rho I$, so that $M^{-1} = (1/\rho) I$
Choose $M = A$, get solution in 1 iteration
Tradeoff between complexity of $M$ and number of iterations

Definition: stationary iterative method

An iterative method that can be expressed in the form
$x^{n+1} = Bx^n + c$
where $B$ and $c$ do not depend on $n$ is called Stationary.

Example
Both FOR and preconditioned FOR are stationary.

Fixed points

For a function $\Phi$, a fixed point of $\Phi(x)$ is any $x$ satisfying $x = \Phi(x)$
A fixed point iteration or Picard iteration is an algorithm approximating $x$ by
1. Guess $x^0$
2. Repeat $x^{n+1} = \Phi(x^n)$ until convergence.

Residual-Update Form:

Given $x^n$
Repeat until convergence:
1. Compute residual: $r^n = b - Ax^n$
2. Compute update: $\Delta^n = M^{-1} r^n$
3. Perform update: $x^{n+1} = x^n + \Delta^n$

This is often the way the methods are programmed.

Fixed Point Iteration Form

Stationary iterative method
Rewrite as fixed point iteration:
Define $T = I - M^{-1}A$
$T$ is the iteration operator.
$x^{n+1} = M^{-1}b + Tx^n =: \Phi(x^n)$.
This is the form used to analyze convergence and rates of convergence.

Regular Splitting Form:

Rewrite $A = M - N$ so $N = M - A$
Write $Ax = b$ as $Mx = b + Nx$
Regular Splitting form: $Mx^{n+1} = b + Nx^n$.
Example: FOR
$M = \rho I$ and $N = \rho I - A$ so the regular splitting form becomes
$(\rho I)x^{n+1} = b + (\rho I - A)x^n$.

Jacobi method

Suppose $A = D - L - U$
1. $D$ is the diagonal part of $A$
2. $-L$ is the strictly lower triangular part of $A$
3. $-U$ is the strictly upper triangular part of $A$
Jacobi method is given as $Dx^{n+1} = b + (L + U)x^n$.
When $D = \rho I$, the Jacobi method for MPP agrees with FOR.

Theorem 205

Theorem
Consider first order Richardson: $T = T_{FOR} = I - \rho^{-1}A$
Error: $e^n = x - x^n$
Residual: $r^n = b - Ax^n$
Update: $\Delta^n = x^{n+1} - x^n$
all satisfy the same iteration
$e^{n+1} = Te^n$
$r^{n+1} = Tr^n$
$\Delta^{n+1} = T\Delta^n$
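The residual-update, fixed point and regular splitting forms describe the same step. A minimal numerical check for FOR on the 4 by 4 example follows; the current iterate `xn` is an arbitrary vector chosen for illustration, and `Nmat` stands for the splitting matrix $N$ (renamed here to avoid confusion with a problem size).

```matlab
% One FOR step computed three ways; all three should agree up to rounding.
% xn is an arbitrary illustrative iterate, not a value from the slides.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
b = [1; 2; 3; 4];
rho = 2;
M = rho*eye(4);             % FOR: M = rho*I
Nmat = M - A;               % regular splitting A = M - N
T = eye(4) - M\A;           % iteration operator T = I - M^{-1} A
xn = [1; 0; -1; 2];         % current iterate x^n

x_ru = xn + M\(b - A*xn);   % residual-update form
x_fp = M\b + T*xn;          % fixed point form
x_rs = M\(b + Nmat*xn);     % regular splitting form
```

The residual-update form is the one usually programmed, because the residual is also needed for the stopping test.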

Proof $e^{n+1} = Te^n$

Since $x = \rho^{-1}b + Tx$ and $x^{n+1} = \rho^{-1}b + Tx^n$, subtraction gives
$(x - x^{n+1}) = T(x - x^n)$
and $e^{n+1} = Te^n$.

Proof $\Delta^{n+1} = T\Delta^n$

Start with
$x^{n+1} = \rho^{-1}b + Tx^n$
$x^n = \rho^{-1}b + Tx^{n-1}$.
Subtraction gives
$(x^{n+1} - x^n) = T(x^n - x^{n-1})$
so that $\Delta^n = T\Delta^{n-1}$ for every $n$, which is the same statement as $\Delta^{n+1} = T\Delta^n$.

Proof $r^{n+1} = Tr^n$

Since $\rho x^{n+1} = \rho x^n + b - Ax^n = \rho x^n + r^n$, multiply by $-A$ and add $\rho b$:
$\rho(b - Ax^{n+1}) = \rho(b - Ax^n) - Ar^n$
$\rho r^{n+1} = \rho r^n - Ar^n$
$r^{n+1} = (I - \rho^{-1}A)r^n = Tr^n$.
For methods more complicated than FOR, this part of the proof gives $r^{n+1} = ATA^{-1}r^n$ and the typical result is
$e^{n+1} = Te^n$, $\Delta^{n+1} = T\Delta^n$, and $r^{n+1} = ATA^{-1}r^n$.

What the theorem means

$\Delta^n$, $r^n$ and $e^n \to 0$ at the same rate.
$r^n$ and $e^n$ can be of widely different sizes
If the residuals improve by $k$ significant digits over the initial residual, the errors also improve similarly over the initial error
Observable residual behavior indicates error behavior
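The three recursions of the theorem can be checked numerically for FOR. The sketch below takes two FOR steps on the 4 by 4 example and measures how far each quantity is from satisfying its recursion; the variable names are illustrative.

```matlab
% Check e^{n+1} = T e^n, r^{n+1} = T r^n, Delta^{n+1} = T Delta^n for FOR.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
b = [1; 2; 3; 4];
rho = 2;
T = eye(4) - A/rho;          % T_FOR = I - A/rho
x = A\b;                     % exact solution, used only to form errors

x0 = zeros(4,1);
x1 = x0 + (b - A*x0)/rho;    % first FOR step
x2 = x1 + (b - A*x1)/rho;    % second FOR step

e0 = x - x0;      e1 = x - x1;       % errors
r0 = b - A*x0;    r1 = b - A*x1;     % residuals
d0 = x1 - x0;     d1 = x2 - x1;      % updates

err_e = norm(e1 - T*e0);
err_r = norm(r1 - T*r0);
err_d = norm(d1 - T*d0);
```

All three discrepancies are at rounding level, although the errors, residuals and updates themselves differ in size.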

Three stopping criteria

1. Too Many Iterations: If $n \ge$ itmax stop, and signal failure.
2. Small Residual: Given tol1, check $\|r^n\| / \|b\| \le$ tol1.
3. Small Update: Given tol2, check $\|\Delta^n\| / \|x^n\| \le$ tol2.
4. Stop and signal success if 2 and 3 are satisfied.

Another thing to monitor

Monitor $\alpha_n := \|r^{n+1}\| / \|r^n\|$ or $\|\Delta^{n+1}\| / \|\Delta^n\|$.
$\alpha < 1$ or $> 1$ suggests convergence or divergence

If $\alpha$ is roughly constant

On this slide the updates are indexed so that $\Delta^n = x^n - x^{n-1}$.
Suppose
$\alpha_n = \|\Delta^{n+1}\| / \|\Delta^n\| \approx \alpha < 1$
Clearly
$x^N = x^0 + \sum_{n=1}^{N} \Delta^n$
And, in the limit,
$x = x^0 + \sum_{n=1}^{\infty} \Delta^n$
so that
$x^N - x = -\sum_{n=N+1}^{\infty} \Delta^n$
and hence
$\|x^N - x\| \le \sum_{n=N+1}^{\infty} \|\Delta^n\| \le \|\Delta^{N+1}\| / (1 - \alpha)$

The good and the bad

Iterative methods require minimal storage
Need stopping criteria
Usually need good preconditioner
Need to choose fastest of many available methods
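The update ratio and the resulting error estimate can be sketched on the 4 by 4 example with FOR and $\rho = 2$. This is an illustration under the assumption that $\alpha_n$ has settled to a nearly constant value; the true solution `utrue` is used only to compare the estimate with the actual error, and the number of steps (30) is an arbitrary choice.

```matlab
% Monitor alpha_n = ||Delta_{n+1}|| / ||Delta_n|| and estimate the error.
% The step count and variable names are illustrative assumptions.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
b = [1; 2; 3; 4];
utrue = [4; 7; 8; 6];
rho = 2;

x = zeros(4,1);
dold = (b - A*x)/rho;        % first update
x = x + dold;
for n = 1:30
  dnew = (b - A*x)/rho;      % update computed from the current iterate
  alpha = norm(dnew)/norm(dold);
  xprev = x;                 % iterate before applying dnew
  x = x + dnew;
  dold = dnew;
end
est = norm(dnew)/(1 - alpha);  % estimate of ||xprev - x_exact||
err = norm(xprev - utrue);     % actual error, available only in a test problem
```

In this example `alpha` settles near $\cos(\pi/5) \approx 0.809$, the spectral radius of the iteration operator, and `est` is close to `err`.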

FOR convergence in 1D

Given $a$, $b$, for what values of $\rho$ will FOR converge?
$x^{n+1} = x^n + \frac{1}{\rho}(b - ax^n)$
Solution $x = b/a$, and $b = ax$
Subtract both sides from $x$:
$e^{n+1} = e^n - (1/\rho)(ax - ax^n) = (1 - (a/\rho))e^n = te^n$
Hence $e^n = t^n e^0$
Convergence when $|t| < 1$
If $a > 0$, $0 < a/2 < \rho$
If $a < 0$, $\rho < a/2 < 0$
Fastest convergence if $t = 0$, $\rho = a$

Investigating convergence

Definition 209
Spectral radius of matrix $T$, spr($T$), is the size of the largest eigenvalue
spr$(T) = \max\{ |\lambda| : \lambda = \lambda(T) \}$.

Theorem 207
The stationary iteration $e^{n+1} = Te^n$ converges for all initial guesses if and only if there is some matrix norm with $\|T\| < 1$.

Theorem 208
The stationary iteration $e^{n+1} = Te^n$ converges for all initial guesses if and only if spr$(T) < 1$.

Proofs

If $\|T\| = \rho < 1$, then $\|e^n\| \le \|T\|^n \|e^0\| \to 0$
If any $|\lambda| \ge 1$, set $e^0$ to be the associated eigenvector, and convergence fails.
If $T$ is symmetric, then there is a basis $\{v_i\}$ consisting of eigenvectors, so that $e^0 = \sum_i a_i v_i$
Hence $e^n = \sum_i a_i \lambda_i^n v_i$
If all $|\lambda| < 1$, $e^n \to 0$

What about nonsymmetric?

    % nonsymmetricexample.m
    A=[.99  1   0   0
        0  .99  1   0
        0   0  .99  1
        0   0   0  .99];
    A^2
    A^3
    for k=1:1000
      n(k)=norm(A^k,inf);
    end
    plot(n)
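The scalar analysis can be tried directly. The sketch below runs 20 FOR steps for $ax = b$ with three values of $\rho$; the numbers `a = 4`, `b = 8` and the list `rhos` are assumptions picked to show one divergent case, one convergent case and the optimal case $\rho = a$.

```matlab
% Scalar FOR: x^{n+1} = x^n + (b - a*x^n)/rho, error multiplier t = 1 - a/rho.
% a, b and rhos are illustrative assumptions.
a = 4; b = 8;                 % scalar problem with solution x = b/a = 2
rhos = [1.5 3 4];             % below a/2, above a/2, and rho = a
t = 1 - a./rhos;              % error multiplier for each rho
x = zeros(size(rhos));        % same initial guess x^0 = 0 for each rho
for n = 1:20
  x = x + (b - a*x)./rhos;    % one FOR step for all three rho values at once
end
err = abs(x - b/a);           % error after 20 steps
```

With $\rho = 1.5 < a/2$ the multiplier has $|t| > 1$ and the error grows, while $\rho = a$ gives $t = 0$ and the exact solution after one step.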

Similar matrices have same eigenvalues

Definition 212
Matrices $B$ and $PBP^{-1}$ are similar.

Lemma 213
Similar matrices have the same eigenvalues

Proof
$B\phi = \lambda\phi$
$PB(P^{-1}P)\phi = \lambda P\phi$
$(PBP^{-1})\psi = \lambda\psi$ where $\psi = P\phi$

Functions of matrices

If a function has a series representation, can plug in a square matrix!
$f(x) = \frac{1}{x}$ gives $f(A) = A^{-1}$
$f(x) = \frac{1}{1 + x}$ gives $f(A) = (I + A)^{-1}$
$f(x) = e^x = 1 + x + \frac{x^2}{2!} + \ldots$ gives $f(A) = e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!}$
$f(x) = x^2 - 1$ gives $f(A) = A^2 - I$.
eigenvalues of $f(A)$ are $f$(eigenvalues of $A$)

Spectral mapping theorem

Let $f : \mathbb{C} \to \mathbb{C}$ be an analytic function. If $(\lambda, \phi)$ is an eigenpair for $A$ then $(f(\lambda), \phi)$ is an eigenpair for $f(A)$.

Homework

Text, 215, 217. ("SMT" means "Spectral Mapping Theorem")
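The spectral mapping theorem can be observed numerically. The sketch below uses the 4 by 4 matrix of the 1d example (an illustrative choice) with $f(x) = x^2 - 1$ and with $f(x) = 1 - x/\rho$, the function that produces the FOR iteration operator.

```matlab
% Spectral mapping check: eigenvalues of f(A) equal f(eigenvalues of A).
% The matrix and rho = 2 are illustrative choices.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
lam = sort(eig(A));                    % eigenvalues of A, all positive

fA = A^2 - eye(4);                     % f(x) = x^2 - 1
lamf = sort(eig(fA));
gap1 = norm(lamf - sort(lam.^2 - 1));  % compare with f applied to eig(A)

rho = 2;
T = eye(4) - A/rho;                    % f(x) = 1 - x/rho gives T_FOR
lamT = sort(eig(T));
gap2 = norm(lamT - sort(1 - lam/rho));
```

Sorting both sides makes the comparison independent of the order in which `eig` returns the eigenvalues.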

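Convergence of FOR

Theorem 218
Suppose $A$ is SPD. Then FOR converges for any initial guess $x^0$ provided $\rho > \lambda_{\max}(A)/2$.

Proof
Rewrite $Ax = b$ as $\rho(x - x) = b - Ax$. Hence the error equation
$\rho(e^{n+1} - e^n) = -Ae^n$, $e^n = x - x^n$,
or
$e^{n+1} = Te^n$, $T = (I - \rho^{-1}A)$.
Need spr$(T) < 1$
$f(x) = 1 - x/\rho$, $T = f(A)$, so $\lambda(T) = 1 - \lambda(A)/\rho$
Since $A$ is SPD, $\lambda(A)$ is real and positive:
$0 < \lambda_{\min}(A) \le \lambda(A) \le \lambda_{\max}(A) < \infty$.

Convergence of FOR, cont'd

We know $e^n \to 0$ provided $|\lambda(T)| < 1$, or
$-1 < 1 - \lambda(A)/\rho < +1$.
Since $\lambda(A)$ and $\rho$ are positive, the right inequality always holds, and the left one requires
$-1 < 1 - \lambda_{\max}/\rho$
or
$-\rho < \rho - \lambda_{\max}$
so that
$\rho > \lambda_{\max}(A)/2$.

Optimizing $\rho$

The smaller $\|T\|_2 = $ spr$(T)$ the faster $e^n \to 0$.
If $A = \lambda I$, spr$(T) = |1 - \lambda/\rho|$
Can minimize with $\rho = \lambda$
If $0 < \lambda_{\min} \le \lambda(A) \le \lambda_{\max}$, spr$(T) = \max |1 - \lambda/\rho|$

[Figure: spr$(T) = |1 - \lambda/\rho|$ for $0 < \lambda_{\min} \le \lambda(A) \le \lambda_{\max}$, plotted against $\rho$]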

Finding optimal $\rho$

For a given $\rho$
$\|T\|_2 = $ spr$(T) = \max |\lambda(T)| = \max_{\lambda(A)} |1 - \lambda(A)/\rho|$
Since $\lambda_{\min} \le \lambda(A) \le \lambda_{\max}$,
$\|T\|_2 = \max\{ |1 - \lambda_{\min}/\rho|, |1 - \lambda_{\max}/\rho| \}$.
Already know $\|T\|_2 < 1$ for $\rho > \lambda_{\max}/2$
Optimal value $\rho$ is intersection
$-(1 - \lambda_{\min}/\rho) = 1 - \lambda_{\max}/\rho$
$2\rho = \lambda_{\max} + \lambda_{\min}$
$\rho = \frac{\lambda_{\max}}{2} + \frac{\lambda_{\min}}{2}$

Practical choices of $\rho$

Easy to estimate $\lambda_{\max}$ as $\|A\|$ for any norm
Harder to estimate $\lambda_{\min}$
Recall condition number $= \kappa = \lambda_{\max}/\lambda_{\min}$
$\rho = \lambda_{\max}/2$: $\|T\|_2 = 1$, too small!
If $\rho = \lambda_{\max}(A)$, $\|T\|_2 = 1 - 1/\kappa$
If $\rho = (\lambda_{\max} + \lambda_{\min})/2$, $\|T\|_2 = 1 - 2/(\kappa + 1)$.
Because of slopes, better slightly large than slightly small
Possible to adaptively estimate optimal $\rho$.

How many iterations?

Since $e^n = Te^{n-1}$, $e^n = T^n e^0$
If want to reduce initial error by a factor 0.1,
$\|e^n\| / \|e^0\| \le \|T\|^n \le 0.1$
Taking logs and solving:
$n \ge -\ln(10) / \ln(\|T\|)$
Let $\|T\| = 1 - \alpha$, for small $\alpha$
$\ln(1 - \alpha) \approx -\alpha + O(\alpha^2)$,
$\alpha = 1/\kappa$ for $\rho = \lambda_{\max}$
$n \approx \ln(10)\kappa$

FOR error reduction summary

1. $\rho = \lambda_{\max}(A)$, FOR requires $n \approx \ln(10)\,\kappa(A)$ iterations per significant digit of accuracy.
2. $\rho = (\lambda_{\max}(A) + \lambda_{\min}(A))/2$, FOR requires $n \approx \ln(10)\,\kappa(A)/2$ iterations.
3. For MPP, $\kappa(A) = O(h^{-2})$, FOR requires $n = O(h^{-2})$ iterations per significant digit of accuracy.
4. Much too slow!
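The two practical choices of $\rho$ can be compared numerically. The sketch below evaluates spr$(T)$ from the eigenvalues of the 4 by 4 example matrix (an illustrative choice; in practice the eigenvalues are not available and must be estimated) and converts each value into an iteration count per significant digit.

```matlab
% Compare rho = lambda_max with rho = (lambda_max + lambda_min)/2.
% Using eig here is an assumption made for illustration only.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
lam = eig(A);
lmin = min(lam);
lmax = max(lam);
kappa = lmax/lmin;                     % condition number of the SPD matrix A

sprT = @(rho) max(abs(1 - lam/rho));   % spr(T) for T = I - A/rho
s1 = sprT(lmax);                       % compare with 1 - 1/kappa
s2 = sprT((lmax + lmin)/2);            % compare with 1 - 2/(kappa + 1)

n1 = ceil(-log(10)/log(s1));           % steps per digit, rho = lambda_max
n2 = ceil(-log(10)/log(s2));           % steps per digit, optimal rho
```

The optimal choice needs roughly half as many steps per digit, in line with the summary above.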

Homework

Exercise E: Matlab investigations of convergence of FOR.

Gauß-Seidel in 1D MPP

Jacobi (FOR with $\rho = 2$)

    h=1/N
    for it = 1:itmax
      for i = 2:N
        unew(i)=(h^2*f(i) + ...
            uold(i-1)+uold(i+1))/2
      end
      if convergence satisfied, exit
      uold=unew
    end

Gauß-Seidel

    h=1/N
    for it = 1:itmax
      for i = 2:N
        u(i)=(h^2*f(i) + ...
            u(i-1) + u(i+1))/2
      end
      if convergence satisfied, exit
    end

Algebraic description

Write $A = D + L + U$
Jacobi: $Dx^{n+1} = b - (L + U)x^n$
equivalently $D(x^{n+1} - x^n) = b - Ax^n$
Gauß-Seidel: $(D + U)x^{n+1} = b - Lx^n$
equivalently $(D + U)(x^{n+1} - x^n) = b - Ax^n$
General form: $M(x^{n+1} - x^n) = b - Ax^n$

Convergence rate

Gauß-Seidel usually takes half as many iterations as Jacobi
For the MPP, half of $O(h^{-2})$ is still $O(h^{-2})$
Would like to reduce the exponent!
Ingenious choice of $M$ can help, but is often problem-dependent.
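The claim that Gauß-Seidel takes about half as many iterations as Jacobi can be tried on the 4 by 4 example using the general form $M(x^{n+1} - x^n) = b - Ax^n$. In this sketch `M = triu(A)` plays the role of $D + U$ from the algebraic description; the tolerance, `itmax` and the residual-only stopping test are assumptions.

```matlab
% Jacobi versus Gauss-Seidel in the form M (x^{n+1} - x^n) = b - A x^n.
% tol, itmax and the stopping test are illustrative assumptions.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
b = [1; 2; 3; 4];
utrue = [4; 7; 8; 6];
tol = 1e-8;
itmax = 1000;

Ms = {diag(diag(A)), triu(A)};   % M = D (Jacobi), M = D + U (Gauss-Seidel)
its = zeros(1,2);                % iteration counts
errs = zeros(1,2);               % final errors
for m = 1:2
  x = zeros(4,1);
  r = b - A*x;
  n = 0;
  while norm(r) > tol*norm(b) && n < itmax
    x = x + Ms{m}\r;             % solve with M, then update
    r = b - A*x;
    n = n + 1;
  end
  its(m) = n;
  errs(m) = norm(x - utrue);
end
```

Solving with the triangular `M` costs a back substitution per step, which in a loop-based code is the same work as one sweep over the unknowns.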

Relaxation

Given $\omega > 0$, a maximum number of iterations itmax and $x^0$:

    for n=1:itmax
        Compute x^{n+1}_temp by some iterative method
        Relax x^{n+1} = omega*x^{n+1}_temp + (1 - omega)*x^n
        if x^{n+1} is acceptable, exit
    end

The good and the bad

Good: Easy to put into any iterative method program.
Good: The right choice of $\omega$ can reduce the number of iterations
Bad: The wrong choice of $\omega$ can increase the number of iterations
When programming, $x^{n+1}_{temp}$ and $x^{n+1}$ can take the same storage.

Underrelaxation example

Consider $e^n = (-0.9)e^{n-1}$

    1.0000 -0.9000  0.8100 -0.7290  0.6561 -0.5905  0.5314 -0.4783  0.4305 -0.3874
    0.3487 -0.3138  0.2824 -0.2542  0.2288 -0.2059  0.1853 -0.1668  0.1501 -0.1351
    0.1216 -0.1094  0.0985 -0.0886  0.0798 -0.0718  0.0646 -0.0581  0.0523 -0.0471
    0.0424 -0.0382  0.0343 -0.0309  0.0278 -0.0250  0.0225 -0.0203  0.0182 -0.0164
    0.0148 -0.0133  0.0120 -0.0108  0.0097 -0.0087  0.0079 -0.0071  0.0064 -0.0057
    0.0052

Underrelaxation with omega = 0.5

    1.0000 0.0500 0.0025 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Final number is 1.9531e-12

Overrelaxation example

Consider $e^n = 0.9\,e^{n-1}$

    1.0000 0.9000 0.8100 0.7290 0.6561 0.5905 0.5314 0.4783 0.4305 0.3874
    0.3487 0.3138 0.2824 0.2542 0.2288 0.2059 0.1853 0.1668 0.1501 0.1351
    0.1216 0.1094 0.0985 0.0886 0.0798 0.0718 0.0646 0.0581 0.0523 0.0471
    0.0424 0.0382 0.0343 0.0309 0.0278 0.0250 0.0225 0.0203 0.0182 0.0164
    0.0148 0.0133 0.0120 0.0108 0.0097 0.0087 0.0079 0.0071 0.0064 0.0057

Overrelaxation with $\omega = 2$

    1.0000 0.8000 0.6400 0.5120 0.4096 0.3277 0.2621 0.2097 0.1678 0.1342
    0.1074 0.0859 0.0687 0.0550 0.0440 0.0352 0.0281 0.0225 0.0180 0.0144
    0.0115 0.0092 0.0074 0.0059 0.0047 0.0038 0.0030 0.0024 0.0019 0.0015
    0.0012 0.0010 0.0008 0.0006 0.0005 0.0004 0.0003 0.0003 0.0002 0.0002
    0.0001 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000

Final number = 1.7841e-05
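The two relaxation tables can be regenerated with a few lines. The sketch below applies the relaxation step to the scalar model iterations $e \to -0.9e$ and $e \to 0.9e$; the array names `under` and `over` are illustrative.

```matlab
% Relaxation applied to scalar model iterations, as in the two examples.
% Underrelaxation: base iteration e -> -0.9*e with omega = 0.5
omega = 0.5;
e = 1;
under = zeros(1,10);
under(1) = e;
for n = 2:10
  etemp = -0.9*e;                     % step of the unrelaxed method
  e = omega*etemp + (1 - omega)*e;    % relaxed step, multiplier 0.05
  under(n) = e;
end

% Overrelaxation: base iteration e -> 0.9*e with omega = 2
omega = 2;
e = 1;
over = zeros(1,50);
over(1) = e;
for n = 2:50
  etemp = 0.9*e;                      % step of the unrelaxed method
  e = omega*etemp + (1 - omega)*e;    % relaxed step, multiplier 0.8
  over(n) = e;
end
```

The relaxed multiplier is $\omega t + (1 - \omega)$ for a base multiplier $t$, which gives $0.05$ in the first case and $0.8$ in the second.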

Homework

Text, Exercise 225, 226.

SOR for 2D MPP

Given an array u of size N+1 by N+1 with boundary values filled with zeros, a maximum number of iterations itmax, a tolerance tol, and an estimate omega for the optimal $\omega$:

    h=1/N
    for it=1:itmax
      for i=2:N
        for j=2:N
          uold=u(i,j)
          u(i,j)=(h^2*f(i,j) ...
              + u(i+1,j)+u(i-1,j)+u(i,j+1)+u(i,j-1))/4
          u(i,j)=omega*u(i,j)+(1-omega)*uold
        end
      end
      if convergence is satisfied, exit
    end

Theorem 230

Theorem (Convergence of SOR)
Let $A$ be SPD and let $T_{Jacobi} = D^{-1}(L + U)$ be the iteration matrix for Jacobi (not SOR). If spr$(T_{Jacobi}) < 1$, then SOR converges for any $\omega$ with $0 < \omega < 2$ and there is an optimal choice of $\omega$, known as $\omega_{optimal}$, given by
$\omega_{optimal} = \frac{2}{1 + \sqrt{1 - (\text{spr}(T_{Jacobi}))^2}}$.
For $\omega = \omega_{optimal}$ and $T_{SOR}$, $T_{GaussSeidel}$ the iteration matrices for SOR and Gauss-Seidel respectively, we have
spr$(T_{SOR}) = \omega_{optimal} - 1 < $ spr$(T_{GaussSeidel}) = ($spr$(T_{Jacobi}))^2 < $ spr$(T_{Jacobi}) < 1$.

How to choose $\omega$ optimal?

SOR convergence rate is the Gauß-Seidel rate squared!
From $O(h^{-2})$ to $O(h^{-1})$.
Known for MPP with simple geometry
Can be estimated dynamically
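The formula for $\omega_{optimal}$ can be checked on a small case. The sketch below uses the 4 by 4 tridiagonal matrix of the 1d example rather than the 2D MPP (an assumption made to keep the matrices small) and forms the Jacobi, Gauß-Seidel and SOR iteration matrices explicitly with the splitting $A = D - L - U$, which is practical only for tiny problems.

```matlab
% Spectral radii of Jacobi, Gauss-Seidel and SOR with the optimal omega.
% The 1d matrix and the explicit iteration matrices are for illustration only.
A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
D = diag(diag(A));
L = -tril(A,-1);                 % A = D - L - U
U = -triu(A,1);

TJ = D\(L + U);                  % Jacobi iteration matrix
sJ = max(abs(eig(TJ)));

wopt = 2/(1 + sqrt(1 - sJ^2));   % optimal omega from the theorem

TGS = (D - L)\U;                 % Gauss-Seidel iteration matrix
sGS = max(abs(eig(TGS)));

TSOR = (D - wopt*L)\((1 - wopt)*D + wopt*U);   % SOR iteration matrix
sSOR = max(abs(eig(TSOR)));
```

For this matrix `sGS` agrees with `sJ^2` and `sSOR` agrees with `wopt - 1` up to rounding, so SOR with the optimal parameter contracts much faster than either of the other two methods.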

Homework

Text, 231.