Lecture Notes of Matrix Computations

Size: px

Start display at page:

Download "Lecture Notes of Matrix Computations"

Charles Jeremy Gardner
5 years ago
Views:

1 Lecture Notes of Matrix Computations Wen-Wei Lin Department of Mathematics National Tsing Hua University Hsinchu, Taiwan 30043, R.O.C. May 5, 2008

2 ii

3 Contents I On the Numerical Solutions of Linear Systems 1 1 Introduction Mathematical auxiliary, definitions and relations Vectors and matrices Rank and orthogonality Special matrices Eigenvalues and Eigenvectors Norms and eigenvalues The Sensitivity of Linear System Ax = b Backward error and Forward error An SVD Analysis Normwise Forward Error Bound Componentwise Forward Error Bound Derivation of Condition Number of Ax = b Normwise Backward Error Componentwise Backward Error Determinants and Nearness to Singularity Numerical methods for solving linear systems Elementary matrices LR-factorization Gaussian elimination Practical implementation LDR- and LL T -factorizations Error estimation for linear systems Error analysis for Gaussian algorithm Àpriori error estimate for backward error bound of LR-factorization Improving and Estimating Accuracy Special Linear Systems Toeplitz Systems Banded Systems Symmetric Indefinite Systems Orthogonalization and least squares methods QR-factorization (QR-decomposition) Householder transformation

4 iv CONTENTS Gram-Schmidt method Givens method Overdetermined linear Systems - Least Squares Methods Rank Deficiency I : QR with column pivoting Rank Deficiency II : The Singular Value Decomposition The Sensitivity of the Least Squares Problem Condition number of a Rectangular Matrix Iterative Improvement Iterative Methods for Solving Large Linear Systems General procedures for the construction of iterative methods Some theorems and definitions The theorems of Stein-Rosenberg Sufficient conditions for convergence of TSM and SSM Relaxation Methods (Successive Over-Relaxation (SOR) Method ) Determination of the Optimal Parameter ω for 2-consistly Ordered Matrices Practical Determination of Relaxation Parameter ω b Break-off Criterion for SOR Method Application to Finite Difference Methods: Model Problem (Example 4.1.3) Block Iterative Methods The ADI method of Peaceman and Rachford ADI method (alternating-direction implicit iterative method) The algorithm of Buneman for the solution of the discretized Poisson Equation Comparison with Iterative Methods Derivation and Properties of the Conjugate Gradient Method A Variational Problem, Steepest Descent Method (Gradient Method) Conjugate gradient method Practical Implementation Convergence of CG-method CG-method as an iterative method, preconditioning A new point of view of PCG Incomplete Cholesky Decomposition Chebychev Semi-Iteration Acceleration Method Connection with SOR Method Practical Performance GCG-type Methods for Nonsymmetric Linear Systems GCG method(generalized Conjugate Gradient) BCG method (A: unsymmetric) CGS (Conjugate Gradient Squared), A fast Lanczos-type solver for nonsymmetric linear systems The polynomial equivalent method of the CG method Squaring the CG algorithm: CGS Algorithm Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems

5 CONTENTS v 4.13 A Transpose-Free Qusi-minimal Residual Algorithm for Nonsymmetric Linear Systems Quasi-Minimal Residual Approach Derivation of actual implementation of TFQMR TFQMR Algorithm GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems FOM algorithm: Full orthogonalization method The generalized minimal residual (GMRES) algorithm Practical Implementation: Consider QR factorization of H k Theoretical Aspect of GMRES II On the Numerical Solutions of Eigenvalue Problems The Unsymmetric Eigenvalue Problem Orthogonal Projections and C-S Decomposition Perturbation Theory Power Iterations Power Method Inverse Power Iteration Connection with Newton-method Orthogonal Iteration QR-algorithm (QR-method, QR-iteration) The Practical QR Algorithm Single-shift QR-iteration Double Shift QR iteration Ordering Eigenvalues in the Real Schur From LR, LRC and QR algorithms for positive definite matrices qd-algorithm (Quotient Difference) The qd-algorithm for positive definite matrix The Symmetric Eigenvalue problem Properties, Decomposition, Perturbation Theory Tridiagonalization and the Symmetric QR-algorithm Once Again:The Singular Value Decomposition Jacobi Methods Some Special Methods Bisection method for tridiagonal symmetric matrices Rayleigh Quotient Iteration Orthogonal Iteration with Ritz Acceleration Generalized Definite Eigenvalue Problem Ax = λbx Generalized definite eigenvalue problem

6 vi CONTENTS 7 Lanczos Methods The Lanczos Algorithm Applications to linear Systems and Least Squares Symmetric Positive Definite System Bidiagonalization and the SVD Unsymmetric Lanczos Method Arnoldi Method Arnoldi decompositions Krylov decompositions Reduction to Arnoldi form The implicitly restarted Arnoldi method Filter polynomials Implicitly restarted Arnoldi Jacobi-Davidson method JOCC(Jacobi Orthogonal Component Correction) Davidson method Jacobi Davidson method Jacobi Davidson method as on accelerated Newton Scheme Jacobi-Davidson with harmonic Ritz values Jacobi-Davidson Type method for Generalized Eigenproblems

7 List of Tables 1.1 Some definitions for matrices Solving the LS problem (m n) Comparison results for Jacobi, Gauss-Seidel, SOR and ADI methods Number of iterations and operations for Jacobi, Gauss-Seidel and SOR methods (. κ 1...) j 4.3 Convergence rate of q k where j : κ+1 q4, q 8 and j : µ j n q 4, q

9 List of Figures 1.1 Relationship between backward and forward errors figure of ρ(l ωb ) Geometrical view of λ (1) i (ω) and λ (2) i (ω) Orthogonal projection

11 0 LIST OF FIGURES

12 Part I On the Numerical Solutions of Linear Systems

14 Chapter 1 Introduction 1.1 Mathematical auxiliary, definitions and relations Vectors and matrices a 11 a 1n A K m n, where K = R or C A = [a ij ] = a m1 a mn Product of matrices (K m n K n p K m p ): C = AB, where c ij = n k=1 a ikb kj, i = 1,, m, j = 1,, p. Transpose (R m n R n m ): C = A T, where c ij = a ji R. Conjugate transpose (C m n C n m ): C = A or C = A H, where c ij = ā ji C. Differentiation (R m n R m n ): Let C(t) = (c ij (t)). Then Ċ(t) = [ċ ij(t)]. If A, B K n n satisfy AB = I, then B is the inverse of A and is denoted by A 1. If A 1 exists, then A is said to be nonsingular; otherwise, A is singular. A is nonsingular if and only if det(a) 0. If A K m n, x K n and y = Ax, then y i = n j=1 a ijx j, i = 1,, m. Outer product of x K m and y K n : x 1 ȳ 1 x 1 ȳ n xy =..... K m n. x m ȳ 1 x m ȳ n Inner product of x and y K n : (x, y) := x T y = (x, y) := x y = n x i y i = y T x R i=1 n x i y i = y x C i=1

15 4 Chapter 1. Introduction Sherman-Morrison Formula: Let A R n n be nonsingular, u, v R n. If v T A 1 u 1, then (A + uv T ) 1 = A 1 (1 + v T A 1 u) 1 A 1 uv T A 1. (1.1.1) Sherman-Morrison-Woodburg Formula: Let A R n n, be nonsingular U, V R n k. If (I + V T A 1 U) is invertible, then Proof of (1.1.1): (A + UV T ) 1 = A 1 A 1 U(I + V T A 1 U) 1 V T A 1, (A + uv T )[A 1 A 1 uv T A 1 /(1 + v T A 1 u)] 1 = I v T A 1 u [uvt A 1 (1 + v T A 1 u) uv T A 1 uv T A 1 uv T A 1 ] 1 = I v T A 1 u [u(vt A 1 u)v T A 1 uv T A 1 uv T A 1 ] = I. Example A = = B Rank and orthogonality Let A R m n. Then [ ] R(A) = {y R m y = Ax for some x R n } R m is the range space of A. N (A) = {x R n Ax = 0 } R n is the null space of A. rank(a) = dim [R(A)] = The number of maximal linearly independent columns of A. rank(a) = rank(a T ). dim(n (A)) + rank(a) = n. If m = n, then A is nonsingular N (A) = {0} rank(a) = n. Let {x 1,, x p } R n. Then {x 1,, x p } is said to be orthogonal if x T i x j = 0, for i j and orthonormal if x T i x j = δ ij, where δ ij = 0 if i j and δ ij = 1 if i = j. S = {y R m y T x = 0, for x S} = orthogonal complement of S. R n = R(A T ) N (A), R m = R(A) N (A T ). R(A T ) N (A), R(A) = N (A T ).

16 1.1 Mathematical auxiliary, definitions and relations 5 A R n n A C n n Symmetric: A T = A Hermitian: A = A(A H = A) skew-symmetric: A T = A skew-hermitian: A = A positive definite: x T Ax > 0, x 0 positive definite: x Ax > 0, x 0 non-negative definite: x T Ax 0 non-negative definite: x Ax 0 indefinite: (x T Ax)(y T Ay) < 0 for some x, y indefinite: (x Ax)(y Ay) < 0 for some x, y orthogonal: A T A = I n unitary: A A = I n normal: A T A = AA T normal: A A = AA positive: a ij > 0 non-negative: a ij 0. Table 1.1: Some definitions for matrices Special matrices Let A K n n. Then the matrix A is diagonal if a ij = 0, for i j. Denote D = diag(d 1,, d n ) D n the set of diagonal matrices; tridiagonal if a ij = 0, i j > 1; upper bi-diagonal if a ij = 0, i > j or j > i + 1; (strictly) upper triangular if a ij = 0, i > j (i j); upper Hessenberg if a ij = 0, i > j + 1. (Note: the lower case is the same as above.) Sparse matrix: n 1+r, where r < 1 (usually between ). If n = 1000, r = 0.9, then n 1+r = Example If S is skew-symmetric, then I S is nonsingular and (I S) 1 (I + S) is orthogonal (Cayley transformation of S) Eigenvalues and Eigenvectors Definition Let A C n n. Then λ C is called an eigenvalue of A, if there exists x 0, x C n with Ax = λx and x is called an eigenvector corresponding to λ. Notations: σ(a) := Spectrum of A = The set of eigenvalues of A. ρ(a) := Radius of A = max{ λ : λ σ(a)}. λ σ(a) det(a λi) = 0. p(λ) = det(λi A) = The characteristic polynomial of A. p(λ) = s i=1 (λ λ i) m(λ i), where λ i λ j (for i j) and s i=1 m(λ i) = n.

17 6 Chapter 1. Introduction m(λ i ) = The algebraic multiplicity of λ i. n(λ i ) = n rank(a λ i I) = The geometric multiplicity of λ i. 1 n(λ i ) m(λ i ). Definition If there is some i such that n(λ i ) < m(λ i ), then A is called degenerated. The following statements are equivalent: (1) There are n linearly independent eigenvectors; (2) A is diagonalizable, i.e., there is a nonsingular matrix T such that T 1 AT D n ; (3) For each λ σ(a), it holds that m(λ) = n(λ). If A is degenerated, then eigenvectors and principal vectors derive the Jordan form of A. (See Gantmacher: Matrix Theory I, II) Theorem (Schur) (1) Let A C n n. There is a unitary matrix U such that U AU(= U 1 AU) is upper triangular. (2) Let A R n n. There is an orthogonal matrix Q such that Q T AQ(= Q 1 AQ) is quasi-upper triangular, i.e., an upper triangular matrix possibly with nonzero subdiagonal elements in non-consecutive positions. (3) A is normal if and only if there is a unitary U such that U AU = D diagonal. (4) A is Hermitian if and only if A is normal and σ(a) R. (5) A is symmetric if and only if there is an orthogonal U such that U T AU = D diagonal and σ(a) R. 1.2 Norms and eigenvalues Let X be a vector space over K = R or C. Definition (Vector norms) Let N be a real-valued function defined on X (N : X R + ). Then N is a (vector) norm, if N1: N(αx) = α N(x), α K, for x X; N2: N(x + y) N(x) + N(y), for x, y X; N3: N(x) = 0 if and only if x = 0. The usual notation is x = N(x).

18 1.2 Norms and eigenvalues 7 Example Let X = C n, p 1. Then x p = ( n i=1 x i p ) 1/p is a p-norm. Especially, x 1 = x 2 = ( n x i ( 1-norm), i=1 n x i 2 ) 1/2 (2-norm = Euclidean-norm), i=1 x = max 1 i n x i ( -norm = maximum norm). Lemma N(x) is a continuous function in the components x 1,, x n of x. Proof: N(x) N(y) N(x y) n x j y j N(e j ) j=1 n x y N(e j ). j=1 Theorem (Equivalence of norms) Let N and M be two norms on C n. Then there are constants c 1, c 2 > 0 such that c 1 M(x) N(x) c 2 M(x), for all x C n. Proof: Without loss of generality (W.L.O.G.) we can assume that M(x) = x and N is arbitrary. We claim that c 1 x N(x) c 2 x, equivalently, c 1 N(z) c 2, z S = {z C n z = 1}. From Lemma 1.2.1, N is continuous on S (closed and bounded). minimum principle, there are c 1, c 2 0 and z 1, z 2 S such that By maximum and c 1 = N(z 1 ) N(z) N(z 2 ) = c 2. If c 1 = 0, then N(z 1 ) = 0, and thus, z 1 = 0. This contradicts that z 1 S. Remark Theorem does not hold in infinite dimensional space. Definition (Matrix-norms) Let A C m n. A real-valued function : C m n R + satisfying N1: αa = α A ; N2: A + B A + B ;

19 8 Chapter 1. Introduction N3: A = 0 if and only if A = 0; N4: AB A B ; N5: Ax v A x v (matrix and vector norms are compatible for some v ) is called a matrix norm. If satisfies N1 to N4, then it is called a multiplicative or algebra norm. Example (Frobenius norm) Let A F = [ n i,j=1 a i,j 2 ] 1/2. AB F = ( i,j k a ik b kj 2 ) 1 2 ( { a ik 2 }{ b kj 2 }) 1 2 (Cauchy-Schwartz Ineq.) i,j k k = ( a ik 2 ) 1 2 ( b kj 2 ) 1 2 i k j k = A F B F. (1.2.1) This implies that N4 holds. Furthermore, by Cauchy-Schwartz inequality we have Ax 2 = ( i j a ij x j 2 ) 1 2 ( ( a ij 2 )( ) 1 2 x j 2 ) i j j = A F x 2. (1.2.2) This implies that N5 holds. Also, N1, N2 and N3 hold obviously. (Here, I F = n). Example (Operator norm) Given a vector norm. An associated (induced) matrix norm is defined by Then N5 holds immediately. On the other hand, for all x 0. This implies that It holds N4. (Here I = 1). Ax A = sup x 0 x = max Ax x 0 x. (1.2.3) (AB)x = A(Bx) A Bx A B x (1.2.4) AB A B. (1.2.5)

20 1.2 Norms and eigenvalues 9 In the following, we represent and verify three useful matrix norms: Ax 1 n A 1 = sup = max a ij (1.2.6) x 0 x 1 1 j n Ax A = sup x 0 x = max 1 i n i=1 n a ij (1.2.7) j=1 Proof of (1.2.6): Ax 2 A 2 = sup = ρ(a x 0 x A) (1.2.8) 2 Ax 1 = i = j j x j i a ij x j i a ij. a ij x j j Let C 1 := i a ik = max j i a ij. Then Ax 1 C 1 x 1, thus A 1 C 1. On the other hand, e k 1 = 1 and Ae k 1 = n i=1 a ik = C 1. Proof of (1.2.7): Ax = max a ij x j i j max a ij x j i j max a ij x i j j a kj x C x. This implies that A C. If A = 0, there is nothing to prove. Assume that A 0 and the k-th row of A is nonzero. Define z = [z j ] C n by { ākj a z j = kj if a kj 0, 1 if a kj = 0. Then z = 1 and a kj z j = a kj, for j = 1,..., n. It follows that A Az = max a ij z j n a kj z j = a kj C. i j j j=1 Thus, A max n 1 i n j=1 a ij C. Proof of (1.2.8): Let λ 1 λ 2 λ n 0 be the eigenvalues of A A. There are mutually orthonormal vectors v j, j = 1,..., n such that (A A)v j = λ j v j. Let x = j α jv j. Since Ax 2 2 = (Ax, Ax) = (x, A Ax), ( Ax 2 2 = α j v j, ) α j λ j v j = λ j α j 2 λ 1 x 2 2. j j j

21 10 Chapter 1. Introduction Therefore, A 2 2 λ 1. Equality follows by choosing x = v 1 and Av = (v 1, λ 1 v 1 ) = λ 1. So, we have A 2 = ρ(a A). Example (Dual norm) Let 1 p + 1 q = 1. Then p = q, (p =, q = 1). (It concludes from the application of the Hölder inequality that y x x p y q.) Theorem Let A C n n. Then for any operator norm, it holds ρ(a) A. Moreover, for any ε > 0, there exists an operator norm ε such that ε ρ(a) + ε. Proof: Let λ = ρ(a) ρ and x be the associated eigenvector with x = 1. Then, ρ(a) = λ = λx = Ax A x = A. On the other hand, there is a unitary matrix U such that A = U RU, where R is upper triangular. Let D t = diag(t, t 2,..., t n ). Compute λ 1 t 1 r 12 t 2 r 13 t n+1 r 1n λ 2 t 1 r 23 t n+2 r 2n D t RDt 1 = λ t 1 r n 1,n λ n For t > 0 sufficiently large, the sum of all absolute values of the off-diagonal elements of D t RDt 1 is less than ε. So, it holds D t RDt 1 1 ρ(a) + ε for sufficiently large t(ε) > 0. Define ε for any B by This implies that B ε = D t UBU Dt 1 1 = (UDt 1 ) 1 B(UDt 1 ) 1. A ε = D t RD 1 t ρ(a) + ε. Remark UAV F = A F (by UA F = Ua Ua n 2 2), (1.2.9) UAV 2 = A 2 (by ρ(a A) = ρ(aa )), (1.2.10) where U and V are unitary. Theorem (Singular Value Decomposition (SVD)) Let A C m n. Then there exist unitary matrices U = [u 1,, u m ] C m m and V = [v 1,, v n ] C n n such that U AV = diag(σ 1,, σ p ) = Σ, where p = min{m, n} and σ 1 σ 2 σ p 0. (Here, σ i denotes the i-th largest singular value of A).

22 1.2 Norms and eigenvalues 11 Proof: There are x C n, y C m with x 2 = y 2 = 1 such that Ax = σy, where σ = A 2 ( A 2 = sup x 2 =1 Ax 2 ). Let V = [x, V 1 ] C n n, and U = [y, U 1 ] C m m be unitary. Then [ ] σ w A 1 U AV =. 0 B Since A 1 ( σ w ) 2 (σ 2 + w w) 2, 2 it follows that A σ 2 + w w from ( ) σ 2 A 1 w 2 ( ) σ 2 w 2 σ 2 + w w. But σ 2 = A 2 2 = A 1 2 2, it implies w = 0. Hence, the theorem holds by induction. Remark A 2 = ρ(a A) = σ 1 = The maximal singular value of A. Let A = UΣV. Then we have This implies In addition, by (1.2.2) and (1.2.11), we get ABC F = UΣV BC F = ΣV BC F σ 1 BC F = A 2 BC F. ABC F A 2 B F C 2. (1.2.11) A 2 A F n A 2. (1.2.12) Theorem Let A C n n. The following statements are equivalent: (1) lim m Am = 0; (2) lim m Am x = 0 for all x; (3) ρ(a) < 1. Proof: (1) (2): Trivial. (2) (3): Let λ σ(a), i.e., Ax = λx, x 0. This implies A m x = λ m x 0, as λ m 0. Thus λ < 1, i.e., ρ(a) < 1. (3) (1): There is a norm with A < 1 (by Theorem 1.2.2). Therefore, A m A m 0, i.e., A m 0. Theorem It holds that where is an operator norm. ρ(a) = lim k A k 1/k

23 12 Chapter 1. Introduction Proof: Since ρ(a) k = ρ(a k ) A k ρ(a) A k 1/k, for k = 1, 2,.... If ε > 0, then Ã = [ρ(a) + ε] 1 A has spectral radius < 1 and by Theorem we have Ãk 0 as k. There is an N = N(ε, A) such that Ãk < 1 for all k N. Thus, A k [ρ(a) + ε] k, for all k N or A k 1/k ρ(a) + ε for all k N. Since ρ(a) A k 1/k, and k, ε are arbitrary, lim k A k 1/k exists and equals ρ(a). Theorem Let A C n n, and ρ(a) < 1. Then (I A) 1 exists and (I A) 1 = I + A + A 2 +. Proof: Since ρ(a) < 1, the eigenvalues of (I A) are nonzero. Therefore, by Theorem 2.5, (I A) 1 exists and (I A)(I + A + A A m ) = I A m 0. Corollary If A < 1, then (I A) 1 exists and (I A) A Proof: Since ρ(a) A < 1 (by Theorem 1.2.2), (I A) 1 = A i i=0 A i = (1 A ) 1. i=0 Theorem (Without proof) For A K n n the following statements are equivalent: (1) There is a multiplicative norm p with p(a k ) 1, k = 1, 2,.... (2) For each multiplicative norm p the power p(a k ) are uniformly bounded, i.e., there exists a M(p) < such that p(a k ) M(p), k = 0, 1, 2,.... (3) ρ(a) 1 and all eigenvalue λ with λ = 1 are not degenerated. (i.e., m(λ) = n(λ).) (See Householder s book: The theory of matrix, pp ) In the following we prove some important inequalities of vector norms and matrix norms.

24 1.2 Norms and eigenvalues 13 (a) It holds that 1 x p x q n (q p)/pq, (p q). (1.2.13) Proof: Claim x q x p, (p q): It holds x q = x x x p x p = x p q x p C p,q x p, q where C p,q = max e p=1 e q, e = (e 1,, e n ) T. We now show that C p,q 1. From p q, we have e q q = n e i q i=1 n e i p = 1 (by e i 1). i=1 Hence, C p,q 1, thus x q x p. To prove the second inequality: Let α = q/p > 1. Then the Jensen inequality holds for the convex function ϕ(x): ϕ( fdµ) (ϕ f)dµ, µ(ω) = 1. If we take ϕ(x) = x α, then we have Ω Ω f q dx = Ω Ω ( ( f p ) q/p dx f p dx Ω with Ω = 1. Consider the discrete measure n 1 i=1 = 1 and f(i) = x n i. It follows that ( n n x i q 1 n x i p 1 q/p. n) i=1 i=1 Hence, we have Thus, (b) It holds that n 1 q x q n 1 p x p. n (q p)/pq x q x p. Proof: Let q and lim q x q = x : ) q/p 1 x p x n 1 p. (1.2.14) x = x k = ( x k q ) 1 q ( n i=1 x i q ) 1 q = x q.

25 14 Chapter 1. Introduction On the other hand, we have x q = ( n i=1 x i q ) 1 q which implies that lim q x q = x. (c) It holds that where A = [a 1,, a n ] R m n. (n x q ) 1 q n 1 q x max a j p A p n (p 1)/p max a j p, (1.2.15) 1 j n 1 j n Proof: The first inequality holds obviously. y p = 1 we have To show the second inequality, for Ay p n y j a j p j=1 = y 1 max j n j=1 y j max a j p j a j p n (p 1)/p max a j p (by (1.2.13)). j (d) It holds that where A R m n. max i,j Proof: By (1.2.14) and (1.2.15) immediately. a ij A p n (p 1)/p m 1/p max a ij, (1.2.16) i,j (e) It holds that m (1 p)/p A 1 A p n (p 1)/p A 1. (1.2.17) Proof: By (1.2.15) and (1.2.13) immediately. (f) By Hölder inequality, we have (see Appendix later!) y x x p y q, where = 1 or p q max{ x y : y q = 1} = x p. (1.2.18) Then it holds that A p = A T q. (1.2.19) Proof: By (1.2.18) we have max Ax p = max max x p=1 x p=1 y (Ax)T y q=1 = max y q =1 max x p =1 xt (A T y) = max y q =1 AT y q = A T q.

26 1.2 Norms and eigenvalues 15 (g) It holds that n 1 p A A p m 1 p A. (1.2.20) Proof: By (1.2.17) and (1.2.19), we get m 1 p A = m 1 p A T 1 = m 1 1 q A T 1 = m (q 1)/q A T 1 A T q = A p. (h) It holds that A 2 A p A q, ( 1 p + 1 q = 1). (1.2.21) Proof: By (1.2.19) we have A p A q = A T q A q A T A q A T A 2. The last inequality holds by the following statement: Let S be a symmetric matrix. Then S 2 S, for any matrix operator norm. Since λ S, This implies, S 2 S. S 2 = ρ(s S) = ρ(s 2 ) = max λ σ(s) λ = λ max. (i) For A R m n and q p 1, it holds that Proof: By (1.2.13), we get n (p q)/pq A q A p m (q p)/pq A q. (1.2.22) A p = max Ax p max m (q p)/pq Ax q x p=1 x q 1 = m (q p)/pq A q. Appendix: To show Hölder inequality and (1.2.18) Taking ϕ(x) = e x in Jensen s inequality we have { } exp fdµ Ω Ω e f dµ. Let Ω = finite set = {p 1,..., p n }, µ({p i }) = 1, f(p n i) = x i. Then { } 1 exp n (x x n ) 1 n (ex e x n ). Taking y i = e x i, we have (y 1 y n ) 1/n 1 n (y y n ).

27 16 Chapter 1. Introduction Taking µ({p i }) = q i > 0, n i=1 q i = 1 we have y q 1 1 y q n n q 1 y q n y n. (1.2.23) Let α i = x i / x p, β i = y i / y q, where x = [x 1,, x n ] T, y = [y 1,, y n ] T, α = [α 1,, α n ] T and β = [β 1,, β n ] T. By (1.2.23) we have Since α p = 1, β q = 1, it holds α i β i 1 p αp i + 1 q βq i. Thus, n α i β i 1 p + 1 q = 1. i=1 x T y x p y q. To show max{ x T y ; x p = 1} = y q. Taking x i = y q 1 i / y q/p q we have x p p = n i=1 y i (q 1)p y q q = 1. Note (q 1)p = q. Then n n x T i=1 i y i = y i q i=1 y q/p q = y q q y q/p q = y q. The following two properties are useful in the following sections. (i) There exists ẑ with ẑ p = 1 such that y q = ẑ T y. Let z = ẑ/ y q. Then we have z T y = 1 and z p = 1 y q. (ii) From the duality, we have y = ( y ) = max u =1 y T u = y T ẑ and ẑ = 1. Let z = ẑ/ y. Then we have z T y = 1 and z = 1 y. 1.3 The Sensitivity of Linear System Ax = b Backward error and Forward error Let x = F (a). We define backward and forward errors in Figure 1.1. In Figure 1.1, ˆx + x = F (a + a) is called a mixed forward-backward error, where x ε x, a η a. Definition (i) An algorithm is backward stable, if for all a, it produces a computed ˆx with a small backward error, i.e., ˆx = F (a + a) with a small. (ii) An algorithm is numerical stable, if it is stable in the mixed forward-backward error sense, i.e., ˆx + x = F (a + a) with both a and x small.

28 1.3 The Sensitivity of Linear System Ax = b 17 Input Output a x F ( a) = backward error forward error a + D a = + D xˆ F ˆ ( a a) Dx + D F ( a a) Figure 1.1: Relationship between backward and forward errors. (iii) If a method which produces answers with forward errors of similar magnitude to those produced by a backward stable method, is called a forward stable. Remark (i) Backward stable forward stable, not vice versa! (ii) Forward error condition number backward error Consider ˆx x = F (a + a) F (a) = F (a) a + F (a + θ a) ( a) 2, θ (0, 1). 2 Then we have ˆx x x af (a) F (a) = ( ) af (a) a F (a) a + O ( ( a) 2). The quantity C(a) = is called the condition number of F. If x or F is a vector, then the condition number is defined in a similar way using norms and it measures the maximum relative change, which is attained for some, but not all a. { Àpriori error estimate! `Pposteriori error estimate! An SVD Analysis Let A = n i=1 σ iu i v i T = UΣV T be a singular value decomposition (SVD) of A. Then x = A 1 b = (UΣV T ) 1 b = n i=1 u i T b σ i v i. If cos(θ) = u n T b / b 2 and (A εu n v n T )y = b + ε(u n T b)u n, σ n > ε 0.

29 18 Chapter 1. Introduction Then we have y x 2 ( ε σ n ) x 2 cos(θ). Let E = diag{0,, 0, ε}. Then it holds Therefore, (Σ E)V T y = U T b + ε(u n T b)e n. y x = V (Σ E) 1 U T b + ε(u n T b)(σ n ε) 1 v n V Σ 1 U T b = V ((Σ E) 1 Σ 1 )U T b + ε(u n T b)(σ n ε) 1 v n = V (Σ 1 E(Σ E) 1 )U T b + ε(u T n b)(σ n ε) 1 v ( n = V diag 0,, 0, ε σ n (σ n ε) ) U T b + ε(u n T b)(σ n ε) 1 v n ε = σ n (σ n ε) v n(u T n b) + ε(u T n b)(σ n ε) 1 v n = u T ε n bv n ( σ n (σ n ε) + ε(σ n ε) 1 ) = ε(1 + σ n) σ n (σ n ε) u n T bv n. From the inequality x 2 b 2 A 1 2 we have y x 2 x 2 u n T ε b σ n ( 1+σ) σ ε b 2 u n T b ε. b 2 σ n Theorem A is nonsingular and A 1 E = r < 1. Then A + E is nonsingular and (A + E) 1 A 1 E A 1 2 /(1 r). Proof:: Since A is nonsingular, A+E = A(I F ), where F = A 1 E. Since F = r < 1, it follows that I F is nonsingular (by Corollary 1.2.1) and (I F ) 1 < 1 1 r. Then and It follows that (A + E) 1 = (I F ) 1 A 1 = (A + E) 1 A 1 1 r (A + E) 1 A 1 = A 1 E(A + E) 1. (A + E) 1 A 1 A 1 E (A + E) 1 A 1 2 E. 1 r Lemma Let { Ax = b, (A + A)y = b + b, where A δ A and b δ b. If δκ(a) = r < 1, then A+ A is nonsingular and y 1+r, where κ(a) = x 1 r A A 1.

30 1.3 The Sensitivity of Linear System Ax = b 19 Proof: Since A 1 A < δ A 1 A = r < 1, it follows that A + A is nonsingular. From the equality (I + A 1 A)y = x + A 1 b follows that y (I + A 1 A) 1 ( x + δ A 1 b ) 1 1 r ( x + δ A 1 b ) = 1 1 r ( x + r b A ). From b = Ax A x follows the lemma Normwise Forward Error Bound Theorem If the assumption of Lemma holds, then x y x 2δ 1 r κ(a). Proof:: Since y x = A 1 b A 1 Ay, we have So by Lemma it holds y x δ A 1 b + δ A 1 A y. y x x b δκ(a) A x + δκ(a) y x δκ(a)( r 1 r ) = 2δ 1 r κ(a) Componentwise Forward Error Bound Theorem Let Ax = b and (A + A)y = b + b, where A δ A and b δ b. If δκ (A) = r < 1, then (A + A) is nonsingular and y x x 2δ 1 r A 1 A. Here A 1 A is called a Skeel condition number. Proof:: Since A δ A and b δ b, the assumptions of Lemma are satisfied in -norm. So, A + A is nonsingular and y x 1+r. 1 r Since y x = A 1 b A 1 Ay, we have By taking -norm, we have y x A 1 b + A 1 A y δ A 1 b +δ A 1 A y δ A 1 A ( x + y ). y x δ A 1 A ( x r 1 r x ) = 2δ 1 r A 1 A.

31 20 Chapter 1. Introduction Derivation of Condition Number of Ax = b Let (A + εf )x(ε) = b + εf with x(0) = x. Then we have ẋ(0) = A 1 (f F x) and x(ε) = x + εẋ(0) + o(ε 2 ). Therefore, x(ε) x x ε A 1 { f x + F } + o(ε2 ). Define condition number κ(a) := A A 1. Then we have x(ε) x x κ(a)(ρ A + ρ b ) + o(ε 2 ), where ρ A = ε F / A and ρ b = ε f / b Normwise Backward Error Theorem Let y be the computed solution of Ax = b. Then the normwise backward error bound η(y) := min { ε (A + A)y = b + b, A ε A, b ε b } is given by where r = b Ay is the residual. η(y) = r A y + b, (1.3.24) Proof: The right hand side of (1.3.24) is a upper bound of η(y). This upper bound is attained for the perturbation (by construction!) A min = A y rzt A y + b, b b min = A y + b r, where z is the dual vector of y, i.e. z T y = 1 and z = 1 y. Check: A min = η(y) A, or That is, to prove Since ( A min = A y rzt A y + b = rz T = r y. r A y + b ) A. rz T = max u =1 (rzt )u = r max u =1 zt u = r z = r 1 y, we have done. Similarly, b min = η(y) b.

32 1.3 The Sensitivity of Linear System Ax = b Componentwise Backward Error Theorem The componentwise backward error bound is given by ω(y) := min { ε (A + A)y = b + b, A ε A, b ε b } ω(y) = max i where r = b Ay. (note: ξ/0 = 0 if ξ = 0; ξ/0 = if ξ 0.) r i (A y + b) i, (1.3.25) Proof: The right hand side of (1.3.25) is a upper bound for ω(y). This bound is attained for the perturbations A = D 1 AD 2 and b = D 1 b, where D 1 = diag(r i /(A y + b) i ) and D 2 = diag(sign(y i )). Remark Theorems and are posterior error estimation approach Determinants and Nearness to Singularity B n = , 0 1 B 1 n = n Then det(b n ) = 1, κ (B n ) = n2 n 1, σ 30 (B 30 ) D n = Then det(d n ) = 10 n, κ p (D n ) = 1 and σ n (D n ) =

33 22 Chapter 1. Introduction

34 Chapter 2 Numerical methods for solving linear systems Let A C n n be a nonsingular matrix. We want to solve the linear system Ax = b by (a) Direct methods (finite steps); Iterative methods (convergence). (See Chapter 4) 2.1 Elementary matrices x 1 ȳ 1 x 1 ȳ n Let X = K n and x, y X. Then y x K, xy =... The eigenvalues x n ȳ 1 x n ȳ n of xy are {0,, 0, y x}, since rank(xy ) = 1 by (xy )z = (y z)x and (xy )x = (y x)x. Definition A matrix of the form is called an elementary matrix. I αxy (α K, x, y K n ) (2.1.1) The eigenvalues of (I αxy ) are {1, 1,, 1, 1 αy x}. Compute If αy x 1 0 and letβ = (I αxy )(I βxy ) = I (α + β αβy x)xy. (2.1.2) α, then α + β αy x 1 αβy x = 0. We have (I αxy ) 1 = (I βxy ), 1 α + 1 β = y x. (2.1.3) Example Let x K n, and x x = 1. Let H = {z : z x = 0} and Q = I 2xx (Q = Q, Q 1 = Q). Then Q reflects each vector with respect to the hyperplane H. Let y = αx + w, w H. Then, we have Qy = αqx + Qw = αx + w 2(x w)x = αx + w.

35 24 Chapter 2. Numerical methods for solving linear systems Example Let y = e i = the i-th column of unit matrix and x = l i = [0,, 0, l i+1,i,, l n,i ] T. Then, 1... I + l i e T 1 i = (2.1.4) l i+1,i.... l n,i 1 Since e T i l i = 0, we have From the equality (I + l i e T i ) 1 = (I l i e T i ). (2.1.5) follows that (I + l 1 e T 1 )(I + l 2 e T 2 ) = I + l 1 e T 1 + l 2 e T 2 + l 1 (e T 1 l 2 )e T 2 = I + l 1 e T 1 + l 2 e T 2 (I + l 1 e T 1 ) (I + l i e T i ) (I + l n 1 e T n 1) = I + l 1 e T 1 + l 2 e T l n 1 e T n 1 1 l = (2.1.6) l n1 l n,n 1 1 Theorem A lower triangular with 1 on the diagonal can be written as the product of n 1 elementary matrices of the form (2.1.4). Remark (I + l 1 e T l n 1 e T n 1) 1 = (l l n 1 e T n 1)... (I l 1 e T 1 ) which can not be simplified as in (2.1.6). 2.2 LR-factorization Definition Given A C n n, a lower triangular matrix L and an upper triangular matrix R. If A = LR, then the product LR is called a LR-factorization (or LRdecomposition) of A. Basic problem: Given b 0, b K n. Find a vector l 1 = [0, l 21,..., l n1 ] T and c K such that Solution: (I l 1 e T 1 )b = ce 1. { b1 = c, b i l i1 b 1 = 0, i = 2,..., n. { b1 = 0, it has no solution (since b 0), b 1 0, then c = b 1, l i1 = b i /b 1, i = 2,..., n.

36 2.2 LR-factorization 25 Construction of LR-factorization: Let A = A (0) = [a (0) 1... a (0) n ]. Apply basic problem to a (0) 1 : If a (0) 11 0, then there exists L 1 = I l 1 e T 1 such that (I l 1 e T 1 )a (0) 1 = a (0) 11 e 1. Thus The i-th step: A (1) = L 1 A (0) = [L 1 a (0) 1... L 1 a (0) n ] = a (0) 11 a (0) a (0) 1n 0 a (1) 22 a (1) 2n.... (2.2.1) 0 a (1) n2... a (1) nn A (i) = L i A (i 1) = L i L i 1... L 1 A (0) a (0) 11 a (0) 1n 0 a (1) 22 a (1) 2n =.. a (i 1) ii a (i 1) in.. 0 a (i) i+1,i+1 a (i) i+1,n a (i) n,i+1 a (i) nn (2.2.2) If a (i 1) ii 0, for i = 1,..., n 1, then the method is executable and we have that A (n 1) = L n 1... L 1 A (0) = R (2.2.3) is an upper triangular matrix. Thus, A = LR. Explicit representation of L: L i = I l i e T i, L 1 i = I + l i e T i L = L L 1 n 1 = (I + l 1 e T 1 )... (I + l n 1 e T n 1) = I + l 1 e T l n 1 e T n 1 (by (2.1.6)). Theorem Let A be nonsingular. Then A has an LR-factorization (A=LR) if and only if k i := det(a i ) 0, where A i is the leading principal matrix of A, i.e., a a 1i A i =.., a i1... a ii for i = 1,..., n 1. Proof: (Necessity ): Since A = LR, we have a a 1i l 11.. =.... O r 11 r 1i O... r ii. a i1... a ii l i1... l ii

37 26 Chapter 2. Numerical methods for solving linear systems From det(a) 0 follows that det(l) 0 and det(r) 0. Thus, l jj 0 and r jj 0, for j = 1,..., n. Hence k i = l l ii r r ii 0. (Sufficiency ): From (2.2.2) we have A (0) = (L L 1 i )A (i). Consider the (i + 1)-th leading principle determinant. From (2.2.3) we have a a i,i+1.. a i+1... a i+1,i+1 = 1 0. l l i+1,1 l i+1,i 1 0. Therefore, the LR- Thus, k i = 1 a (0) 11 a (1) a (i) i+1,i+1 factorization of A exists. a (0) 11 a (0) 12 a (1) a (i 1) ii.. a (i 1) i,i+1 0 a (i) i+1,i+1 0 which implies a(i) i+1,i+1. Theorem If a nonsingular matrix A has an LR-factorization with A = LR and l 11 = = l nn = 1, then the factorization is unique. Proof: Let A = L 1 R 1 = L 2 R 2. Then L 1 2 L 1 = R 2 R 1 1 = I. Corollary If a nonsingular matrix A has an LR-factorization with A = LDR, where D is diagonal, L and R T are unit lower triangular (with one on the diagonal) if and only if k i 0. Theorem Let A be a nonsingular matrix. Then there exists a permutation P, such that P A has an LR-factorization. (Proof): By construction! Consider (2.2.2): There is a permutation P i, which interchanges the i-th row with a row of index large than i, such that 0 a (i 1) ii ( P i A (i 1) ). This procedure is executable, for i = 1,..., n 1. So we have L n 1 P n 1... L i P i... L 1 P 1 A (0) = R. (2.2.4) Let P be a permutation which affects only elements i + 1,, n. It holds P (I l i e T i )P 1 = I (P l i )e T i = I l i e T i = L i, (e T i P 1 = e T i ) where L i is lower triangular. Hence we have Now write all P i in (2.2.4) to the right as P L i = L i P. (2.2.5) L n 1 Ln 2... L 1 P n 1... P 1 A (0) = R. Then we have P A = LR with L 1 = L n 1 Ln 2 L 1 and P = P n 1 P 1.

38 2.3 Gaussian elimination Gaussian elimination Practical implementation Given a linear system Ax = b (2.3.1) with A nonsingular. We first assume that A has an LR-factorization. i.e., A = LR. Thus LRx = b. We then (i) solve Ly = b; (ii) solve Rx = y. These imply that LRx = Ly = b. From (2.2.4), we have L n 1... L 2 L 1 (A b) = (R L 1 b). Algorithm (without permutation) For k = 1,..., n 1, if a kk = 0 then stop ( ); else ω j := a kj (j = k + 1,..., n); for i = k + 1,..., n, η := a ik /a kk, a ik := η; for j = k + 1,..., n, a ij := a ij ηω j, b j := b j ηb k. For x: (back substitution!) x n = b n /a nn ; for i = n 1, n 2,..., 1, x i = (b i n j=i+1 a ijx j )/a ii. Cost of computation (one multiplication + one addition one flop): (i) LR-factorization: n 3 /3 n/3 flops; (ii) Computation of y: n(n 1)/2 flops; (iii) Computation of x: n(n + 1)/2 flops. For A 1 : 4/3n 3 n 3 /3 + kn 2 (k = n linear systems). Pivoting: (a) Partial pivoting; (b) Complete pivoting. From (2.2.2), we have A (k 1) = 11 a (0) 1n a (k 2) k 1,k 1 a (k 2) k 1,n.. 0 a (k 1) kk a (k 1) kn a (k 1) nk a (k 1) nn a (0)

39 28 Chapter 2. Numerical methods for solving linear systems For (a): Find a p {k,..., n} such that a pk = max k i n a ik (r k = p) (2.3.2) swap a kj, b k and a pj, b p respectively, (j = 1,..., n). Replacing ( ) in Algorithm by (2.3.2), we have a new factorization of A with partial pivoting, i.e., P A = LR (by Theorem 2.2.1) and l ij 1 for i, j = 1,..., n. For solving linear system Ax = b, we use P Ax = P b L(Rx) = P T b b. It needs extra n(n 1)/2 comparisons. For (b): Find p, q {k,..., n} such that a pq max a ij, (r k := p, c k := q) k i,j n swap a kj, b k and a pj, b p respectively, (j = k,..., n), swap a ik and a iq (i = 1,..., n). (2.3.3) Replacing ( ) in Algorithm by (2.3.3), we also have a new factorization of A with complete pivoting, i.e., P AΠ = LR (by Theorem 2.2.1) and l ij 1, for i, j = 1,..., n. For solving linear system Ax = b, we use P AΠ(Π T x) = P b LR x = b x = Π x. It needs n 3 /3 comparisons. [ ] Example Let A = be in three decimal-digit floating point arithmetic. 1 1 Then κ(a) = A A 1 4. A is well-conditioned. Without pivoting: [ ] 1 0 L = fl(1/10 4, fl(1/10 4 ) = 10 4, ) 1 [ ] R = 0 fl(1 10 4, fl( ) = ) [ ] [ ] [ ] [ ] LR = = = A [ ] 1 Here a 22 entirely lost from computation. It is numerically unstable. Let Ax =. [ ] [ ] Then x. But Ly = solves y = 1 and y 2 = fl( ) = 10 4, Rˆx = y solves ˆx 2 = fl(( 10 4 )/( 10 4 )) = 1, ˆx 1 = fl((1 1)/10 4 ) = 0. We have an erroneous solution with cond(l), cond(r) Partial pivoting: L = R = L and R are both well-conditioned. [ ] [ 1 0 fl(10 4 = /1) 1 [ ] fl( = ) [ ]. ],

40 2.3 Gaussian elimination LDR- and LL T -factorizations Let A = LDR as in Corollary Algorithm (Crout s factorization or compact method) For k = 1,..., n, for p = 1, 2,..., k 1, r p := d p a pk, ω p := a kp d p, d k := a kk k 1 p=1 a kpr p, if d k = 0, then stop; else for i = k + 1,..., n, a ik := (a ik k 1 p=1 a ipr p )/d k, a ki := (a ki k 1 p=1 ω pa pi )/d k. Cost: n 3 /3 flops. With partial pivoting: see Wilkinson EVP pp Advantage: One can use double precision for inner product. Theorem If A is nonsingular, real and symmetric, then A has a unique LDL T - factorization, where D is diagonal and L is a unit lower triangular matrix (with one on the diagonal). Proof: A = LDR = A T = R T DL T. It implies L = R T. Theorem If A is symmetric and positive definite, then there exists a lower triangular G R n n with positive diagonal elements such that A = GG T. Proof: A is symmetric positive definite x T Ax 0, for all nonzero vector x R n n k i 0, for i = 1,, n, all eigenvalues of A are positive. From Corollary and Theorem we have A = LDL T. From L 1 AL T = D follows that d k = (e T k L 1 )A(L T e k ) > 0. Thus, G = Ldiag{d 1/2 1,, d 1/2 n } is real, and then A = GG T. Algorithm (Cholesky factorization) Let A be symmetric positive definite. To find a lower triangular matrix G such that A = GG T. For k = 1, 2,..., n, a kk := (a kk k 1 p=1 a2 kp )1/2 ; for i = k + 1,..., n, a ik = (a ik k 1 p=1 a ipa kp )/a kk. Cost: n 3 /6 flops. Remark For solving symmetric, indefinite systems: See Golub/ Van Loan Matrix Computation pp P AP T = LDL T, D is 1 1 or 2 2 block-diagonal matrix, P is a permutation and L is lower triangular with one on the diagonal.

41 30 Chapter 2. Numerical methods for solving linear systems Error estimation for linear systems Consider the linear system and the perturbed linear system Ax = b, (2.3.4) (A + δa)(x + δx) = b + δb, (2.3.5) where δa and δb are errors of measure or round-off in factorization. Definition Let be an operator norm and A be nonsingular. Then κ κ(a) = A A 1 is a condition number of A corresponding to. Theorem (Forward error bound) Let x be the solution of the (2.3.4) and x+δx be the solution of the perturbed linear system (2.3.5). If δa A 1 < 1, then ( δx x κ δa 1 κ δa A + δb ). (2.3.6) b A Proof: From (2.3.5) we have Thus, (A + δa)δx + Ax + δax = b + δb. δx = (A + δa) 1 [(δa)x δb]. (2.3.7) Here, Corollary 2.7 implies that (A + δa) 1 exists. Now, (A + δa) 1 = (I + A 1 δa) 1 A 1 A A 1 δa. On the other hand, b = Ax implies b A x. So, 1 x A b. (2.3.8) From (2.3.7) follows that δx inequality (2.3.6) is proved. A 1 ( δa x + δb ). By using (2.3.8), the 1 A 1 δa Remark If κ(a) is large, then A (for the linear system Ax = b) is called illconditioned, else well-conditioned Error analysis for Gaussian algorithm A computer in characterized by four integers: (a) the machine base β; (b) the precision t; (c) the underflow limit L; (d) the overflow limit U. Define the set of floating point numbers. F = {f = ±0.d 1 d 2 d t β e 0 d i < β, d 1 0, L e U} {0}. (2.3.9)

42 2.3 Gaussian elimination 31 Let G = {x R m x M} {0}, where m = β L 1 and M = β U (1 β t ) are the minimal and maximal numbers of F \ {0} in absolute value, respectively. We define an operator fl : G F by One can show that fl satisfies fl(x) = the nearest c F to x by rounding arithmetic. fl(x) = x(1 + ε), ε eps, (2.3.10) where eps = 1 2 β1 t. (If β = 2, then eps = 2 t ). It follows that or where ε eps and = +,,, /. fl(a b) = (a b)(1 + ε) fl(a b) = (a b)/(1 + ε), Algorithm Given x, y R n. The following algorithm computes x T y and stores the result in s. s = 0, for k = 1,..., n, s = s + x k y k. Theorem If n2 t 0.01, then n fl( x k y k ) = k=1 n x k y k [ (n + 2 k)θ k 2 t ], θ k 1 k=1 Proof: Let s p = fl( p k=1 x ky k ) be the partial sum in Algorithm Then s 1 = x 1 y 1 (1 + δ 1 ) with δ 1 eps and for p = 2,..., n, s p = fl[s p 1 + fl(x p y p )] = [s p 1 + x p y p (1 + δ p )](1 + ε p ) with δ p, ε p eps. Therefore fl(x T y) = s n = n x k y k (1 + γ k ), k=1 where (1 + γ k ) = (1 + δ k ) n j=k (1 + ε j), and ε 1 0. Thus, n fl( x k y k ) = k=1 n x k y k [ (n + 2 k)θ k 2 t ]. (2.3.11) k=1 The result follows immediately from the following useful Lemma.

43 32 Chapter 2. Numerical methods for solving linear systems Lemma If (1 + α) = n k=1 (1 + α k), where α k 2 t and n2 t 0.01, then n (1 + α k ) = nθ2 t with θ 1. k=1 Proof: From assumption it is easily seen that (1 2 t ) n n (1 + α k ) (1 + 2 t ) n. (2.3.12) Expanding the Taylor expression of (1 x) n as 1 < x < 1, we get Hence (1 x) n = 1 nx + k=1 Now, we estimate the upper bound of (1 + 2 t ) n : If 0 x 0.01, then n(n 1) (1 θx) n 2 x 2 1 nx. 2 (1 2 t ) n 1 n2 t. (2.3.13) e x = 1 + x + x2 2! + x3 3! + = 1 + x + x 2 x(1 + x 3 + 2x2 + ). 4! 1 + x e x 1 + x x 1 2 ex x (2.3.14) (Here, we use the fact e 0.01 < 2 to the last inequality.) Let x = 2 t. Then the left inequality of (2.3.14) implies (1 + 2 t ) n e 2 t n (2.3.15) Let x = 2 t n. Then the second inequality of (2.3.14) implies From (2.3.15) and (2.3.16) we have e 2 tn n2 t (2.3.16) (1 + 2 t ) n n2 t. Let the exact LR-factorization of A be L and R (A = LR) and let L, R be the LR-factorization of A by using Gaussian Algorithm (without pivoting). There are two possibilities: (i) (ii) Forward error analysis: Estimate L L and R R. Backward error analysis: Let L R be the exact LR-factorization of a perturbed matrix Ã = A + F. Then F will be estimated, i.e., F?.

44 2.3 Gaussian elimination Àpriori error estimate for backward error bound of LRfactorization From (2.2.2) we have a (k+1) ij = A (k+1) = L k A (k), for k = 1, 2,..., n 1 (A (1) = A). Denote the entries of A (k) by a (k) ij and let l ik = fl(a (k) ik /a(k) kk ), i k + 1. From (2.2.2) we know that 0; for i k + 1, j = k fl(a (k) ij a (k) fl(l ik a (k) kj )); for i k + 1, j k + 1 (2.3.17) ij ; otherwise. From (2.3.10) we have l ik = (a (k) ik /a(k) kk )(1 + δ ik) with δ ik 2 t. Then a (k) ik Let a (k) ik δ ik ε (k) ik. From (2.3.10) we also have with δ ij, δ ij 2 t. Then a (k+1) ij = a (k) ij l ika (k) kk + a(k) ij δ ik = 0, for i k + 1. (2.3.18) a (k+1) ij = fl(a (k) ij l ik a (k) kj = (a (k) ij Let ε (k) ij l ik a (k) kj δ ij + a (k+1) ij (2.3.17), (2.3.18) and (2.3.20) we obtain where a (k+1) ij = ε (k) ij = a (k) ij a (k) ij a (k) ij fl(l ik a (k) kj )) (2.3.19) (l ik a (k) kj (1 + δ ij)))/(1 + δ ij) l ika (k) kj δ ij + a (k+1) ij δ ij, for i, j k + 1. (2.3.20) δ ij which is the computational error of a (k) ij in A (k+1). From l ik a (k) kk + ε(k) ij ; for i k + 1, j = k l ik a (k) kj + ε(k) ij ; for i k + 1, j k + 1 (2.3.21) + ε (k) ij ; otherwise, a (k) ij δ ij; for i k + 1, j = k, l ik a (k) kj δ ij a (k+1) ij δ ij; for i k + 1, j k + 1 0; otherwise. Let E (k) be the error matrix with entries ε (k) ij. Then (2.3.21) can be written as where (2.3.22) A (k+1) = A (k) M k A (k) + E (k), (2.3.23) 0 M k =... 0 l k+1,k.... l n,k 0 (2.3.24)

45 34 Chapter 2. Numerical methods for solving linear systems For k = 1, 2..., n 1, we add the n 1 equations in (2.3.23) together and get M 1 A (1) + M 2 A (2) + + M n 1 A (n 1) + A (n) = A (1) + E (1) + + E (n 1). From (2.3.17) we know that the k-th row of A (k) is equal to the k-th row of A (k+1),, A (n), respectively and from (2.3.24) we also have Thus, Then where M k A (k) = M k A (n) = M k R. (M 1 + M M n 1 + I) R = A (1) + E (1) + + E (n 1). L R = A + E, (2.3.25) 1 l 21 1 O L =.... and E = E(1) + + E (n 1). (2.3.26) l n1... l n,n 1 1 Now we assume that the partial pivotings in Gaussian Elimination are already arranged such that pivot element a (k) kk has the maximal absolute value. So, we have l ik 1. Let ρ = max i,j,k a(k) ij / A. (2.3.27) Then From (2.3.22) and (2.3.28) follows that Therefore, ε (k) From (2.3.26) we get ij ρ A a (k) ij ρ A. (2.3.28) 2 t ; for i k + 1, j = k, 2 1 t ; for i k + 1, j k + 1, 0; otherwise. (2.3.29) E (k) ρ A 2 t (2.3.30) E ρ A 2 t. Hence we have the following theorem n 4 2n n 3 2n 2 (2.3.31)

46 2.3 Gaussian elimination 35 Theorem The LR-factorization L and R of A using Gaussian Elimination with partial pivoting satisfies L R = A + E, where Proof: E n 2 ρ A 2 t (2.3.32) n E ρ A 2 t ( (2j 1) 1) < n 2 ρ A 2 t. j=1 Now we shall solve the linear system Ax = b by using the factorization L and R, i.e., Ly = b and Rx = y. For Ly = b: From Algorithm we have y 1 = fl(b 1 /l 11 ), ( ) li1 y 1 l i2 y 2 l i,i 1 y i 1 + b i y i = fl, (2.3.33) l ii for i = 2, 3,..., n. From (2.3.10) we have y 1 = b 1 /l 11 (1 + δ 11 ), with δ 11 2 t y i = fl( fl( l i1y 1 l i2 y 2 l i,i 1 y i 1 )+b i l ii (1+δ ii ) ) (2.3.34) = fl( l i1y 1 l i2 y 2 l i,i 1 y i 1 )+b i l ii (1+δ ii )(1+δ ii ), with δ ii, δ ii 2 t. Applying Theorem we get where fl( l i1 y 1 l i2 y 2 l i,i 1 y i 1 ) = l i1 (1 + δ i1 )y 1 l i,i 1 (1 + δ i,i 1 )y i 1, So, (2.3.34) can be written as δ i1 (i 1) t ; for i = 2, 3,, n, { i = 2, 3,, n, δ ij (i + 1 j) t ; for j = 2, 3,, i 1. l 11 (1 + δ 11 )y 1 = b 1, l i1 (1 + δ i1 )y l i,i 1 (1 + δ i,i 1 )y i 1 + l ii (1 + δ ii )(1 + δ ii)y i = b i, for i = 2, 3,, n. (2.3.35) (2.3.36) or (L + δl)y = b. (2.3.37)

47 36 Chapter 2. Numerical methods for solving linear systems From (2.3.35) (2.3.36) and (2.3.37) follow that l 11 0 l 21 2 l 22 δl t 2 l 31 2 l 32 2 l l 41 3 l 42 2 l (n 1) l n1 (n 1) l n2 (n 2) l n3 2 l n,n 1 2 l nn (2.3.38) This implies, δl n(n + 1) t max l ij i,j n(n + 1) t. (2.3.39) Theorem For lower triangular linear system Ly = b, if y is the exact solution of (L + δl)y = b, then δl satisfies (2.3.38) and (2.3.39). Applying Theorem to the linear system Ly = b and Rx = y, respectively, the solution x satisfies ( L + δ L)( R + δ R)x = b or ( L R + (δ L) R + L(δ R) + (δ L)(δ R))x = b. (2.3.40) Since L R = A + E, substituting this equation into (2.3.40) we get The entries of L and R satisfy [A + E + (δ L) R + L(δ R) + (δ L)(δ R)]x = b. (2.3.41) l ij 1, and r ij ρ A. Therefore, we get L n, R nρ A, δ L n(n+1) t, δ R n(n+1) ρ2 t. In practical implementation we usually have n 2 2 t << 1. So it holds (2.3.42) Let Then, (2.3.32) and (2.3.42) we get δ L δ R n 2 ρ A 2 t. δa = E + (δ L) R + L(δ R) + (δ L)(δ R). (2.3.43) δa E + δ L R + L δ R + δ L δ R 1.01(n 3 + 3n 2 )ρ A 2 t (2.3.44)

Gaussian Elimination for Linear Systems

Gaussian Elimination for Linear Systems Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 3, 2011 1/56 Outline 1 Elementary matrices 2 LR-factorization 3 Gaussian elimination