Chapter 7: Iterative Techniques in Matrix Algebra
Per-Olof Persson, persson@berkeley.edu
Department of Mathematics, University of California, Berkeley
Math 128B Numerical Analysis

Vector Norms

Definition. A vector norm on $\mathbb{R}^n$ is a function $\|\cdot\|$ from $\mathbb{R}^n$ into $\mathbb{R}$ with the properties:
(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$
(ii) $\|x\| = 0$ if and only if $x = 0$
(iii) $\|\alpha x\| = |\alpha|\,\|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$
(iv) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$

Definition. The Euclidean norm $l_2$ and the infinity norm $l_\infty$ for the vector $x = (x_1, x_2, \ldots, x_n)^t$ are defined by
$\|x\|_2 = \big\{ \sum_{i=1}^n x_i^2 \big\}^{1/2}$ and $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$

Cauchy-Bunyakovsky-Schwarz Inequality for Sums

For each $x = (x_1, x_2, \ldots, x_n)^t$ and $y = (y_1, y_2, \ldots, y_n)^t$ in $\mathbb{R}^n$,
$x^t y = \sum_{i=1}^n x_i y_i \le \big\{ \sum_{i=1}^n x_i^2 \big\}^{1/2} \big\{ \sum_{i=1}^n y_i^2 \big\}^{1/2} = \|x\|_2\,\|y\|_2$

Distances

Definition. The distance between two vectors $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$ is the norm of the difference of the vectors. The $l_2$ and $l_\infty$ distances are
$\|x - y\|_2 = \big\{ \sum_{i=1}^n (x_i - y_i)^2 \big\}^{1/2}$ and $\|x - y\|_\infty = \max_{1 \le i \le n} |x_i - y_i|$

Convergence

Definition. A sequence $\{x^{(k)}\}_{k=1}^\infty$ of vectors in $\mathbb{R}^n$ is said to converge to $x$ with respect to the norm $\|\cdot\|$ if, given any $\varepsilon > 0$, there exists an integer $N(\varepsilon)$ such that
$\|x^{(k)} - x\| < \varepsilon$, for all $k \ge N(\varepsilon)$

Theorem. The sequence of vectors $\{x^{(k)}\}$ converges to $x$ in $\mathbb{R}^n$ with respect to $\|\cdot\|_\infty$ if and only if $\lim_{k\to\infty} x_i^{(k)} = x_i$ for each $i = 1, 2, \ldots, n$.

Theorem. For each $x \in \mathbb{R}^n$,
$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$

Matrix Norms

Definition. A matrix norm on $n \times n$ matrices is a real-valued function $\|\cdot\|$ satisfying
(i) $\|A\| \ge 0$
(ii) $\|A\| = 0$ if and only if $A = 0$
(iii) $\|\alpha A\| = |\alpha|\,\|A\|$
(iv) $\|A + B\| \le \|A\| + \|B\|$
(v) $\|AB\| \le \|A\|\,\|B\|$
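Before moving on, a quick numerical check of the vector-norm definitions and the equivalence bound above. This is a minimal sketch assuming NumPy; the example vectors are arbitrary and not taken from the slides.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])   # example vector (arbitrary)
y = np.array([1.0,  2.0, 0.0])

# Euclidean (l2) and infinity (l_inf) norms
norm2 = np.sqrt(np.sum(x**2))        # ||x||_2 = (sum x_i^2)^(1/2)
norminf = np.max(np.abs(x))          # ||x||_inf = max |x_i|

# l2 and l_inf distances between x and y
dist2 = np.sqrt(np.sum((x - y)**2))
distinf = np.max(np.abs(x - y))

# Norm equivalence: ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
n = len(x)
assert norminf <= norm2 <= np.sqrt(n) * norminf

print(norm2, norminf, dist2, distinf)
```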
Natural Matrix Norms

Definition. If $\|\cdot\|$ is a vector norm, the natural (or induced) matrix norm is given by
$\|A\| = \max_{\|x\| = 1} \|Ax\|$

Corollary. For any vector $z \ne 0$, matrix $A$, and natural norm $\|\cdot\|$,
$\|Az\| \le \|A\|\,\|z\|$

Theorem. If $A = (a_{ij})$ is an $n \times n$ matrix, then
$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$

Eigenvalues and Eigenvectors

Definition. The characteristic polynomial of a square matrix $A$ is $p(\lambda) = \det(A - \lambda I)$.

Definition. The zeros $\lambda$ of the characteristic polynomial are eigenvalues of $A$, and $x \ne 0$ satisfying $(A - \lambda I)x = 0$ is a corresponding eigenvector.

Definition. The spectral radius $\rho(A)$ of a matrix $A$ is
$\rho(A) = \max |\lambda|$, for eigenvalues $\lambda$ of $A$

Theorem. If $A$ is an $n \times n$ matrix, then
(i) $\|A\|_2 = [\rho(A^t A)]^{1/2}$
(ii) $\rho(A) \le \|A\|$, for any natural norm $\|\cdot\|$

Convergent Matrices

Definition. An $n \times n$ matrix $A$ is convergent if
$\lim_{k\to\infty} (A^k)_{ij} = 0$, for each $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n$

Theorem. The following statements are equivalent:
(i) $A$ is a convergent matrix
(ii) $\lim_{n\to\infty} \|A^n\| = 0$, for some natural norm
(iii) $\lim_{n\to\infty} \|A^n\| = 0$, for all natural norms
(iv) $\rho(A) < 1$
(v) $\lim_{n\to\infty} A^n x = 0$, for every $x$

Iterative Methods for Linear Systems

Direct methods for solving $Ax = b$, e.g. Gaussian elimination, compute an exact solution after a finite number of steps (in exact arithmetic). Iterative algorithms produce a sequence of approximations $x^{(1)}, x^{(2)}, \ldots$ which hopefully converges to the solution, and
- may require less memory than direct methods
- may be faster than direct methods
- may handle special structures (such as sparsity) in a simpler way

[Figure: residual norm $\|r\| = \|b - Ax\|$ versus iteration number, comparing an iterative method with a direct method.]

Two Classes of Iterative Methods

Stationary methods (or classical iterative methods) find a splitting $A = M - K$ and iterate
$x^{(k)} = M^{-1}(K x^{(k-1)} + b) = T x^{(k-1)} + c$
Examples: Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR).

Krylov subspace methods use only multiplication by $A$ (and possibly by $A^T$) and find solutions in the Krylov subspace $\{b, Ab, A^2 b, \ldots, A^{k-1} b\}$.
Examples: Conjugate Gradient (CG), Generalized Minimal Residual (GMRES), BiConjugate Gradient (BiCG), etc.

Jacobi's Method

An iterative technique to solve $Ax = b$ starts with an initial approximation $x^{(0)}$ and generates a sequence of vectors $\{x^{(k)}\}_{k=0}^\infty$ that converges to $x$.

Jacobi's method: solve for $x_i$ in the $i$-th equation of $Ax = b$:
$x_i = \sum_{j=1,\, j \ne i}^n \left( -\frac{a_{ij} x_j}{a_{ii}} \right) + \frac{b_i}{a_{ii}}$, for $i = 1, 2, \ldots, n$
This leads to the iteration
$x_i^{(k)} = \frac{1}{a_{ii}} \left[ \sum_{j=1,\, j \ne i}^n \left( -a_{ij} x_j^{(k-1)} \right) + b_i \right]$, for $i = 1, 2, \ldots, n$
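A minimal sketch of the componentwise Jacobi iteration above, assuming NumPy. The stopping tolerance, iteration cap, and the small strictly diagonally dominant test system are illustrative choices, not from the slides.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, maxiter=500):
    """Componentwise Jacobi iteration for Ax = b (requires a_ii != 0)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for k in range(maxiter):
        x_new = np.empty(n)
        for i in range(n):
            # x_i^{(k)} = (b_i - sum_{j != i} a_ij x_j^{(k-1)}) / a_ii
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxiter

# Small strictly diagonally dominant test system (illustrative only)
A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
x, its = jacobi(A, b)
print(x, its, np.linalg.norm(b - A @ x))
```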
Matrix Form of Jacobi's Method

Convert $Ax = b$ into an equivalent system $x = Tx + c$, select an initial vector $x^{(0)}$, and iterate
$x^{(k)} = T x^{(k-1)} + c$

For Jacobi's method, split $A$ into its diagonal and off-diagonal parts, $A = D - L - U$:
$\underbrace{\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & & \vdots \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} a_{11} & & & 0 \\ & a_{22} & & \\ & & \ddots & \\ 0 & & & a_{nn} \end{pmatrix}}_{D} - \underbrace{\begin{pmatrix} 0 & & & 0 \\ -a_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ -a_{n1} & \cdots & -a_{n,n-1} & 0 \end{pmatrix}}_{L} - \underbrace{\begin{pmatrix} 0 & -a_{12} & \cdots & -a_{1n} \\ & \ddots & \ddots & \vdots \\ & & \ddots & \\ 0 & & & 0 \end{pmatrix}}_{U}$

This transforms $Ax = (D - L - U)x = b$ into $Dx = (L + U)x + b$, and if $D^{-1}$ exists, this leads to the Jacobi iteration:
$x^{(k)} = D^{-1}(L + U)x^{(k-1)} + D^{-1} b = T_j x^{(k-1)} + c_j$
where $T_j = D^{-1}(L + U)$ and $c_j = D^{-1} b$.

The Gauss-Seidel Method

Improve Jacobi's method by, for $i > 1$, using the already updated components $x_1^{(k)}, \ldots, x_{i-1}^{(k)}$ when computing $x_i^{(k)}$:
$x_i^{(k)} = \frac{1}{a_{ii}} \left[ -\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} + b_i \right]$
In matrix form, the method can be written
$(D - L)x^{(k)} = U x^{(k-1)} + b$
and if $(D - L)^{-1}$ exists, this leads to the Gauss-Seidel iteration
$x^{(k)} = (D - L)^{-1} U x^{(k-1)} + (D - L)^{-1} b = T_g x^{(k-1)} + c_g$
where $T_g = (D - L)^{-1} U$ and $c_g = (D - L)^{-1} b$.

General Iteration Methods

Lemma. If the spectral radius satisfies $\rho(T) < 1$, then $(I - T)^{-1}$ exists, and
$(I - T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j$

Theorem. For any $x^{(0)} \in \mathbb{R}^n$, the sequence $x^{(k)} = T x^{(k-1)} + c$ converges to the unique solution of $x = Tx + c$ if and only if $\rho(T) < 1$.

Corollary. If $\|T\| < 1$ for any natural matrix norm, then $x^{(k)} = T x^{(k-1)} + c$ converges for any $x^{(0)} \in \mathbb{R}^n$ to a vector $x \in \mathbb{R}^n$ such that $x = Tx + c$. The following error estimates hold:
(i) $\|x - x^{(k)}\| \le \|T\|^k \|x^{(0)} - x\|$
(ii) $\|x - x^{(k)}\| \le \dfrac{\|T\|^k}{1 - \|T\|}\, \|x^{(1)} - x^{(0)}\|$

Theorem. If $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel converge for any $x^{(0)}$.

Theorem (Stein-Rosenberg). If $a_{ij} \le 0$ for each $i \ne j$ and $a_{ii} > 0$ for each $i$, then one and only one of the following holds:
(i) $0 \le \rho(T_g) < \rho(T_j) < 1$
(ii) $1 < \rho(T_j) < \rho(T_g)$
(iii) $\rho(T_j) = \rho(T_g) = 0$
(iv) $\rho(T_j) = \rho(T_g) = 1$

The Residual Vector

Definition. The residual vector for $\tilde{x} \in \mathbb{R}^n$ with respect to the linear system $Ax = b$ is $r = b - A\tilde{x}$.

Consider the intermediate approximate solution vector used in Gauss-Seidel when computing component $i$:
$\mathbf{x}_i^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)}, x_i^{(k-1)}, \ldots, x_n^{(k-1)})^t$
with residual vector $\mathbf{r}_i^{(k)} = (r_{1i}^{(k)}, r_{2i}^{(k)}, \ldots, r_{ni}^{(k)})^t$. The Gauss-Seidel method
$x_i^{(k)} = \frac{1}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right]$
can then be written as
$x_i^{(k)} = x_i^{(k-1)} + \frac{r_{ii}^{(k)}}{a_{ii}}$

Successive Over-Relaxation

The relaxation methods use an iteration of the form
$x_i^{(k)} = x_i^{(k-1)} + \omega\, \frac{r_{ii}^{(k)}}{a_{ii}}$
for some positive $\omega$. With $\omega > 1$, they can accelerate the convergence of the Gauss-Seidel method and are called successive over-relaxation (SOR) methods.

Write the SOR method as
$x_i^{(k)} = (1 - \omega)\, x_i^{(k-1)} + \frac{\omega}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right]$
which can be written in the matrix form
$x^{(k)} = T_\omega x^{(k-1)} + c_\omega$
where $T_\omega = (D - \omega L)^{-1}[(1 - \omega)D + \omega U]$ and $c_\omega = \omega (D - \omega L)^{-1} b$.
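A minimal sketch of the componentwise SOR update above, assuming NumPy; setting $\omega = 1$ reduces it to Gauss-Seidel. The test matrix, right-hand side, tolerance, and the value $\omega = 1.25$ are illustrative assumptions, not from the slides.

```python
import numpy as np

def sor(A, b, omega=1.0, x0=None, tol=1e-10, maxiter=500):
    """Componentwise SOR for Ax = b; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for k in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            # already-updated x_j for j < i, old values x_old_j for j > i
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x_old[i+1:]
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, k + 1
    return x, maxiter

# Symmetric positive definite tridiagonal test system (illustrative only)
A = np.array([[4.0,  3.0,  0.0],
              [3.0,  4.0, -1.0],
              [0.0, -1.0,  4.0]])
b = np.array([24.0, 30.0, -24.0])
for omega in (1.0, 1.25):          # Gauss-Seidel vs. over-relaxation
    x, its = sor(A, b, omega)
    print(omega, its, x)
```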
Convergence of the SOR Method

Theorem (Kahan). If $a_{ii} \ne 0$ for each $i$, then $\rho(T_\omega) \ge |\omega - 1|$, and the SOR method can converge only if $0 < \omega < 2$.

Theorem (Ostrowski-Reich). If $A$ is positive definite and $0 < \omega < 2$, then SOR converges for any $x^{(0)}$.

Theorem. If $A$ is positive definite and tridiagonal, then $\rho(T_g) = [\rho(T_j)]^2 < 1$, and the optimal $\omega$ for SOR is
$\omega = \dfrac{2}{1 + \sqrt{1 - [\rho(T_j)]^2}}$
which gives $\rho(T_\omega) = \omega - 1$.

Error Bounds

Theorem. Suppose $Ax = b$, $A$ is nonsingular, $\tilde{x} \approx x$, and $r = b - A\tilde{x}$. Then for any natural norm,
$\|x - \tilde{x}\| \le \|r\|\,\|A^{-1}\|$
and if $x \ne 0$ and $b \ne 0$,
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\, \dfrac{\|r\|}{\|b\|}$

Definition. The condition number of a nonsingular matrix $A$ in the norm $\|\cdot\|$ is
$K(A) = \|A\|\,\|A^{-1}\|$

In terms of $K(A)$, the error bounds can be written:
$\|x - \tilde{x}\| \le K(A)\, \dfrac{\|r\|}{\|A\|}$ and $\dfrac{\|x - \tilde{x}\|}{\|x\|} \le K(A)\, \dfrac{\|r\|}{\|b\|}$

Iterative Refinement

Algorithm: Iterative Refinement
  Solve $A x^{(1)} = b$
  for $k = 1, 2, \ldots$
    $r^{(k)} = b - A x^{(k)}$          (residual, compute accurately!)
    Solve $A y^{(k)} = r^{(k)}$         (solve for correction)
    $x^{(k+1)} = x^{(k)} + y^{(k)}$     (improve solution)

Allows for errors in the solution of the linear systems, provided the residual $r$ is computed accurately.

Errors in Both Matrix and Right-Hand Side

Theorem. Suppose $A$ is nonsingular and
$\|\delta A\| < \dfrac{1}{\|A^{-1}\|}$
The solution $\tilde{x}$ to $(A + \delta A)\tilde{x} = b + \delta b$ approximates the solution $x$ of $Ax = b$ with the error estimate
$\dfrac{\|x - \tilde{x}\|}{\|x\|} \le \dfrac{K(A)\,\|A\|}{\|A\| - K(A)\,\|\delta A\|} \left( \dfrac{\|\delta b\|}{\|b\|} + \dfrac{\|\delta A\|}{\|A\|} \right)$

Inner Products

Definition. The inner product for $n$-dimensional vectors $x, y$ is $\langle x, y \rangle = x^t y$.

Theorem. For any vectors $x, y, z$ and real number $\alpha$:
(a) $\langle x, y \rangle = \langle y, x \rangle$
(b) $\langle \alpha x, y \rangle = \langle x, \alpha y \rangle = \alpha \langle x, y \rangle$
(c) $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$
(d) $\langle x, x \rangle \ge 0$
(e) $\langle x, x \rangle = 0 \iff x = 0$

Krylov Subspace Algorithms

Create a sequence of Krylov subspaces for $Ax = b$:
$\mathcal{K}_k = \{b, Ab, \ldots, A^{k-1} b\}$
and find approximate solutions $x_k$ in $\mathcal{K}_k$. Only matrix-vector products are involved.

For SPD matrices, the most popular algorithm is the Conjugate Gradients method [Hestenes/Stiefel, 1952]:
- Finds the best solution $x_k \in \mathcal{K}_k$ in the norm $\|x\|_A = \sqrt{x^t A x}$
- Only requires storage of 4 vectors (not all the $k$ vectors in $\mathcal{K}_k$)
- Remarkably simple and excellent convergence properties
- Originally invented as a direct algorithm! (converges after $n$ steps in exact arithmetic)
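To make the "best solution in $\mathcal{K}_k$ in the A-norm" statement concrete before turning to the CG algorithm itself, here is a numerically naive sketch (assuming NumPy) that forms the monomial Krylov basis $b, Ab, \ldots, A^{k-1}b$ explicitly and minimizes the A-norm error over that subspace by solving the small projected system. CG reaches the same minimizer without storing the basis. The function name, the random SPD test matrix, and the choice of $k$ are illustrative assumptions.

```python
import numpy as np

def krylov_best_Anorm(A, b, k):
    """Minimize ||x* - x||_A over x in K_k = span{b, Ab, ..., A^(k-1) b}.

    Write x = Q y with Q = [b, Ab, ..., A^(k-1) b]; minimizing the A-norm
    error is then equivalent to solving (Q^T A Q) y = Q^T b.
    Monomial basis is ill-conditioned for large k -- illustration only.
    """
    n = len(b)
    Q = np.empty((n, k))
    Q[:, 0] = b
    for j in range(1, k):
        Q[:, j] = A @ Q[:, j - 1]
    y = np.linalg.solve(Q.T @ A @ Q, Q.T @ b)
    return Q @ y

# Small SPD test matrix (illustrative)
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)
b = rng.standard_normal(6)
x_true = np.linalg.solve(A, b)
for k in range(1, 7):
    e = x_true - krylov_best_Anorm(A, b, k)
    print(k, np.sqrt(abs(e @ (A @ e))))   # A-norm of the error, decreasing in k
```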
The Conjugate Gradients Method

Algorithm: Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (p_{k-1}^t A p_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                           (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                         (residual)
    $\beta_k = (r_k^t r_k) / (r_{k-1}^t r_{k-1})$                (improvement this step)
    $p_k = r_k + \beta_k p_{k-1}$                                (search direction)

Only one matrix-vector product $A p_{k-1}$ per iteration. Operation count $O(n)$ (excluding the matrix-vector product).

Properties of Conjugate Gradients Vectors

The spaces spanned by the solutions, the search directions, and the residuals are all equal to the Krylov subspaces:
$\mathcal{K}_k = \mathrm{span}\{x_1, x_2, \ldots, x_k\} = \mathrm{span}\{p_0, p_1, \ldots, p_{k-1}\} = \mathrm{span}\{r_0, r_1, \ldots, r_{k-1}\} = \mathrm{span}\{b, Ab, \ldots, A^{k-1} b\}$
The residuals are orthogonal: $r_k^t r_j = 0$ ($j < k$). The search directions are A-conjugate: $p_k^t A p_j = 0$ ($j < k$).

Optimality of Conjugate Gradients

Theorem. The errors $e_k = x - x_k$ are minimized in the A-norm.
Proof. Any other point in $\mathcal{K}_k$ can be written $\tilde{x} = x_k - \Delta x$ with $\Delta x \in \mathcal{K}_k$, and its error satisfies
$\|\tilde{e}\|_A^2 = (e_k + \Delta x)^t A (e_k + \Delta x) = e_k^t A e_k + (\Delta x)^t A (\Delta x) + 2 e_k^t A (\Delta x)$
But $e_k^t A (\Delta x) = r_k^t (\Delta x) = 0$, since $r_k$ is orthogonal to $\mathcal{K}_k$, so $\Delta x = 0$ minimizes $\|\tilde{e}\|_A$.

Monotonic: $\|e_k\|_A \le \|e_{k-1}\|_A$, and $e_k = 0$ in $k \le n$ steps.
Proof. Follows from $\mathcal{K}_k \subseteq \mathcal{K}_{k+1}$, and that $\mathcal{K}_k = \mathbb{R}^n$ unless converged.

Optimization in CG

CG can be interpreted as a minimization algorithm. We know it minimizes $\|e\|_A$, but this cannot be evaluated. CG also minimizes the quadratic function $\varphi(x) = \frac12 x^t A x - x^t b$:
$\|e_k\|_A^2 = e_k^t A e_k = (x - x_k)^t A (x - x_k) = x_k^t A x_k - 2 x_k^t A x + x^t A x = x_k^t A x_k - 2 x_k^t b + x^t b = 2\varphi(x_k) + \text{constant}$
At each step, $\alpha_k$ is chosen to minimize $\varphi$ along $x_k = x_{k-1} + \alpha_k p_{k-1}$. The conjugated search directions $p_k$ give minimization over all of $\mathcal{K}_k$.

Optimization by Conjugate Gradients

We know that solving $Ax = b$ is equivalent to minimizing the quadratic function $\varphi(x) = \frac12 x^t A x - x^t b$. The minimization can be done by line searches, where $\varphi$ is minimized along a search direction $p_k$. The $\alpha_{k+1}$ that minimizes $\varphi(x_k + \alpha_{k+1} p_k)$ is
$\alpha_{k+1} = \dfrac{p_k^t r_k}{p_k^t A p_k}$
with the residual $r_k = b - A x_k$. The residual is also minus the gradient of $\varphi$:
$\nabla \varphi(x_k) = A x_k - b = -r_k$

The Method of Steepest Descent

Very simple approach: set the search direction $p_k$ to the residual $r_k$ (the negative gradient). This corresponds to moving in the direction in which $\varphi(x)$ decreases the most.

Algorithm: Steepest Descent
  $x_0 = 0$, $r_0 = b$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (r_{k-1}^t A r_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k r_{k-1}$                           (approximate solution)
    $r_k = r_{k-1} - \alpha_k A r_{k-1}$                         (residual)

Poor convergence, tends to move along previous search directions.
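A minimal NumPy sketch of the two algorithms listed above, run side by side on a small SPD system to illustrate the slow convergence of steepest descent compared with CG. The test matrix, tolerances, and iteration caps are illustrative assumptions, not from the slides.

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, maxiter=1000):
    """CG as listed above: x_0 = 0, r_0 = b, p_0 = r_0."""
    x = np.zeros(len(b))
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)          # step length
        x += alpha * p                 # approximate solution
        r -= alpha * Ap                # residual
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            return x, k
        beta = rr_new / rr             # improvement this step
        p = r + beta * p               # search direction
        rr = rr_new
    return x, maxiter

def steepest_descent(A, b, tol=1e-10, maxiter=100000):
    """Steepest descent as listed above: search direction = residual."""
    x = np.zeros(len(b))
    r = b.copy()
    for k in range(1, maxiter + 1):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # step length
        x += alpha * r                 # approximate solution
        r -= alpha * Ar                # residual
        if np.linalg.norm(r) < tol:
            return x, k
    return x, maxiter

# Moderately ill-conditioned SPD test matrix (illustrative only)
n = 50
A = np.diag(np.linspace(1.0, 100.0, n))
b = np.ones(n)
for solver in (conjugate_gradients, steepest_descent):
    x, its = solver(A, b)
    print(solver.__name__, its, np.linalg.norm(b - A @ x))
```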
The Method of Conjugate Directions

The optimization can be improved by better search directions. Let the search directions be A-conjugate, that is, $p_j^t A p_k = 0$ for $j \ne k$. Then the algorithm will converge in at most $n$ steps, since the initial error can be decomposed along the $p_k$'s:
$e_0 = \sum_{k=0}^{n-1} \delta_k p_k$, with $\delta_k = \dfrac{p_k^t A e_0}{p_k^t A p_k}$
But this is exactly the $\alpha$ we choose at step $k$:
$\alpha_{k+1} = \dfrac{p_k^t r_k}{p_k^t A p_k} = \dfrac{p_k^t A e_k}{p_k^t A p_k} = \dfrac{p_k^t A e_0}{p_k^t A p_k}$
since the error $e_k$ is the initial $e_0$ plus a combination of $p_0, \ldots, p_{k-1}$, which are all A-conjugate to $p_k$. Each component $\delta_k$ is then subtracted out at step $k$, and the method converges after $n$ steps.

Choosing A-conjugate Search Directions

One way to choose $p_k$ A-conjugate to the previous search vectors is by Gram-Schmidt:
$p_k = p_k^0 - \sum_{j=0}^{k-1} \beta_{kj} p_j$, with $\beta_{kj} = \dfrac{(p_k^0)^t A p_j}{p_j^t A p_j}$
The initial vectors $p_k^0$ should be linearly independent, for example column $k+1$ of the identity matrix. Drawback: all previous search vectors $p_j$ must be stored.

Conjugate Gradients is simply Conjugate Directions with a particular initial vector in Gram-Schmidt: $p_k^0 = r_k$. This gives orthogonal residuals $r_k^t r_j = 0$ for $j \ne k$, and $\beta_{kj} = 0$ for $k > j + 1$.

Preconditioners for Linear Systems

Main idea: instead of solving $Ax = b$, solve, using a nonsingular $n \times n$ preconditioner $M$,
$M^{-1} A x = M^{-1} b$
which has the same solution $x$. The convergence properties are then based on $M^{-1} A$ instead of $A$. There is a trade-off between the cost of applying $M^{-1}$ and the improvement of the convergence properties. Extreme cases:
- $M = A$: perfect conditioning of $M^{-1} A = I$, but applying $M^{-1}$ is expensive
- $M = I$: applying $M^{-1} = I$ does nothing, and there is no improvement of $M^{-1} A = A$

Preconditioned Conjugate Gradients

To keep symmetry, solve $(C^{-1} A C^{-t})\, C^t x = C^{-1} b$ with $C C^t = M$. This can be written in terms of $M^{-1}$ only, without reference to $C$:

Algorithm: Preconditioned Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = M^{-1} r_0$, $z_0 = p_0$
  for $k = 1, 2, 3, \ldots$
    $\alpha_k = (r_{k-1}^T z_{k-1}) / (p_{k-1}^T A p_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                           (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                         (residual)
    $z_k = M^{-1} r_k$                                           (preconditioning)
    $\beta_k = (r_k^T z_k) / (r_{k-1}^T z_{k-1})$                (improvement this step)
    $p_k = z_k + \beta_k p_{k-1}$                                (search direction)

Commonly Used Preconditioners

A preconditioner should approximately solve the problem $Ax = b$.
- Jacobi preconditioning - $M = \mathrm{diag}(A)$, very simple and cheap, might improve certain problems but is usually insufficient
- Block-Jacobi preconditioning - use a block-diagonal $M$ instead of a diagonal one. Another variant is using several diagonals (e.g. tridiagonal)
- Classical iterative methods - precondition by applying one step of Jacobi, Gauss-Seidel, SOR, or SSOR
- Incomplete factorizations - perform Gaussian elimination but ignore fill; this results in approximate factors $A \approx LU$ or $A \approx R^T R$ (more later)
- Coarse-grid approximations - for a PDE discretized on a grid, a preconditioner can be formed by transferring the solution to a coarser grid, solving a smaller problem, then transferring back (multigrid)
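A minimal sketch of the preconditioned CG loop above using the simplest of the listed preconditioners, Jacobi preconditioning with $M = \mathrm{diag}(A)$, so that applying $M^{-1}$ is just a diagonal scaling. Assumes NumPy; the function name, test matrix, and tolerance are illustrative choices, not from the slides.

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, maxiter=None):
    """Preconditioned CG with Jacobi preconditioner M = diag(A); A must be SPD."""
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    Minv = 1.0 / np.diag(A)          # applying M^{-1} is a diagonal scaling
    x = np.zeros(n)
    r = b.copy()
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)        # step length
        x += alpha * p               # approximate solution
        r -= alpha * Ap              # residual
        if np.linalg.norm(r) < tol:
            return x, k
        z = Minv * r                 # preconditioning step z_k = M^{-1} r_k
        rz_new = r @ z
        beta = rz_new / rz           # improvement this step
        p = z + beta * p             # search direction
        rz = rz_new
    return x, maxiter

# SPD tridiagonal test matrix with widely varying diagonal (illustrative only)
n = 100
A = np.diag(np.linspace(1.0, 1e3, n)) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
b = np.ones(n)
x, its = pcg_jacobi(A, b)
print(its, np.linalg.norm(b - A @ x))
```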