Synopsis of Numerical Linear Algebra


Synopsis of Numerical Linear Algebra
Eric de Sturler
Department of Mathematics, Virginia Tech
sturler@vt.edu, http://www.math.vt.edu/people/sturler
Iterative Methods for Linear Systems: Basics to Research
Numerical Analysis and Software I

[Handwritten notes: Systems of Linear Equations 1-2 (Iterative Methods, pages 1-3, August 22, 2011); not transcribed.]

Norms

A norm on a vector space V is any function f : V → ℝ such that
1. f(x) ≥ 0 and f(x) = 0 ⟺ x = 0,
2. f(αx) = |α| f(x),
3. f(x + y) ≤ f(x) + f(y),
where x, y ∈ V and α ∈ ℂ (or ℝ).

Important vector spaces in this course: ℝⁿ, ℂⁿ, ℝ^{m×n}, and ℂ^{m×n} (matrices). Note that the set of all m-by-n matrices (real or complex) is a vector space.

Many matrix norms possess the submultiplicative or consistency property:
f(AB) ≤ f(A) f(B) for all A ∈ ℂ^{m×k} and B ∈ ℂ^{k×n} (or real matrices).
Note that strictly speaking this is a property of a family of norms, because in general each f is defined on a different vector space.

Norms

We can define a matrix norm using a vector norm (an induced matrix norm):
‖A‖_α = max_{x ≠ 0} ‖Ax‖_α / ‖x‖_α = max_{‖x‖_α = 1} ‖Ax‖_α.
Induced norms are always consistent (satisfy the consistency property).

Two norms ‖·‖_α and ‖·‖_β are equivalent if there exist positive, real constants a and b such that
∀ x : a ‖x‖_α ≤ ‖x‖_β ≤ b ‖x‖_α.
The constants depend on the two norms but not on x. All norms on a finite-dimensional vector space are equivalent.

Norms

Some useful norms on ℝⁿ, ℂⁿ, ℝ^{m×n}, ℂ^{m×n}:
p-norms: ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}, especially p = 1, 2, ∞, where ‖x‖_∞ = max_i |x_i|.

The induced matrix p-norms are:
‖A‖₁ = max_j Σ_i |a_{ij}| (max absolute column sum)
‖A‖₂ = σ_max(A) (max singular value; harder to compute than the others)
‖A‖_∞ = max_i Σ_j |a_{ij}| (max absolute row sum)

Matrix Frobenius norm: ‖A‖_F = (Σ_{i,j} |a_{ij}|²)^{1/2} (similar to the vector 2-norm for a matrix).

All these norms are consistent (satisfy the submultiplicative property).
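As a quick numerical illustration (a NumPy sketch; the test matrices are arbitrary examples, not from the notes), the built-in matrix norms can be checked against the column-sum, singular-value, and row-sum characterizations above, along with the consistency property:

    import numpy as np

    A = np.array([[1.0, -2.0, 3.0],
                  [0.0,  4.0, -1.0],
                  [2.0,  1.0,  5.0]])

    # Induced 1-norm: maximum absolute column sum
    print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())

    # Induced 2-norm: largest singular value
    print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False).max())

    # Induced infinity-norm: maximum absolute row sum
    print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())

    # Frobenius norm: like the vector 2-norm of all entries
    print(np.linalg.norm(A, 'fro'), np.sqrt((np.abs(A)**2).sum()))

    # Consistency (submultiplicativity): ||AB|| <= ||A|| ||B||
    B = np.array([[0.5, 1.0, 0.0],
                  [1.0, 0.0, 2.0],
                  [0.0, 3.0, 1.0]])
    for p in (1, 2, np.inf, 'fro'):
        assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12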

[Handwritten notes: Norms & Inner Products 1-6 and Error Analysis (Iterative Methods, pages 4-14, August 19-22, 2011); not transcribed.]

Inner Products

Many methods to select z_m from the Krylov space are related to projections. We call f : S × S → ℝ an inner product over the real vector space S if, for all vectors x, y, z and scalars α,
1. f(x, x) ≥ 0 and f(x, x) = 0 ⟺ x = 0,
2. f(αx, z) = α f(x, z),
3. f(x + y, z) = f(x, z) + f(y, z),
4. f(x, z) = f(z, x).
For a complex inner product, f : S × S → ℂ, over a complex vector space S we have instead of property (4): f(x, z) equals the complex conjugate of f(z, x).

Inner products are often written as ⟨x, y⟩, (x, y), ⟨x, y⟩_α, (x, y)_α, etc. We say x and y are orthogonal (w.r.t. the α-inner product), x ⊥_α y, if ⟨x, y⟩_α = 0.

Inner products and Norms

Each inner product defines, or induces, a norm: ‖x‖ = √⟨x, x⟩. (proof?)
Many norms are induced by inner products, but not all. Those norms that are have additional nice properties (that we'll discuss soon).

An inner product and its induced norm satisfy the Cauchy-Schwarz inequality:
|⟨x, y⟩| ≤ ‖x‖ ‖y‖.

A norm induced by an inner product satisfies the parallelogram equality:
‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).

In this case we can also recover the inner product from the norm:
Real case: ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²).
Complex case: Re⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²), Im⟨x, y⟩ = (1/4)(‖x + iy‖² − ‖x − iy‖²).
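These identities are easy to verify numerically for the standard Euclidean inner product ⟨x, y⟩ = Σ_i x_i conj(y_i) (a minimal NumPy sketch; the random test vectors are my own):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

    # Euclidean inner product <x, y> = sum_i x_i * conj(y_i), and its induced norm
    ip = lambda u, v: np.vdot(v, u)          # np.vdot conjugates its first argument
    nrm = lambda u: np.sqrt(ip(u, u).real)

    # Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
    assert abs(ip(x, y)) <= nrm(x) * nrm(y)

    # Parallelogram equality for a norm induced by an inner product
    lhs = nrm(x + y)**2 + nrm(x - y)**2
    rhs = 2 * (nrm(x)**2 + nrm(y)**2)
    assert np.isclose(lhs, rhs)

    # Polarization: recover the inner product from the norm
    re = 0.25 * (nrm(x + y)**2 - nrm(x - y)**2)
    im = 0.25 * (nrm(x + 1j*y)**2 - nrm(x - 1j*y)**2)
    assert np.isclose(re + 1j*im, ip(x, y))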

The orthogonality condition ⟨b − ·, u⟩ = 0 for all u ∈ U provides a recipe that reduces an optimization problem (hard / expensive) to solving a linear system of equations (relatively easy / cheap). We will show this by construction, which will also show that the solution is uniquely determined.

[Handwritten notes: Least Squares 1-3 (Iterative Methods, pages 1-4, August 30, 2011); not transcribed.]

Eigenvalues and Eigenvectors

Let Ax = λx and y*A = λy* (for the same λ). We call the vector x a (right) eigenvector, the vector y a left eigenvector, and λ an eigenvalue of A; the triple together is called an eigentriple (of A), and (λ, x) and (λ, y) a (right) eigenpair and a left eigenpair.

The set of all eigenvalues of A, Λ(A), is called the spectrum of A (when convenient we will count multiplicities in Λ(A)).

If the matrix A is diagonalizable (has a complete set of eigenvectors) we have A = VΛV⁻¹ ⟺ AV = VΛ, where V is a matrix with the right eigenvectors as columns and Λ is a diagonal matrix with the eigenvalues as coefficients. This is not always possible (more on this soon). A similar decomposition can be given for the left eigenvectors.
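For a diagonalizable matrix, numpy.linalg.eig returns a V and Λ of exactly this form; left eigenvectors can be obtained from the conjugate transpose (a small sketch with an arbitrary example matrix):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 0.0, 4.0]])

    lam, V = np.linalg.eig(A)      # right eigenvectors as columns of V
    L = np.diag(lam)

    # A V = V Lambda  and  A = V Lambda V^{-1}
    print(np.allclose(A @ V, V @ L))
    print(np.allclose(A, V @ L @ np.linalg.inv(V)))

    # Left eigenvectors: columns of W from eig(A^H), so that W^H A = diag(conj(mu)) W^H
    mu, W = np.linalg.eig(A.conj().T)
    print(np.allclose(W.conj().T @ A, np.diag(mu.conj()) @ W.conj().T))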

Spectral Radius

The spectral radius ρ(A) is defined as ρ(A) = max{ |λ| : λ ∈ Λ(A) }.

Theorem: For every A and ε > 0 a consistent norm ‖·‖_α exists such that ‖A‖_α ≤ ρ(A) + ε.

So, if ρ(A) < 1, then a consistent norm ‖·‖_α exists such that ‖A‖_α < 1. Take ε = (1 − ρ(A))/2 and apply the theorem above.

Define A* = conj(A)ᵀ (complex conjugate transpose).
If A is Hermitian (A* = A), then ρ(A) = ‖A‖₂.
If A is normal (AA* = A*A), then ρ(A) = ‖A‖₂.
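A quick numerical check (NumPy sketch, arbitrary matrices) of ρ(A) ≤ ‖A‖ for the consistent norms above, with equality in the 2-norm for a Hermitian example:

    import numpy as np

    rho = lambda A: np.abs(np.linalg.eigvals(A)).max()

    A = np.array([[0.5, 2.0],
                  [0.0, 0.9]])     # nonnormal example

    # rho(A) <= ||A|| for every consistent norm
    for p in (1, 2, np.inf, 'fro'):
        print(p, rho(A) <= np.linalg.norm(A, p))

    # Hermitian example: rho(A) equals the 2-norm
    H = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(np.isclose(rho(H), np.linalg.norm(H, 2)))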

Characteristic Polynomial

The eigenvalues of A are determined by the characteristic polynomial of A:
Ax = λx ⟺ (A − λI)x = 0.
So we are looking for (eigen)values λ such that the matrix (A − λI) is singular:
det(A − λI) = 0 (this is a polynomial in λ).
This polynomial is called the characteristic polynomial of A. The eigenvalues of A are defined to be the roots of its characteristic polynomial.

Since the eigenvalues of a matrix are the roots of its characteristic polynomial, the Fundamental Theorem of Algebra implies that an n × n matrix A always has n eigenvalues (counted with multiplicity). The eigenvalues, however, need be neither distinct nor real. Complex eigenvalues of a real matrix must come in complex conjugate pairs.
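NumPy can form the characteristic polynomial of a small matrix and compare its roots with the computed eigenvalues (a sketch with an arbitrary real matrix; note that, numerically, eigenvalues are not computed via polynomial roots):

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])    # rotation by 90 degrees: eigenvalues +i, -i

    coeffs = np.poly(A)            # coefficients of det(lambda*I - A)
    print(coeffs)                  # approximately [1, 0, 1]  ->  lambda^2 + 1
    print(np.roots(coeffs))        # roots +i, -i (complex conjugate pair)
    print(np.linalg.eigvals(A))    # same values from the eigenvalue solver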

Multiplicity of eigenvalues

Eigenvalues may be single or multiple (single or multiple roots). An eigenvalue with multiplicity k > 1 has k or fewer independent eigenvectors associated with it. It has at least one associated eigenvector. If it has fewer than k independent eigenvectors we call the eigenvalue (and the matrix) defective.

The multiplicity of an eigenvalue as the (multiple) root of the characteristic polynomial is called its algebraic multiplicity. The number of independent eigenvectors associated with an eigenvalue is called its geometric multiplicity. The geometric multiplicity is smaller than or equal to the algebraic multiplicity.

A matrix that is not defective is called diagonalizable: we have the decomposition A = XΛX⁻¹ ⟺ X⁻¹AX = Λ = diag(λ_i), where X contains the eigenvectors (as columns) and Λ contains the eigenvalues.
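The 2×2 Jordan block is the standard example of a defective matrix: algebraic multiplicity 2, geometric multiplicity 1 (a small NumPy check):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 2.0]])     # eigenvalue 2 with algebraic multiplicity 2

    lam = 2.0
    # Geometric multiplicity = dim null(A - lambda I) = n - rank(A - lambda I)
    geo_mult = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))
    print(geo_mult)                # 1 -> defective, hence not diagonalizable

    lam_all, X = np.linalg.eig(A)
    print(lam_all)                 # [2., 2.]
    print(X)                       # the two eigenvector columns are (nearly) parallel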

Jordan form of a matrix

For every matrix A ∈ ℂ^{n×n} there exists a nonsingular matrix X such that
X⁻¹AX = diag(J₁, …, J_q),
where each J_i ∈ ℂ^{m_i×m_i} is an upper bidiagonal Jordan block with λ_i on the diagonal and ones on the first superdiagonal, and m₁ + m₂ + … + m_q = n.

Each block has one corresponding eigenvector: q independent eigenvectors.
Each block has m_i − 1 principal vectors (of grade ≥ 2).
If every block is of size 1, the matrix is diagonalizable.
Multiple blocks can have the same eigenvalue: λ_i = λ_j is possible.
The sum of the sizes of all blocks with the same eigenvalue is the algebraic multiplicity of that eigenvalue.
The number of blocks with the same eigenvalue is the geometric multiplicity of that eigenvalue.
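The last two statements can be checked on a small block-diagonal matrix built from two Jordan blocks with the same eigenvalue (a NumPy/SciPy sketch; the blocks are my own example):

    import numpy as np
    from scipy.linalg import block_diag

    # Two Jordan blocks for the same eigenvalue 3: sizes 2 and 1
    J1 = np.array([[3.0, 1.0],
                   [0.0, 3.0]])
    J2 = np.array([[3.0]])
    A = block_diag(J1, J2)         # 3x3, algebraic multiplicity of 3 is 2 + 1 = 3

    lam = 3.0
    n = A.shape[0]
    alg_mult = np.sum(np.isclose(np.linalg.eigvals(A), lam))         # 3 = sum of block sizes
    geo_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))         # 2 = number of blocks
    print(alg_mult, geo_mult)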

Invariant Subspaces

A generalization of eigenvectors to higher dimensions is an invariant subspace. We say a subspace 𝒱 ⊂ ℂⁿ is invariant under A ∈ ℂ^{n×n} if for all x ∈ 𝒱: Ax ∈ 𝒱.

It is possible that an invariant subspace is the span of a set of eigenvectors, but this need not be the case. Moreover, in general, elements (vectors) of the subspace will not themselves be eigenvectors.

In many cases it is useful to consider the restriction of the matrix A to the invariant subspace 𝒱: A|_𝒱 : 𝒱 → 𝒱. If V ∈ ℂ^{n×k} with range(V) = 𝒱, then AV = VL with L ∈ ℂ^{k×k}. Hence L represents A in the basis defined by V for the space 𝒱.
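For example, the span of a few eigenvectors is an invariant subspace, and L can be recovered as L = V⁺AV (a NumPy sketch with an arbitrary matrix; invariant subspaces need not be spanned by eigenvectors in general):

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0]])

    lam, X = np.linalg.eig(A)
    V = X[:, :2]                   # span of two eigenvectors: an invariant subspace

    # L represents the action of A on range(V) in the basis given by V's columns
    L = np.linalg.pinv(V) @ A @ V
    print(np.allclose(A @ V, V @ L))       # A V = V L
    print(np.linalg.eigvals(L))            # the two eigenvalues associated with V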

Similarity Transformation and Schur Decomposition

Let A have eigenpairs (λ_i, v_i): Av_i = λ_i v_i.

For nonsingular B, define the similarity transformation: BAB⁻¹.
The matrix BAB⁻¹ has the same eigenvalues, λ_i, as A and eigenvectors Bv_i. In fact, (BAB⁻¹)(Bv_i) = BAv_i = λ_i Bv_i.
BAB⁻¹ has the same Jordan-block structure as A.

In many cases, we are interested in a (complex) unitary (or real, orthogonal) similarity transformation: QAQ* with QQ* = Q*Q = I.
Schur decomposition: QAQ* = U (upper triangular).
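scipy.linalg.schur computes such a unitary similarity transformation directly (a small sketch with an arbitrary matrix; output='complex' forces a genuinely triangular, possibly complex, factor):

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0],
                  [2.0, 0.0, 1.0]])

    U, Q = schur(A, output='complex')   # A = Q U Q*, Q unitary, U upper triangular
    print(np.allclose(A, Q @ U @ Q.conj().T))
    print(np.allclose(Q.conj().T @ Q, np.eye(3)))
    print(np.sort_complex(np.diag(U)))               # eigenvalues on the diagonal of U
    print(np.sort_complex(np.linalg.eigvals(A)))     # same values from the eigenvalue solver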

Similarity Transformation

Similarity transformation for A: BAB⁻¹; this can be done with any nonsingular B. Let Ax = λx, then (BAB⁻¹)(Bx) = BAx = λBx. So BAB⁻¹ has the same eigenvalues as A, and eigenvectors Bx where x is an eigenvector of A.

Although any nonsingular B is possible, the most stable and accurate algorithms use an orthogonal (unitary) matrix, as for example in the QR algorithm.

Similarity Transformation

Orthogonal similarity transformation for A: QAQ*, where QQ* = I.

If QAQ* = [L₁ F; 0 L₂] (block upper triangular), then Λ(A) = Λ(L₁) ∪ Λ(L₂).

If we can find Q ≡ [Q₁ Q₂] that yields such a decomposition, we have reduced the problem to two smaller problems. Moreover, AQ₁ = Q₁L₁ and range(Q₁) is an invariant subspace. An eigenpair L₁z = λz gives an eigenpair AQ₁z = Q₁L₁z = λQ₁z.

Approximation over Search Space

For large matrices we cannot use full transformations. Often we do not need all eigenvalues/eigenvectors. Look for a proper basis Q₁ that captures the relevant eigenpairs. We do not need Q₂.

Approximations over the subspace range(Q₁): L₁ = Q₁*AQ₁.

When is an approximation good (enough)? We will rarely find AQ₁ − Q₁L₁ = 0 unless we do a huge amount of work. This is not necessary: we are working with approximations and we must deal with numerical error anyway.

Approximation over Search Space

Let AQ₁ − Q₁L₁ = R with ‖R‖ small relative to ‖A‖. Now
(A − RQ₁*)Q₁ − Q₁L₁ = AQ₁ − R − Q₁L₁ = 0.
So range(Q₁) is an exact invariant subspace of the perturbed matrix Â = A − RQ₁*, and ‖Â − A‖ / ‖A‖ = ‖R‖ / ‖A‖.

If ‖R‖/‖A‖ is sufficiently small, then Q₁ is acceptable. In fact, this is as good as we can expect (unless we're lucky). Any numerical operation involves perturbed operands! Note that we cannot always say that Q₁ is accurate.
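In practice one builds an orthonormal Q₁, forms L₁ = Q₁*AQ₁, and monitors ‖R‖/‖A‖ as a backward-error measure. A minimal NumPy sketch (with Q₁ just an orthonormalized random block rather than a carefully chosen search space):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 200, 5
    A = rng.standard_normal((n, n))

    # An orthonormal basis for a (here: random) search space
    Q1, _ = np.linalg.qr(rng.standard_normal((n, k)))

    L1 = Q1.T @ A @ Q1                   # projected matrix Q1* A Q1
    R  = A @ Q1 - Q1 @ L1                # residual of the approximate invariant subspace

    rel_backward_error = np.linalg.norm(R, 2) / np.linalg.norm(A, 2)
    print(rel_backward_error)

    # Sanity check of the identity (A - R Q1*) Q1 - Q1 L1 = 0
    print(np.allclose((A - R @ Q1.T) @ Q1, Q1 @ L1))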

Eigenvalue problems

Before we compute eigenvalues and eigenvectors numerically, we must understand what we can and cannot compute (accurately), or should not compute.

We may want Ax = λx for a single eigenpair, for example with λ as small as possible, say a minimum energy level. In many cases we want Ax_i = λ_i x_i for i = 1, …, M (M large), where the λ_i are the smallest M eigenvalues. It may be important that we do not skip any eigenvalues. We may want the invariant subspace accurately. We may want every eigenvector accurately.

Usefulness of Computed Results

In general we need to consider the accuracy of a computed answer without knowing the exact answer. This involves the sensitivity of the result we want to compute. If some result is very sensitive to small changes in the problem, it may be impossible to compute accurately. In other cases results may be computable but at a very high price; for example, an algorithm may converge very slowly. Sometimes it is better to compute a related but less sensitive result.

Sensitivity of an Eigenvalue

Sensitivity of eigenvalues to perturbations in the matrix: different eigenvalues or eigenvectors of a matrix are not equally sensitive to perturbations of the matrix.

Let Ax = λx and y*A = λy*, where ‖x‖ = ‖y‖ = 1. Consider (A + E)(x + e) = (λ + δλ)(x + e) and drop second-order terms:
Ax + Ae + Ex = λx + λe + δλ·x ⟹ Ae + Ex = λe + δλ·x.
Multiplying from the left by y*: y*Ae + y*Ex = λy*e + δλ·y*x. Since y*Ae = λy*e, this gives y*Ex = δλ·y*x, so
δλ = y*Ex / (y*x), |δλ| ≤ ‖E‖ / |y*x|.

Condition number of a simple eigenvalue: 1/|y*x| ≥ 1.
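The condition number 1/|y*x| is easy to compute and compare against the actual eigenvalue change under a small perturbation (a NumPy sketch; the nonnormal test matrix and perturbation size are my own choices):

    import numpy as np

    A = np.array([[1.0, 1e3],
                  [0.0, 2.0]])           # nonnormal: eigenvalue 1 is ill-conditioned

    lam, X = np.linalg.eig(A)
    mu,  Y = np.linalg.eig(A.conj().T)   # columns of Y are left eigenvectors of A

    i = np.argmin(np.abs(lam - 1.0))         # right eigenvector for lambda = 1
    j = np.argmin(np.abs(mu.conj() - 1.0))   # matching left eigenvector
    x = X[:, i] / np.linalg.norm(X[:, i])
    y = Y[:, j] / np.linalg.norm(Y[:, j])
    cond = 1.0 / abs(np.vdot(y, x))      # condition number 1/|y* x|
    print(cond)

    rng = np.random.default_rng(2)
    E = 1e-8 * rng.standard_normal((2, 2))
    pert = np.linalg.eigvals(A + E)
    new1 = pert[np.argmin(np.abs(pert - 1.0))]       # perturbed eigenvalue near 1
    print(abs(new1 - 1.0), cond * np.linalg.norm(E, 2))   # observed vs first-order bound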

Sensitivity of an Eigenvalue

For a symmetric/Hermitian matrix, right and left eigenvectors are the same. So eigenvalues are inherently well-conditioned. More generally, eigenvalues are well conditioned for normal matrices, but eigenvalues of nonnormal matrices need not be well conditioned.

Nonnormal matrices may not have a full set of eigenvectors: the algebraic multiplicity of λ, its multiplicity as a root of det(A − λI) = 0, is then not equal to the geometric multiplicity, dim null(A − λI). In that case we can consider the conditioning of the invariant subspace associated with a Jordan block.

Sensitivity of an Eigenvalue

If μ is an eigenvalue of A + E, then an eigenvalue λ of A exists with
|λ − μ| ≤ ‖X‖ ‖X⁻¹‖ ‖E‖ = κ(X) ‖E‖,
where X is the matrix of eigenvectors of A and κ(X) = ‖X‖ ‖X⁻¹‖ is a condition number (consistent norm).

A useful backward error result is given by the residual. Let r = Ax − λx and ‖x‖ = 1. Then there exists a perturbation E with ‖E‖ = ‖r‖ such that (A + E)x = λx. Proof: take E = −rx*.
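The backward-error statement is easy to verify numerically: for a unit-norm approximate eigenvector, the rank-one perturbation E = −rx* makes the approximate pair exact, with ‖E‖₂ = ‖r‖₂ (a NumPy sketch with an arbitrary approximate pair):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

    # A rough approximate eigenpair (deliberately inexact), with ||x||_2 = 1
    lam = 3.6
    x = np.array([1.0, 0.5]); x /= np.linalg.norm(x)

    r = A @ x - lam * x                  # residual of the approximate pair
    E = -np.outer(r, x.conj())           # rank-one backward perturbation E = -r x*

    print(np.allclose((A + E) @ x, lam * x))                     # exact pair of A + E
    print(np.isclose(np.linalg.norm(E, 2), np.linalg.norm(r)))   # ||E||_2 = ||r||_2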

Sensitivity of Eigenvectors

Consider A = diag(1, 1 + ε₁) with eigenvectors x₁ = (1, 0)ᵀ and x₂ = (0, 1)ᵀ. A perturbation E with ‖E‖ ≤ 2ε₁, ε₁ arbitrarily small, is enough to give A + E any eigenvectors not equal to x₁ and x₂.

Let E = E₁ + E₂ and X̂ = [x̂₁ x̂₂] (unitary). Let E₁ = diag(0, −ε₁) and E₂ = X̂ diag(0, ε₁) X̂*.
Then A + E₁ = I (all nonzero vectors are eigenvectors), and
A + E₁ + E₂ = X̂ diag(1, 1 + ε₁) X̂*,
which has eigenvectors x̂₁ and x̂₂.
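A construction along these lines is easy to reproduce numerically (a hedged NumPy sketch; the rotation X̂ and the size of ε₁ are my own choices):

    import numpy as np

    eps = 1e-10
    A = np.diag([1.0, 1.0 + eps])        # eigenvectors e1, e2; eigenvalues 1 and 1+eps

    # Target eigenvectors: any unitary X_hat, here a rotation by 45 degrees
    c = 1.0 / np.sqrt(2.0)
    Xhat = np.array([[c, -c],
                     [c,  c]])

    E1 = np.diag([0.0, -eps])            # A + E1 = I: every nonzero vector is an eigenvector
    E2 = Xhat @ np.diag([0.0, eps]) @ Xhat.T
    E = E1 + E2                          # ||E||_2 <= 2*eps, yet the eigenvectors move completely

    lam, V = np.linalg.eig(A + E)
    print(np.linalg.norm(E, 2))          # tiny perturbation
    print(V)                             # columns are now (up to sign/order) the columns of Xhat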

Model Problems

−(p u_x)_x − (q u_y)_y + r u_x + s u_y + t u = f. Discretize.

[Figure: control volume (box) V around grid point (i,j), with corners A, B, C, D and neighbors (i−1,j), (i+1,j), (i,j−1), (i,j+1).]

Integrate the equality over the box V. Use the Gauss divergence theorem to get
∫_V ((p u_x)_x + (q u_y)_y) dx dy = ∮_{∂V} (p u_x, q u_y) · n ds,
and approximate the line integral numerically.

Model Problems

Now we approximate the boundary integral ∮_{∂V} (p u_x, q u_y) · n ds. We approximate the integrals over each side of the box V using the midpoint rule and we approximate the derivatives using central differences:
∫_B^C p u_x n_x dy ≈ (Δy/Δx) p_{i+1/2,j} (U_{i+1,j} − U_{i,j}), and so on for the other sides.

We approximate the integrals over r u_x, s u_y, t u, and f using the area of the box ΔxΔy and the value at the midpoint of the box, where we use central differences for derivatives. So, u_x ≈ (U_{i+1,j} − U_{i−1,j}) / (2Δx), and so on.

For various examples we will also do this even when strong convection relative to the mesh size makes central differences a poor choice (as it gives interesting systems).

Model problems

This gives the discrete equations
−(Δy/Δx) [ p_{i+1/2,j} (U_{i+1,j} − U_{i,j}) − p_{i−1/2,j} (U_{i,j} − U_{i−1,j}) ]
−(Δx/Δy) [ q_{i,j+1/2} (U_{i,j+1} − U_{i,j}) − q_{i,j−1/2} (U_{i,j} − U_{i,j−1}) ]
+ (Δy/2) r_{i,j} (U_{i+1,j} − U_{i−1,j}) + (Δx/2) s_{i,j} (U_{i,j+1} − U_{i,j−1})
+ ΔxΔy t_{i,j} U_{i,j} = ΔxΔy f_{i,j}.

Often we divide this result again by ΔxΔy.
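For constant coefficients this assembly is a few lines with Kronecker products; the following is a hedged SciPy sketch (my own minimal version, with p = q = 1, t = 0, constant convection r and s, homogeneous Dirichlet boundary conditions, and the equations divided by ΔxΔy), not the code used in the course:

    import numpy as np
    import scipy.sparse as sp

    def convection_diffusion_2d(nx, ny, r=0.0, s=0.0):
        """Sparse matrix for -u_xx - u_yy + r u_x + s u_y = f on the unit square,
        central differences, homogeneous Dirichlet BCs, unknowns ordered x-fastest."""
        hx, hy = 1.0 / (nx + 1), 1.0 / (ny + 1)

        def one_d(n, h, c):
            diff = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
            conv = c * sp.diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(n, n)) / (2.0 * h)
            return diff + conv

        Ix, Iy = sp.identity(nx), sp.identity(ny)
        return (sp.kron(Iy, one_d(nx, hx, r)) + sp.kron(one_d(ny, hy, s), Ix)).tocsr()

    A = convection_diffusion_2d(20, 20, r=10.0, s=0.0)
    print(A.shape, A.nnz)
    # Strong convection relative to the mesh makes A increasingly nonsymmetric,
    # which is what produces the "interesting systems" mentioned above.
    print(np.abs((A - A.T).toarray()).max())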

Rate of Convergence

Let x̂ be the solution of Ax = b, and suppose we have iterates x₀, x₁, x₂, …

{x_k} converges (q-)linearly to x̂ if there are N ≥ 0 and c ∈ [0, 1) such that for k ≥ N: ‖x_{k+1} − x̂‖ ≤ c ‖x_k − x̂‖.

{x_k} converges (q-)superlinearly to x̂ if there are N ≥ 0 and a sequence {c_k} that converges to 0 such that for k ≥ N: ‖x_{k+1} − x̂‖ ≤ c_k ‖x_k − x̂‖.

{x_k} converges to x̂ with (q-)order at least p if there are p > 1, c ≥ 0, and N ≥ 0 such that for k ≥ N: ‖x_{k+1} − x̂‖ ≤ c ‖x_k − x̂‖^p (quadratic if p = 2, cubic if p = 3, and so on).

{x_k} converges to x̂ with j-step (q-)order at least p if there are a fixed integer j ≥ 1, p > 1, c ≥ 0, and N ≥ 0 such that for k ≥ N: ‖x_{k+j} − x̂‖ ≤ c ‖x_k − x̂‖^p.
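The definitions can be seen in action by printing consecutive error ratios; the sketch below (my own examples: Richardson iteration for a small SPD system, and Newton's iteration for √2) shows q-linear and q-quadratic behavior respectively:

    import numpy as np

    # Linear convergence: Richardson iteration x_{k+1} = x_k + omega (b - A x_k)
    A = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    xhat = np.linalg.solve(A, b)
    omega = 0.2

    x, errs = np.zeros(2), []
    for _ in range(30):
        x = x + omega * (b - A @ x)
        errs.append(np.linalg.norm(x - xhat))
    errs = np.array(errs)
    print(errs[1:] / errs[:-1])            # ratios settle at a constant c in [0,1): q-linear
    print(np.abs(np.linalg.eigvals(np.eye(2) - omega * A)).max())  # that constant

    # Quadratic convergence (q-order 2): Newton's iteration for sqrt(2)
    t, errs2 = 1.5, []
    for _ in range(5):
        t = 0.5 * (t + 2.0 / t)
        errs2.append(abs(t - np.sqrt(2.0)))
    print(errs2)                           # error roughly squares each step, until roundoff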