Preface. September 14, 2016

Exercises in this collection are included for one of the following reasons: to allow you to practise the course material, to give applications of the course material, to provide more details on results discussed in the lectures, to give details on side remarks in the lectures, or to provide details on results that are useful for the final assignment. For each lecture you are supposed to do a few selected exercises from this collection.

Most chapters (titled "Lecture n - ..." in this collection) start with a brief introduction. This text is included in order to give a context for the exercises, i.e., to settle notation, to provide a brief review of the required theory and a motivation for the exercises. The text is not meant as an easy introduction to the theory; for this, please consult the lectures, the transparencies or textbooks. Nevertheless, if you understand the theory, the text may provide a convenient summary.

The exercises in Lecture 0. The set of exercises in this preliminary chapter forms an overview of the Linear Algebra material from bachelor courses, material that will be used during this Numerical Linear Algebra course (in the lectures, exercises, assignments or final report) and that is supposed to be known. This set also fixes the notation. A few items, such as the Schur decomposition, might not have been discussed (or only briefly) in a standard bachelor course. They are included since this overview seems to be the appropriate place for them. But they will be introduced properly in the lectures, when needed.

Lecture 0 Preliminaries Scalars in C and R are denoted by lower Greek letters, as λ. High dimensional vectors and matrices are denoted by bold face letters, lower case letters are used for vectors and capitals for matrices. If, for instance, n is large (high), then x,y,... are vectors in C n (or R n ) and A,V,... are n k matrices. Low dimensional vectors and matrices are denoted by standard letters: x, y,... or x, y,... are k-vectors for small (low) k, A, S,... are k l matrices, with l small as well. In many of our applications, n N will be large, and k N will be modest. 1 Spaces are denoted with calligraphic capitals, as V. We view an n-vector as a column vector, that is, as an n 1 matrix. Our notation is column vector oriented, that is, we denote row vectors (1 n matrices) as x, with x a column vector. Let A = (A ij ) be an n k matrix: A = (A ij ) indicates that A ij is the (i, j)-entry of A. With A = [a 1,a 2,...,a k ] or A = [a 1 a 2... a k ] we settle the notation for the columns of A: the jth column equals a j. The absolute value and the complex conjugate are entry-wise operations: A ( A ij ) and Ā (Āij). The transpose A T of the matrix A is the k n matrix with (i, j)-entry A ji : A T (A ji ). A H is the adjoint or Hermitian conjugate of A: A H ĀT. We will also use the notation A for A H : A = A H. 2 We follow Matlab s notation to describe matrices that are formed from other matrices: consider an n k matrix A = (A ij ) and an m l matrix B = (B ij ). If m = n, then [A,B] is the n (k + l) matrix with (i, j) entry equal to A i,j if j k and B i,j k if j > k: A is extended with the columns from B. If k = l, then [A;B] is the (n + m) k matrix with (i, j) entry equal to A i,j if i n and B i n,j if i > n: A is extended with the rows from B. Note that [A;B] = [A T B T ] T. If I = (i 1, i 2,...,i p ) is a sequence of numbers i r {1, 2,..., n} and J = (j 1, j 2,..., j q ) is a sequence of numbers j s in {1, 2,..., k}, then A(I, J) is the p q matrix with (r, s) entry equal to A ir,j s. Note that entries of A can be used more than once. Below, we collect a number of standard results in Linear Algebra that will be frequently used. The statements are left to the reader as an exercise. A Spaces Let V and W be linear subspace of C n. Then V + W is the subspace V + W {x + y x V,y W}. We put V W for the subspace V + W if V W = {0}. Exercise 0.1. (a) V + W is a linear subspace. 1 We distinguish high and low dimensionality to indicate differences in efficiency. A dimension k is low, if the solution of k-dimensional problems of a type that we want to solve numerically can be computed in a split second with a computer and standard software. The dimension is high if more computational time is required or non-standard software has to be used. For linear systems, that is, solve Ax = b for x, where A is a given k k matrix and b is a given k-vector, k small is like k 1000. For eigenvalue problems, that is, find a non-trivial vector x and a scalar λ such that Ax = λx, where A a given k k matrix, k small is like k 100. From a pure mathematical point of view low and high dimensionality does not have a meaning (in pure mathematics, low would mean finite, while high would be infinitely dimensional. The problems that we will solve are all finite dimensional). In a mathematical statement the difference between low and high dimensionality does not play a role. But in its interpretation for practical use, it does. 
2 Formally, A is defined with respect to inner products: if (, ) X and (, ) Y are inner product on a linear space X and on a linear space Y, respectively, and A linearly maps X to Y, then A is the linear map from Y to X for which (Ax,y) Y = (x,a y) X for all x X and y Y. With respect to the standard inner product (x, y) y H x on X C k and on (x,y) y H x on Y C n, we have that A = A H. With A, we will (implicitly) refer to standard inner product, unless explicitly stated otherwise. 1

(b) Suppose V ∩ W = {0}. Then dim(V) + dim(W) = dim(V ⊕ W).
(c) Suppose V ∩ W = {0}. Then V ⊕ W = C^n if and only if dim(V) + dim(W) = n.
(d) If dim(V) + dim(W) > n, then V ∩ W ≠ {0}.

If x and y are n-vectors (i.e., in C^n), then we put ‖x‖_2 ≡ √(x*x) and y ⊥ x if y*x = 0.

Exercise 0.2.
(a) The map (x,y) ↦ y*x from C^n × C^n to C defines an inner product on C^n:
1) x*x ≥ 0 and x*x = 0 if and only if x = 0 (x ∈ C^n),
2) x ↦ y*x is a linear map from C^n to C for all y ∈ C^n,
3) (y*x)* = x*y (x,y ∈ C^n).
(b) The map x ↦ ‖x‖_2 from C^n to [0, ∞) defines a norm on C^n:
1) ‖x‖_2 ≥ 0 and ‖x‖_2 = 0 if and only if x = 0 (x ∈ C^n),
2) ‖αx‖_2 = |α| ‖x‖_2 (α ∈ C, x ∈ C^n),
3) ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2 (x,y ∈ C^n).
(c) |y*x| ≤ ‖x‖_2 ‖y‖_2 (x,y ∈ C^n) (Cauchy-Schwarz).
(d) If x ⊥ y then ‖x + y‖_2^2 = ‖x‖_2^2 + ‖y‖_2^2 (x,y ∈ C^n) (Pythagoras).

We put v ⊥ W if v ⊥ w (w ∈ W), V ⊥ W if v ⊥ W (v ∈ V), and V^⊥ ≡ {y ∈ C^n | y ⊥ V}.

Let V = [v_1,...,v_k] be an n × k matrix with columns v_1,...,v_k. Then
    span(V) ≡ span(v_1,...,v_k) ≡ { Σ_{j=1}^k α_j v_j | α_j ∈ C }.
We put x ⊥ V if x ⊥ span(V). Moreover, V^⊥ ≡ {y ∈ C^n | y ⊥ V}.

Exercise 0.3.
(a) dim(V) = n − dim(V^⊥).
(b) x ⊥ V ⇔ x ⊥ v_i for all i = 1,...,k ⇔ V*x = 0.
(c) dim(span(V)) ≤ k.

The angle ∠(x,y) between two non-trivial n-vectors x and y is the number in [0, π/2] such that
    cos ∠(x,y) = |y*x| / (‖y‖_2 ‖x‖_2).

B Matrices.

Let A = (a_ij) be an n × k matrix. We will view the matrix A as a map from C^k to C^n defined by the matrix-vector multiplication: x ↦ Ax (x ∈ C^k). The column (row) rank of A is the maximum number of linearly independent columns (rows) of the matrix A.

Theorem 0.1 The row rank of a matrix is equal to the column rank.

The above theorem allows us to talk about the rank of a matrix. The range R(A) of A is {Ay | y ∈ C^k}. The null space N(A) or kernel of A is {x ∈ C^k | Ax = 0}.

Exercise 0.4.
(a) R(A) = span(A).

(b) The rank of A equals dim(R(A)).
(c) N(A) = R(A*)^⊥.
(d) dim(R(A)) = k − dim(N(A)).

Exercise 0.5.
(a) A map 𝒜 : C^k → C^n is linear if and only if for some n × k matrix A we have that 𝒜(x) = Ax for all x ∈ C^k.
(b) Let v_1,...,v_k be a basis of C^k and w_1,...,w_n a basis of C^n. Let V ≡ [v_1,...,v_k] and W ≡ [w_1,...,w_n]. Then V and W are non-singular and W^{-1}AV is the matrix of the map x ↦ Ax from C^k to C^n with respect to the V and W basis.

Exercise 0.6. Let A = [a_1,...,a_k] be an n × k matrix and B = [b_1,...,b_k] an m × k matrix. Let D ≡ diag(λ_1,...,λ_k) be a k × k diagonal matrix with diagonal entries λ_j.
(a) (A*)* = A.
(b) (BA*)* = AB*.
(c) AB* = Σ_{j=1}^k a_j b_j*.
(d) The a_j b_j* are n × m rank one matrices.
(e) ADB* = Σ_{j=1}^k λ_j a_j b_j*.

Exercise 0.7. Let the n × n matrix U = (u_ij) be upper triangular, i.e., u_ij = 0 if i > j.
(a) U^{-1} is upper triangular and U* is lower triangular.
(b) If in addition the diagonal of U is the identity matrix, then the diagonal of U^{-1} is the identity matrix as well.
(c) The product of upper triangular matrices is upper triangular as well.

If A is an n × n matrix, then the determinant det(A) is the volume of the block {Ax | x = (x_1,...,x_n)^T, x_i ∈ [0,1]}. The trace trace(A) of A is the sum of its diagonal entries.

Theorem 0.2 If A is n × k and B is k × n, then trace(AB) = trace(BA). If n = k, then det(AB) = det(A) det(B).

Exercise 0.8. Let A be an n × n matrix.
(a) Prove that the following properties are equivalent:
    det(A) ≠ 0.
    A has full rank.
    A has a trivial null space: N(A) = {0}.
    The range of A is C^n: R(A) = C^n.
    A : C^n → C^n is invertible.
    There is an n × n matrix, denoted by A^{-1}, for which A^{-1}A = I.
A is non-singular if A has one of these properties. A^{-1} is the inverse of A.
(b) AA^{-1} = I. If B is n × n and BA = I or AB = I, then B = A^{-1}.
(c) With Cramer's rule, the inverse of a matrix can be expressed in terms of determinants of submatrices. However, this approach for finding inverses is extremely inefficient and, except for very low dimensions, it is never used in practice. Cramer's rule for n = 2:
    [ α  β ]^{-1}        1      [  δ  −β ]
    [ γ  δ ]      =  -------- · [ −γ   α ].
                     αδ − βγ

Exercise 0.9.

Gram-Schmidt orthonormalisation

    r_11 = ‖a_1‖_2,  q_1 = a_1/r_11,  l = 1.
    for j = 2,...,k
        Orthogonalise:  v = a_j
        for i = 1,...,l
            r_ij = q_i* a_j,   v ← v − q_i r_ij
        end for
        Normalise:  r_{l+1,j} = ‖v‖_2
        if r_{l+1,j} ≠ 0
            l ← l + 1,   q_l = v / r_{l,j}
        end if
    end for

Algorithm 0.1. The Gram-Schmidt process constructs an orthonormal basis q_1,...,q_l for the space spanned by a_1,...,a_k. Here ← indicates that the new quantity replaces the old one. If a_j is in the span of a_1,...,a_{j−1}, then a_j is in the span of q_1,...,q_l, r_{l+1,j} = 0 and no new orthonormal vector is formed. If the vectors a_1,...,a_k are linearly independent then l at the end of each loop equals j.

(a) Let A, L and U be n × n matrices such that A = LU, L lower triangular with diagonal I and U upper triangular. Let µ_j be the (j,j)-entry of U. Then det(A) = det(U) = µ_1 ⋯ µ_n.

Exercise 0.10. Let A be an n × n non-singular matrix.
(a) Prove that (A^T)^{-1} = (A^{-1})^T and (A^H)^{-1} = (A^{-1})^H. We will write A^{-T} instead of (A^T)^{-1} and A^{-H} instead of (A^H)^{-1}.

C Orthonormal matrices.

V = [v_1,...,v_k] is orthogonal if v_i ⊥ v_j for all i, j = 1,...,k, i ≠ j. If V is orthogonal and, in addition, ‖v_j‖_2 = 1 (j = 1,...,k), then V is orthonormal. In some textbooks, V is called orthogonal if multiplication by V preserves orthogonality (see Exercise 1.10(c)).

Exercise 0.11. Let V be an n × k matrix.
(a) If V is orthonormal, then k = dim(span(V)).
(b) V is orthonormal ⇔ V*V = I_k, the k × k identity matrix.

Let a_1,...,a_k be non-trivial n-vectors. The Gram-Schmidt process in Alg. 0.1 (see also Exercise 0.12(a)) constructs orthonormal n-vectors q_1,...,q_l that span the same space as a_1,...,a_k. The q_j form the columns of an n × l orthonormal matrix Q. Note that l ≤ k and l ≤ n, while l < k only if the vectors a_1,...,a_k are linearly dependent. Let R be the l × k matrix with (i,j) entry r_ij as computed in the algorithm and 0 if not computed. Then A = QR. The following theorem highlights this result.

Theorem 0.3 Let A = [a_1,...,a_k] be an n × k matrix. Let Q and R be as produced by the Gram-Schmidt process applied to the columns of A. Then Q is orthonormal, span(A) = span(Q), R is upper triangular, and A = QR.
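For experimentation, the following is a minimal NumPy sketch of Algorithm 0.1 (classical Gram-Schmidt). The function name and the tolerance used to decide whether r_{l+1,j} ≠ 0 are choices made for this sketch, not part of the notes; in practice one would typically call a library QR routine (e.g. numpy.linalg.qr) or use modified Gram-Schmidt, which is numerically more stable.

    import numpy as np

    def gram_schmidt(A, tol=1e-12):
        # Classical Gram-Schmidt as in Algorithm 0.1: A (n x k) -> Q (n x l), R (l x k) with A = Q R.
        n, k = A.shape
        Q = []                                    # orthonormal columns q_1, ..., q_l collected so far
        R = np.zeros((k, k), dtype=complex)
        for j in range(k):
            v = A[:, j].astype(complex)
            for i, q in enumerate(Q):             # orthogonalise a_j against q_1, ..., q_l
                R[i, j] = np.vdot(q, A[:, j])     # r_ij = q_i^* a_j
                v = v - R[i, j] * q
            rho = np.linalg.norm(v)               # r_{l+1,j}
            if rho > tol:                         # a_j adds a new direction
                R[len(Q), j] = rho
                Q.append(v / rho)                 # q_{l+1} = v / r_{l+1,j}
        Q = np.column_stack(Q)
        return Q, R[:Q.shape[1], :]               # economical factors: Q is n x l, R is l x k

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))
    Q, R = gram_schmidt(A)
    print(np.allclose(Q.conj().T @ Q, np.eye(3)), np.allclose(Q @ R, A))   # orthonormality and A = QR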

A matrix Q is unitary if Q is square and orthonormal.

Exercise 0.12. Proof of Theorem 0.3. Let A = [a_1,...,a_k] be an n × k matrix.
(a) Suppose q_1,...,q_l is an orthonormal system, l < k. For a_j ∈ C^n, consider
    r_ij = q_i* a_j (i = 1,...,l),   v = a_j − Σ_{i=1}^{l} q_i r_ij,   (0.1)
and, if ‖v‖_2 ≠ 0,
    r_{l+1,j} = ‖v‖_2,   q_{l+1} ≡ v / r_{l+1,j}.   (0.2)
Then q_{l+1} ⊥ span(q_1,...,q_l), and
    a_j = Σ_{i=1}^{l+1} q_i r_ij = Q_{l+1} r_j,
where Q_{l+1} = [q_1,...,q_{l+1}] and r_j ∈ C^{l+1} has ith entry r_ij as described above in (0.1) and (0.2). In particular, Q_{l+1} is orthonormal and a_j ∈ span(Q_{l+1}). In (0.1), the vector a_j is orthogonalised against q_1,...,q_l, while in (0.2) the vector v is normalised.
(b) Show that (0.1) can be expressed as
    v = a_j − Q_l (Q_l* a_j).   (0.3)
(c) If ‖v‖_2 = 0, then a_j = Q_l r_j', where r_j' ∈ C^l is the upper part (first l entries) of r_j.
(d) Prove Theorem 0.3: there is an n × l orthonormal matrix Q, with l ≤ min(k,n), and an l × k upper triangular matrix R such that
    A = QR.   (0.4)
(e) There is an n × n unitary matrix Q̃ and an n × k upper triangular matrix R̃ such that
    A = Q̃ R̃.   (0.5)
(f) Relate Q and Q̃, and R and R̃.
The relation in (0.5) is the QR-decomposition or QR-factorisation of A. The relation in (0.4) is the economical form of the QR-decomposition.

Theorem 0.4 Let V be a k-dimensional linear subspace of C^n. Let b ∈ C^n. For a b_0 ∈ V, the following two properties are equivalent:
(i) ‖b − b_0‖_2 ≤ ‖b − v‖_2 for all v ∈ V.
(ii) b − b_0 ⊥ V.
There is exactly one b_0 ∈ V with one of these equivalent properties.

Exercise 0.13. Let V be a k-dimensional linear subspace of C^n. Let b ∈ C^n.
(a) There is an n × k orthonormal matrix V such that V = span(V).
(b) We have that b_0 ≡ V(V*b) ∈ V and b − b_0 ⊥ V.
(c) If x = y + z for some y ∈ V and z ∈ V^⊥, then y = x_0 ≡ V(V*x).
(d) C^n = V ⊕ V^⊥.
(e) Prove Theorem 0.4.
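Theorem 0.4 and Exercise 0.13(b) identify the best approximation b_0 as the orthogonal projection V(V*b). The following small NumPy check illustrates this; the random subspace and test vectors are assumptions of the sketch, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(1)
    V, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # orthonormal basis V of a 3-dimensional subspace of R^8
    b = rng.standard_normal(8)

    b0 = V @ (V.T @ b)                                 # orthogonal projection of b onto span(V)
    print(np.allclose(V.T @ (b - b0), 0))              # (ii): b - b0 is orthogonal to the subspace
    for _ in range(5):                                 # (i): b0 is at least as close to b as other subspace elements
        v = V @ rng.standard_normal(3)
        assert np.linalg.norm(b - b0) <= np.linalg.norm(b - v) + 1e-12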

Exercise 0.14. Let A be an n × k matrix.
(a) R(A) = {Ax | x ⊥ N(A)}.
(b) For an x ∈ C^k, let x_1 ∈ C^k be such that x_1 ⊥ N(A) and x − x_1 ∈ N(A). There is precisely one k × n matrix, denoted by A†, for which A†y = 0 if y ⊥ R(A) and A†(Ax) = x_1 (x ∈ C^k). A† is the inverse of A viewed as a map from N(A)^⊥ to R(A), with null space equal to R(A)^⊥. A† is the Moore-Penrose pseudo inverse or generalised inverse of A.
(c) The following four properties do not involve the notion of orthogonality. They characterise the Moore-Penrose pseudo inverse:
    AA†A = A,   A†AA† = A†,   (AA†)* = AA†,   (A†A)* = A†A.

D Eigenvalues.

Let A be an n × n matrix. Let λ ∈ C. If x ∈ C^n, then (λ,x) is an eigenpair of the matrix A if Ax = λx and x ≠ 0; λ is an eigenvalue and x is an eigenvector associated to the eigenvalue λ. V(λ) ≡ {x ∈ C^n | Ax = λx} is the eigenspace associated to λ. The dimension of V(λ) is the geometric multiplicity of the eigenvalue λ. The characteristic polynomial P_A is defined by P_A(ζ) ≡ det(ζI − A) (ζ ∈ C).

Exercise 0.15.
(a) λ ∈ C is an eigenvalue of A if and only if λ is a root of P_A, i.e., P_A(λ) = 0.
(b) If P_A has k mutually different complex roots, then A has at least k eigenvalues.
(c) If A = (a_ij) is real (i.e., a_ij ∈ R for all i, j), and (λ,x) is an eigenpair of A, then (λ̄, x̄) is an eigenpair of A as well.

The algebraic multiplicity of the eigenvalue λ is the multiplicity of the root λ of P_A. λ is a simple eigenvalue of A if its algebraic multiplicity is one. An eigenvalue λ of A is semi-simple if the algebraic multiplicity equals the geometric multiplicity. The matrix A is semi-simple if all of its eigenvalues are semi-simple. If all eigenvalues are simple, then A is said to be simple.

Exercise 0.16.
(a) Any simple eigenvalue is semi-simple.
(b) Counted according to algebraic multiplicity, A has n eigenvalues.
(c) Give an example of a 2 × 2 matrix with an eigenvalue with algebraic multiplicity 2 and geometric multiplicity 1.
(d) For any n × n matrix B, the two matrices AB and BA have the same eigenvalues with equal multiplicity (algebraic, as well as geometric). The same statement also holds for the non-zero eigenvalues in case A is n × k and B is k × n.
(e) Eigenvalues do not depend on the basis, i.e., if T is a non-singular n × n matrix, then A and T^{-1}AT have the same eigenvalues with equal multiplicity (algebraic, as well as geometric).
(f) Any non-trivial linear subspace V of C^n that is invariant under multiplication by A (i.e., Ax ∈ V for all x ∈ V) contains at least one eigenvector of A.
(g) V(λ) ⊆ W(λ) ≡ {w ∈ C^n | (A − λI)^k w = 0 for some k ∈ N}.
(h) Both V(λ) and W(λ) are linear subspaces of C^n invariant under multiplication by A.

(i) The dimension of W(λ) equals the algebraic multiplicity of the eigenvalue λ.
(j) To simplify notation, assume 0 is an eigenvalue of A (otherwise, replace A by A − λI). Let x be a non-trivial vector in W(0). Let k ∈ N be the smallest number for which A^k x = 0. Assume α_m A^m x + ... + α_1 Ax + α_0 x = 0 for some α_j ∈ C. Prove that α_0 = ... = α_{k−1} = 0. Prove that x ∈ W(µ) ⇒ µ = 0. In particular, W(λ) ∩ W(µ) = {0} if λ ≠ µ.
(k) C^n = ⊕ W(λ), where we sum over all different eigenvalues λ of A.

If Q is n × k orthonormal with k ≤ n and S is k × k upper triangular such that
    AQ = QS,   (0.6)
then (0.6) is a partial Schur decomposition (or partial Schur form) of A (of order k). If k = n, then (0.6) is a Schur decomposition or Schur form.

Theorem 0.5 A has a Schur decomposition.

Proof. Apply induction to k to prove the theorem. There is a normalised eigenvector q_1 of A. Note that Aq_1 = q_1 λ_1 is a partial Schur decomposition of order 1. Suppose we have a partial Schur decomposition AQ_k = Q_k S_k of order k. Note that span(Q_k)^⊥ is a linear subspace of C^n that is invariant under multiplication by the deflated matrix Ã ≡ (I − Q_k Q_k*) A (I − Q_k Q_k*). Therefore (see (f) of Exercise 0.16), Ã has a normalised eigenvector in span(Q_k)^⊥, say q_{k+1}, with eigenvalue, say, λ_{k+1}. Expanding Q_k to Q_{k+1} and S_k to S_{k+1},
    Q_{k+1} ≡ [Q_k, q_{k+1}]   and   S_{k+1} ≡ [ S_k   Q_k* A q_{k+1} ]
                                               [  0       λ_{k+1}    ],
leads to the partial Schur decomposition AQ_{k+1} = Q_{k+1} S_{k+1} of order k + 1.

Exercise 0.17. Suppose we have a partial Schur decomposition (0.6).
(a) The diagonal entries of S are eigenvalues of S and of A.
(b) If Sy = λy, then (λ, Qy) is an eigenpair of A.
(c) The computation of y with Sy = λy requires the solution of an upper triangular system.

Without proof, we mention:

Theorem 0.6 There is a non-singular n × n matrix T such that AT = TJ, where J is a matrix in Jordan normal form, i.e., J is a block diagonal matrix with Jordan blocks on the diagonal. A Jordan block J_λ is a square matrix with λ on its diagonal, ones on its first superdiagonal, and zeros elsewhere. A is diagonalizable if J is diagonal (i.e., all Jordan blocks in J are 1 × 1).

Theorem 0.7 The following properties are equivalent for any n × n matrix A: 1) A is semi-simple, 2) A is diagonalizable, 3) there is a basis of eigenvectors of A, i.e., there is a basis v_1,...,v_n of C^n such that v_i is an eigenvector of A for all i.
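Theorem 0.7 can be probed numerically: for a defective (non-semi-simple) matrix the columns of a computed eigenvector matrix are (nearly) linearly dependent. The 2 × 2 test matrices below are an assumption of this sketch, not taken from the notes.

    import numpy as np

    J = np.array([[2.0, 1.0],
                  [0.0, 2.0]])        # Jordan block: algebraic multiplicity 2, geometric multiplicity 1
    B = np.array([[2.0, 0.0],
                  [0.0, 3.0]])        # diagonalizable: two simple eigenvalues

    for name, A in [("Jordan block", J), ("diagonal", B)]:
        lam, X = np.linalg.eig(A)     # columns of X are computed eigenvectors
        print(name, "eigenvalues:", lam, "cond(X) =", np.linalg.cond(X))
        # cond(X) is huge for the Jordan block: there is no basis of eigenvectors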

Exercise 0.18. Proof of Theorem 0.7. (a) If an eigenvalue λ of A shows up in exactly p Jordan blocks in the Jordan normal form, then p is the geometric multiplicity of λ. (b) Suppose J is on Jordan normal form. Describe V(λ) and W(λ) in terms of the standard basis vectors e i. (c) A is semi-simple A is diagonalizable. (d) Prove Theorem 0.7. Theorem 0.8 (Cayley-Hamilton) Let P A (ζ) = ζ n + α n 1 ζ n 1 +... + α 0 (ζ C) be the characteristic polynomial of A. Then P A (A) A n + α n 1 A n 1 +... + α 0 I = 0. (0.7) The minimal polynomial Q A of A is the monic non-trivial polynomial Q of minimal degree for which Q(A) = 0. Q is monic if Q(ζ) = ζ k +terms of degree < k. The minimal polynomial factorises P A, i.e., P A = Q A R for some polynomial R (R might be constant 1). Exercise 0.19. Proof of Theorem 0.8. Let λ 1,...,λ n be the eigenvalues of A counted according to algebraic multiplicity. (a) If T is a non-singular n n matrix and P is a polynomial, then P(T 1 AT) = T 1 P(A)T. (b) Let p be a polynomial. Show that p(λ) p (λ) p (λ) λ 1 0 p(j) = 0 p(λ) p (λ) if J = 0 λ 1. (0.8) 0 0 p(λ) 0 0 λ Generalise this result to Jordan blocks of higher dimension. (c) If J λ is a Jordan block of size l l, then P(J λ ) = 0 for any polynomial P of the form P(ζ) = (λ ζ) l Q(ζ) (ζ C), with Q a polynomial. (d) Use Theorem 0.6 to prove (0.7). (e) Show that the minimal polynomial factorises the characteristic polynomial. (f) Show that the degree of the minimal polynomial is at least equal to the number of different eigenvalues of A, with equality if and only if A is semi-simple. The degree of the minimal polynomial is also called the degree of A. Exercise 0.20. Consider the situation of Theorem 0.8. (a) Prove that n α 0 = det(a) = λ j, α n 1 = trace(a) = j=1 n λ j. (b) Suppose A is non-singular. Note that then α 0 0. Consider the linear system Ax = b. Show that x = q(a)b for some polynomial q of degree < n. Actually, one can take q(ζ) = 1 α 0 (ζ n 1 + α n 1 ζ n 2 +... + α 1 ). Give also an expression for q in terms of the minimal polynomial. Exercise 0.21. Let B be an n n matrix that commutes with A, i.e., BA = AB. (a) Both space V(λ) and W(λ) (w.r.t. A) are invariant under multiplication by B. (b) The space V(λ) contains an eigenvector of B. If y C n,y 0 and y A = µy, then y is a left eigenvector of A associated to the (left) eigenvalue µ. If we discuss left eigenvectors, then we refer to non-trivial vectors x for which Ax = λx as right eigenvectors. Left and right eigenvectors with different eigenvalues are mutual orthogonal (for a proof, see Exercise 0.22): j=1 8

Theorem 0.9 Let A be an n × n matrix.
1) λ ∈ C is a left eigenvalue of A if and only if λ is a right eigenvalue of A.
2) If x is a right eigenvector with eigenvalue λ and y is a left eigenvector with eigenvalue µ ≠ λ, then y ⊥ x.

Corollary 0.10 Let A be an n × n matrix. Suppose u is in the span of right eigenvectors x_i of A with eigenvalues λ_i: u = Σ α_i x_i. If λ_i is simple and y_i is the left eigenvector of A associated with λ_i, scaled such that y_i* x_i = 1, then α_i = y_i* u.

Exercise 0.22. Let y be a left eigenvector with eigenvalue µ.
(a) For λ ∈ C: λ is a left eigenvalue ⇔ P_A(λ) = 0 ⇔ λ is a right eigenvalue.
(b) If x is a right eigenvector with eigenvalue λ and λ ≠ µ, then y ⊥ x.
(c) If x is a right eigenvector with eigenvalue µ and there is an n-vector z such that Az = µz + x (x is associated with a non-trivial Jordan block J_µ), then y ⊥ x.
(d) The subspace y^⊥ is invariant under multiplication by A.
(e) If µ is simple, then y^⊥ = ⊕ W(λ), where we sum over all eigenvalues λ of A, λ ≠ µ.
(f) {y ∈ C^n | ((A − µI)*)^l y = 0 for some l ∈ N} ⊥ W(λ) if λ ≠ µ.
(g) Give an example of a matrix A with left and right eigenvector y and x, respectively, both associated to the same eigenvalue λ such that y ⊥ x. (Hint: you can find a 2 × 2 matrix A with λ = 0 with this property.)

The spectrum Λ(A) of A is the set of all eigenvalues of A. The spectral radius ρ(A) of A is the absolute largest eigenvalue of A: ρ(A) = max{ |λ| : λ ∈ Λ(A) }.

For complex numbers x with |x| < 1 we have that x^k → 0 (k → ∞) and (geometric series) (1 − x)^{-1} = 1 + x + x^2 + x^3 + .... For matrices A, ρ(A) < 1 implies A^k → 0 (k → ∞) and (Neumann series)
    (I − A)^{-1} = I + A + A^2 + A^3 + ....   (0.9)

Theorem 0.11
1) A^k x → 0 (k → ∞) for all x ∈ C^n ⇔ ρ(A) < 1.
2) If 1 ∉ Λ(A), then I − A is non-singular.
3) If ρ(A) < 1, then I + A + ... + A^k converges to (I − A)^{-1}.

Exercise 0.23. Proof of Theorem 0.11.
(a) Prove the first statement of the theorem in case A is a Jordan block J_λ. (Hint: J_λ^k is upper triangular with entries λ^k, kλ^{k−1}, (k(k−1)/2)λ^{k−2},... on the main diagonal, first co-diagonal, second co-diagonal,..., respectively, see (0.8).)
(b) Prove the first statement of the theorem for the general case.
(c) Prove the third statement. (Hint: check that (I − A)(I + A + ... + A^k) = I − A^{k+1}.)

An eigenvalue λ of A is dominant if it is simple and |λ| > |λ_j| for all other eigenvalues λ_j of A. An eigenvector associated to a dominant eigenvalue is said to be dominant.
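When a dominant eigenvalue exists, repeated multiplication by A amplifies the corresponding eigenvector component; this is the idea behind the power method. A minimal sketch follows; the test matrix, the number of steps and the starting vector are assumptions of this illustration.

    import numpy as np

    def power_method(A, steps=200, seed=0):
        # Approximate a dominant eigenpair by repeated multiplication and normalisation.
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(A.shape[0])
        x /= np.linalg.norm(x)
        for _ in range(steps):
            y = A @ x
            x = y / np.linalg.norm(y)
        theta = x @ (A @ x)                     # Rayleigh quotient (x is normalised): eigenvalue estimate
        return theta, x

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])             # symmetric, with a dominant eigenvalue
    theta, x = power_method(A)
    print(theta, np.linalg.norm(A @ x - theta * x))   # small residual: (theta, x) is nearly an eigenpair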

Theorem 0.12 (Perron-Frobenius) Let A be such that A = |A| (all entries of A are non-negative). Then ρ(A) ∈ Λ(A). If, in addition, A is irreducible and aperiodic,³ then ρ(A) is a dominant eigenvalue of A.

A characteristic polynomial is monic: the leading coefficient is one. Conversely, any monic polynomial is a characteristic polynomial of some suitable matrix. This statement is obvious if the zeros of the polynomial are available: then we can take the diagonal matrix with the zeros on the diagonal. However, for a suitable matrix, we do not need the zeros. Let p(ζ) = ζ^n − (α_{n−1}ζ^{n−1} + ... + α_1ζ + α_0) (ζ ∈ C) be a polynomial (with α_j ∈ C). Then
    H x(λ) = λ x(λ),   where   x(λ) ≡ (λ^{n−1}, λ^{n−2}, ..., λ, 1)^T   and
        [ α_{n−1}  α_{n−2}  ...  α_1  α_0 ]
        [    1        0     ...   0    0  ]
    H ≡ [    0        1     ...   0    0  ]   (0.10)
        [    .        .      .    .    .  ]
        [    0        0     ...   1    0  ],
for all zeros λ of p. In particular, the zeros of p are eigenvalues of H and p is the characteristic polynomial of H. H is the companion matrix of p. Modern software packages such as Matlab compute zeros of polynomials by forming the companion matrix and applying modern numerical techniques for computing eigenvalues of matrices.

Exercise 0.24. Let p be a polynomial with companion matrix H (cf. (0.10)). Let x(ζ) be the vector with coordinates ζ^{n−1}, ζ^{n−2}, ..., ζ, 1 (ζ ∈ C).
(a) Prove that Hx(λ) = λx(λ) ⇔ p(λ) = 0.
(b) Prove that p is the characteristic polynomial of H in case all zeros of p are mutually different.
(c) Suppose p(λ) = p′(λ) = 0. Show that Hx′(λ) = λx′(λ) + x(λ) and conclude that λ is an eigenvalue of H of algebraic multiplicity at least 2 and that the associated Jordan block J_λ is at least 2 × 2.
(d) Prove that p is the characteristic polynomial of H regardless of the multiplicity of the zeros.

E Special matrices.

A is an n × n matrix. A is Hermitian (or self-adjoint) if A* = A. A is symmetric if A^T = A. Note that for a real matrix A (i.e., all matrix entries are in R), A is symmetric if and only if A is Hermitian. Often, if a matrix is said to be symmetric, it is implicitly assumed that the matrix is real. If that is not the case, the matrix is referred to as a complex symmetric matrix, i.e., the possibility that matrix entries are non-real is explicitly mentioned. A matrix A is anti-Hermitian if A* = −A. Sometimes it is convenient to split a (general square) matrix A into a Hermitian and an anti-Hermitian part:
    A = A_h + A_a,   with   A_h ≡ ½(A + A*)   and   A_a ≡ ½(A − A*)   (0.11)
(see Exercise 0.25), just as a complex number α can be split into a real and an imaginary part: α = α_r + iα_i with α_r = Re(α) and α_i = Im(α). Here i is the complex number √(−1).

Exercise 0.25.
(a) If A and H are Hermitian and α, β ∈ R, then αA + βH is Hermitian.
(b) If V is an n × k matrix and A is Hermitian, then V*AV is Hermitian.

³ The directed graph associated to the matrix A consists of vertices 1,...,n and there is an edge from i to j iff A_ij ≠ 0. A matrix is irreducible if for all i, j there is a path in its graph from vertex i to vertex j. The matrix is aperiodic if the greatest common divisor of the lengths of its circular paths is 1.

(c) If A is anti-Hermitian, then iA is Hermitian. Here i = √(−1).
(d) Any square matrix A can be written as in (0.11) with A_h Hermitian and A_a anti-Hermitian.
(e) A is Hermitian ⇔ y*Ax = (x*Ay)* for all x, y ∈ C^n.
(f) If x*Ax ∈ R for all x ∈ C^n, then x*A_a x = 0 for all x ∈ C^n.
(g) If x*Ax ∈ R for all x ∈ C^n, then A = A_h is Hermitian.
(h) If A = QSQ* is the Schur decomposition of an Hermitian matrix A, then S is real and diagonal. In particular, an Hermitian matrix A is diagonalizable, all eigenvalues are real and A has an orthonormal basis of eigenvectors, i.e., there is an orthonormal basis of C^n such that all basis vectors are eigenvectors of A.

A is a normal matrix if AA* = A*A.

Theorem 0.13 Hermitian and anti-Hermitian matrices are normal. If A is normal, then a vector is a right eigenvector of A if and only if it is a left eigenvector. The following properties are equivalent for a square matrix A:
1) A is normal.
2) A_a A_h = A_h A_a.
3) There is an orthonormal basis of eigenvectors of A.
4) A* = p(A) for any polynomial p for which p(λ) = λ̄ for all eigenvalues λ of A.
5) There is a polynomial p for which A* = p(A).

Exercise 0.26. Proof of Theorem 0.13.
(a) Prove the first claim of the theorem.
(b) Successively prove the following implications (see the theorem): 1) ⇒ 2), 2) ⇒ 3) (Hint: use (b) of Exercise 0.21), 3) ⇒ 4), 4) ⇒ 5) (Hint: use Lagrange interpolation), 5) ⇒ 1).
(c) Prove that left and right eigenvectors coincide in case A is normal. Does the converse hold?
Assume in the remainder of this exercise that A is normal.
(d) Prove that there is a polynomial p as in 5) with degree < #Λ(A), the number of different eigenvalues of A. In particular, the degree of the polynomial p is less than the degree of the minimal polynomial of A.
(e) If A* = p(A) then p̄(p(A)) = A, where p̄ is the polynomial with complex conjugated coefficients; in particular, the minimal polynomial of A is a polynomial factor of the polynomial ζ ↦ p̄(p(ζ)) − ζ.

A is (semi-) positive definite if x*Ax > 0 (x*Ax ≥ 0, respectively) for all x ∈ C^n, x ≠ 0.

Exercise 0.27.
(a) A is positive definite ⇔ A is Hermitian and λ > 0 for all eigenvalues λ of A. (Here, you can use that A = 0 if x*Ax = 0 for all x ∈ C^n. For a proof, see Exercise 1.9(a).)
(b) A is semi positive definite ⇔ A is Hermitian and λ ≥ 0 for all eigenvalues λ of A.
(c) A is positive definite ⇔ A = MM* for some non-singular n × n matrix M.
(d) A is positive definite ⇔ A = LL* for some non-singular n × n lower triangular matrix L. (Hint: apply (0.5) to M*.)
(e) A is semi positive definite ⇔ A = MM* for some n × n matrix M.
In the above statements, it is essential that the positive definiteness is with respect to complex data: if A is real and x^T Ax > 0 for all x ∈ R^n, x ≠ 0, then we cannot conclude that A is symmetric.
(f) Give an example of a non-symmetric 2 × 2 real matrix A for which x^T Ax > 0 for all x ∈ R^2, x ≠ 0.
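Exercise 0.27(d) is the Cholesky decomposition A = LL* of a positive definite matrix. A quick NumPy illustration follows; the random test matrix is an assumption of this sketch.

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4))
    A = M @ M.T + np.eye(4)                    # real symmetric (Hermitian) and positive definite

    L = np.linalg.cholesky(A)                  # lower triangular L with A = L L^*
    print(np.allclose(L @ L.T, A))
    print(np.all(np.linalg.eigvalsh(A) > 0))   # all eigenvalues positive, cf. Exercise 0.27(a)
    for _ in range(3):                         # x^* A x > 0 for a few random non-zero x
        x = rng.standard_normal(4)
        assert x @ (A @ x) > 0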

F Quiz

Exercise 0.28. Let

        [ 0  1  0 ]
    A = [ 1  0  1 ]
        [ 0  1  0 ].

(a) What is the range of A?
(b) What is the null space of A?
(c) What is the rank of A?
(d) What are the eigenvalues of A?

Lecture 1 Basic Notions

A Norms

Let V be a (complex) linear space. A map ‖·‖ : V → [0, ∞) is a norm on V and (V, ‖·‖) is a normed space if
    1) ‖x‖ = 0 ⇔ x = 0 (x ∈ V),
    2) ‖αx‖ = |α| ‖x‖ (x ∈ V, α ∈ C),
    3) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (x, y ∈ V).   (1.1)

Norms are used to measure errors (approximation errors as well as errors coming from rounded arithmetic). A sequence (x_n) in V converges to x ∈ V if ‖x_n − x‖ → 0 if n → ∞. Formally, we should say that the sequence converges with respect to the norm ‖·‖. However, for convergence, it does not matter what norm is used if V is finite dimensional.

Theorem 1.1 If V is finite dimensional, then all norms on V are equivalent, i.e., if ‖·‖ and ‖·‖′ are norms on V, then there are constants M, m, M > m > 0, such that
    m ‖x‖′ ≤ ‖x‖ ≤ M ‖x‖′   (x ∈ V).

Exercise 1.1. Proof of Theorem 1.1. Let V be finite dimensional with norm ‖·‖. Let v_1,...,v_k be a basis. Define ‖Σ_j α_j v_j‖′ ≡ ‖(α_1,...,α_k)^T‖_∞ ≡ max_j |α_j|.
(a) Show that ‖·‖′ is a norm on V.
(b) Prove that ‖x‖ ≤ M ‖x‖′ (x ∈ V) for some M (≤ Σ_j ‖v_j‖).
(c) Prove that S ≡ {(α_1,...,α_k)^T ∈ C^k | ‖Σ_j α_j v_j‖ = 1} is a closed bounded subset of C^k. Conclude that
    0 < K ≡ max{ max_j |α_j| : (α_1,...,α_k)^T ∈ S } < ∞
and, therefore, with m ≡ 1/K, we have m ‖x‖′ ≤ ‖x‖.
(d) Prove Theorem 1.1.

In practice, it is often said that a sequence (x_n) (of finitely many x_n) is converging to x if, for some (large) n, ‖x_n − x‖ < tol, where tol is some prescribed tolerance (or accuracy). Now, the accuracy depends on the norm that is used (the M and m affect the actual value of the error bound).

Below p, q ∈ [1, ∞]. The important cases are p, q ∈ {1, 2, ∞}. x = (x_1,...,x_k)^T is a k-vector. The p-norm ‖x‖_p is defined by
    ‖x‖_p^p ≡ Σ_i |x_i|^p   (p ∈ [1, ∞)),      ‖x‖_∞ ≡ max_i |x_i|.
The following duality relation between a p-norm and a p′-norm, with p′ ∈ [1, ∞] such that 1/p + 1/p′ = 1, can be useful:
    [Hölder's inequality]   |(x,y)| ≤ ‖x‖_p ‖y‖_{p′}   (y ∈ C^k),   (1.2)
    ‖x‖_p = sup{ |(x,y)| : ‖y‖_{p′} ≤ 1 }.   (1.3)
Note that p′ = 2 if p = 2. In particular, Hölder's inequality in (1.2) can be viewed as a generalisation of the Cauchy-Schwarz inequality:
    |(x,y)| ≤ ‖x‖_2 ‖y‖_2   (y ∈ C^k).   (1.4)
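The p-norms and Hölder's inequality (1.2) are easy to check numerically. In the sketch below the test vectors are arbitrary and the dual exponent p′ is computed from 1/p + 1/p′ = 1; none of these choices are prescribed by the notes.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.standard_normal(10)
    y = rng.standard_normal(10)

    for p in [1.0, 2.0, np.inf]:
        q = np.inf if p == 1.0 else (1.0 if p == np.inf else p / (p - 1.0))   # dual exponent p'
        lhs = abs(np.dot(x, y))                                               # |(x, y)|
        rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)                     # ||x||_p ||y||_p'
        print(f"p = {p}: {lhs:.3f} <= {rhs:.3f}")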

Property 1.2 For x C k and p, q [1, ] we have that x p k 1 q 1 q x q if p q and x p x q if p q. (1.5) In particular, x 1 k x 2, x 2 k x and x x 2 x 1. (1.6) Exercise 1.2. (a) Prove (1.5) and (1.6). (b) Show that the estimates in (1.5) and (1.6) are sharp, that is, give for each of the inequalities, a non-trivial vector x for which the inequality is an equality. (c) Note that x p can also be defined for p (0, 1): x p p xi p. Show that, for these p, p does not define a norm. (d) Sometimes x 0 is used to denote the number of non-zero coordinates of x, i.e., Show that x 0 = lim p>0,p 0 x p. x 0 #{i {1,..., k} x i 0}. (e) Sketch, for k = 2, the unit balls {x R 2 x p 1} for p {, 2, 1, 1 2 } (and for p = 0?). For what values of p is this ball a convex set? Below A = (A ij ) is an n k matrix, A is the matrix ( A ij ). Norms on R n (or on C n ) induce norms on matrices. The induced p-norm is defined by 4 A p sup{ Ax p x p 1}. Convention 1.3 If we use the same notation for a norm on vector spaces as R n (or C n ) and on matrices of matching size, then we assume that the norm on matrices is induced by the norm on vectors: A sup Ax with supremum over all x with x 1. A M Non-induced norms are also frequently used, as the Frobenius norm A F and the norm A F A ij 2 = Ae j 2 2, i,j j A M max A ij. i,j An n k matrix A can be associated to an nk-vector A by putting the consecutive columns of A below each others. Note that A F = A 2, A M = A. The 2-norm is important for mathematical reasons: it is associated to the inner product (x,y) y H x and results as Cauchy-Schwartz and Pythagoras can be used. Other norms are easier to compute and are frequently used in error analysis. Exercise 1.3. Prove that p, F and M are norms on the space of n k matrices. Exercise 1.4. (a) Diagonal matrices. Let D = diag(d i ) be an n n diagonal matrix. Show that for any induced p-norm (1 p < ) we have that D p = max D i. i 4 More generally, a norm on C n and a norm on C k induce a norm on the space of n k matrices by sup{ Ax x C k, x 1}. However, we will not pursue this generalisation here. 14

(b) 1- and -norms. Prove that A 1 = max j i A ij = max Ae j 1, j A = max i j A ij = max e ia 1. (1.7) i (c) Duality. Prove that for p [1, ] such that 1 p + 1 p = 1, we have Here, you can use (1.2). A p = A H p, in particular A 2 = A H 2. (1.8) (d) Equivalence. For q [p, ], prove that x q x p and x p κ x q where κ k 1 p 1 q and for all q [1, ], A q κ A p where κ k 1 p 1 q. Theorem 1.4 Let A be an n n matrix with spectral radius ρ(a) (cf., p.9). If A is normal (A H A = AA H, in particular, if A Hermitian), then For any norm on the space of all n n matrices, we have A 2 = ρ(a). (1.9) ρ(a) = lim j j A j. (1.10) Exercise 1.5. Proof of Theorem 1.4. (a) Prove (1.9) for normal matrices A, To prove (1.10), put ρ(a) lim j j A j for the right hand side expression in (1.10). Let A be an n n matrix. (b) Use Theorem 1.1 to prove that ρ(a) does not depend on the norm. Take =. (c) Let x be an eigenvector associated to the absolute largest eigenvalue, scaled such that x = 1. Show that ρ(a) = j A j x and conclude that ρ(a) ρ(a). (d) Show that ρ(a) = ρ(t 1 AT) for any non-singular matrix T. (e) Let ε > 0. Show there is a non-singular matrix T such that J T 1 AT is on Jordan normal form be it that the (i, i + 1) entries of the Jordan blocks J λ are ε rather than 1 (cf. Theorem 0.6). Show that J λ λ + ε. Conclude that ρ(a) ρ(a) + ε. Prove (1.10). Exercise 1.6. Let A be an n k matrix. (a) 2-norm. Show (from the definitions) that (i) A 2 = A H A 2, (ii) A 2 = A H 2, (iii) A 2 A F, (iv) A F k A 2, (v) A F = trace(a H A). (1.11) Show that the inequalities (iii) and (iv) are sharp, that is, give a non-trivial A (i.e., A 0) such that A 2 = A F and a(nother) non-trivial A such A F = k A 2. Show also (here you can use (1.9)) that A 2 A 1 A. (1.12) 15

(b) Multiplicativity. Let B be a k m matrix. Prove that AB p A p B p (p [1, ]), AB F A 2 B F A F B F : (1.13) p-norms and the Frobenius norm are multiplicative. 5 Is the M-norm M also multiplicative? (c) Prove that orthonormal transformations preserve the 2-norm, that is, if A is orthonormal, then AB 2 = B 2. Similarly, if B is orthonormal then AB 2 = A 2. Do we also have that AB 2 = A 2 in case B is orthonormal? Do orthonormal matrices also preserve the p-norm for p [1, ], p 2? (d) Prove that, for any square matrix A, and any induced norm (and even for any multiplicative norm), we have that ρ(a) A. (1.14) In estimates of effects of rounding errors involving an n k matrix A = (A ij ), the matrix A ( A ij ) shows up, cf., the Sections E and F below. For instance, the vector b that we obtain from the matrix vector multiplication Ax, b = Ax, using rounded arithmetic (that is, the computer result) is equal to the exact matrix vector multiplication (A + )b for some n k perturbation matrix with such that p A u A, where the inequality is entry-wise, u is the relative machine precision (in Matlab u = 0.5 eps = 0.5 10 16 ; see Exercise 1.20 below), and p A is the maximum non-zeros per row of A. is a perturbation of A. Estimates of the size of in terms of A will involve A, e.g., 2 2 p A u A 2 if p A u A (inequalities matrix entry wise). The following exercise relates A and A. Exercise 1.7. The norm of A. (a) Prove that Let A be an n k matrix. A p A p (p [1, ]), A F = A F, A 2 k A 2. (1.15) Prove also, that for p {1, }, A p = A p. May we expect that A 2 = A 2? (b) Prove that A H A 2 min( k, n) A H A 2. (1.16) In particular, if A = CC H for an n n matrix C (A is n n and positive definite), then C C H 2 n A 2 (see Exercise 2.21). (c) If A is sparse. Put p c max j #{i A ij 0} and p r max i #{j A ij 0}, the maximum number of non-zeros per column, and per row, respectively. We will prove that Prove that, for all k-vectors x = (x 1,..., x k ) T with x 2 = 1, A x 2 2 = n A 2 min(p r, p c ) A 2. (1.17) i=1 (x, A H e i ) 2 p c max A H e i 2 2 i (Hint: first show that n i=1 j,a x ij 0 j 2 n p c i=1 x i 2 = p c ). Conclude that A 2 p c max A H e i 2 i and A 2 p r max Ae j 2 j and that (1.17) is correct. The theorem below summerizes the main results in this section. 5 A norm on the space of n n matrices is multiplicative if AB A B for all n n matrices A and B. 16

Theorem 1.5 Here, A is an n k matrix, B is a k m matrix, and x is a k-vector. Let be a norm on matrices induced by a vector norm. Then 1 3) is a norm on the space of n n matrices, 4) Ax A x, 5) AB A B. 6 ρ(a) A, with equality if A is normal and = 2. A 1 is the maximum column absolute sum, A is the maximum row absolute sum. A 2 = A H A 2 A F k A 2, A 2 A 1 A, AB F A 2 B F. If p A is the maximum number of non-zeros per row of A, then A 2 p A A 2. Note 1.6 In the discussion above involving induced matrix norms, we implicitly assumed that we used the same norm for k- as for n-vectors. But different norms can be employed as well. Notations as A p q are used to indicate that for the n k matrix A the q-norm is used on C k and the p-norm on C p. We will not go into this type of details in this course. For an n n Hermitian, semi-positive definite matrix A, i.e., x Ax 0 for all x C n, put x A x Ax (x C n ). (1.18) Exercise 1.8. The A-norm. Show that A defines a norm on C n in case A is positive definite (i.e., A is semi-positive definite and x Ax = 0 only if x = 0). To relate inner products and norms, the following parallel law can be usefull. For x, y C n, with ζ the sign of y Ax, that is, ζ C, ζ = 1 such that ζ y Ax 0, we have that 4y Ax = ζ ( x + ζ y 2 A x ζ y 2 A) (1.19) Exercise 1.9. The parallel law. (a) Prove parallel law (1.19). (b) Conlcude that A = 0 if x Ax = 0 for all x C n. Exercise 1.10. Orthonormal matrices. Let V [v 1,...,v k ] be an n k matrix. We say that (a transformation by) V preserves the 2-norm if Vx 2 = x 2 for all x C k. V preserves orthogonality if Vx Vy for all x, y C k for which x y. (a) Assume V preserves the 2-norm. Prove that V preserves orthogonality. (Hint: consider V(x + ζy) 2 for ζ C, ζ = 1.) (b) Prove that the following four properties are equivalent: V is orthonormal V preserves the 2-norm. VX 2 = X 2 for all k l matrices X (with l 1). VX F = X F for all k l matrices X (with l 1). (c) Prove that the following two properties are equivalent: V preserves orthogonality. αv is orthonormal for some scalar α. 6 Here you may assume that k = m = n or that the norms are induced p-norms, p [1, ], with the same p on all spaces. 17
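Exercise 1.10 (and Exercise 1.6(c)) state that multiplication by an orthonormal matrix leaves the 2-norm and the Frobenius norm unchanged; this is easy to verify numerically. The random data below are an assumption of this sketch.

    import numpy as np

    rng = np.random.default_rng(4)
    V, _ = np.linalg.qr(rng.standard_normal((6, 4)))     # V is 6 x 4 orthonormal: V^* V = I_4
    x = rng.standard_normal(4)
    X = rng.standard_normal((4, 3))

    print(np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x)))                 # ||Vx||_2 = ||x||_2
    print(np.isclose(np.linalg.norm(V @ X, 'fro'), np.linalg.norm(X, 'fro')))   # ||VX||_F = ||X||_F
    print(np.isclose(np.linalg.norm(V @ X, 2), np.linalg.norm(X, 2)))           # ||VX||_2 = ||X||_2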

B Perturbations and the conditioning of a problem

Let A be an n × n matrix. For a given vector b ∈ C^n, we are interested in solving the linear system
    Ax = b   (1.20)
for x ∈ C^n, and we are interested in solving the eigenvalue problem
    Ax = λx   (1.21)
for a non-trivial x ∈ C^n (eigenvector) and a scalar λ ∈ C (the eigenvalue associated with x).

In practice, the problems will be perturbed (by rounding errors, model errors [from discretization, from measurements, etc.]). For some (small) n × n matrix Δ, we will have A + Δ rather than A. Δ is a perturbation of A and we will be solving a perturbed problem. b will be perturbed as well. If small perturbations of the problem lead to large errors in the solution, then the problem is said to be ill conditioned, and we cannot expect to be able to compute accurate solutions. Additional information is needed, i.e., the problem has to be modified (adapted model) to a well-conditioned one (for instance, in case of (1.20), find an x with minimal norm for which ‖Ax − b‖_2 ≤ ε). Of course, these modifications should render the problem into a realistic model of the practical underlying problem that is to be solved.

Condition numbers quantify how sensitive a problem is to perturbations (recall Conv. 1.3):

Theorem 1.7 If Ax = b and (A + Δ)x̃ = b + δ_b, then
    ‖x − x̃‖/‖x‖ ≲ C(A) ( ‖Δ‖/‖A‖ + ‖δ_b‖/‖b‖ ),   where C(A) ≡ ‖A‖ ‖A^{-1}‖.
For an exact upper bound, divide the expression at the right-hand side by 1 − C(A) ‖Δ‖/‖A‖.

C(A) is the condition number of A (formally, of the linear problem (1.20)) with respect to the norm ‖·‖. The condition number tells us how perturbations on the matrix and on the right-hand side vector affect the relative accuracy of the solution. The perturbations are also measured in some relative sense. (And the perturbations should not be too large: C(A) ‖Δ‖/‖A‖ < 1.) The estimate for the case δ_b = 0 is sharp: for a given δ > 0 (small), there is a perturbation Δ such that ‖Δ‖ = δ and ‖x − x̃‖ = ‖A^{-1}‖ ‖Δ‖ ‖x̃‖, where x̃ solves (A + Δ)x̃ = b.

Exercise 1.11. Proof of Theorem 1.7.
(a) Show that x̃ − x = −A^{-1}Δx̃ + A^{-1}δ_b.
(b) Note that ‖b‖ ≤ ‖A‖ ‖x‖ and ‖x̃‖/‖x‖ ≤ 1 + ‖x̃ − x‖/‖x‖.
(c) Prove Theorem 1.7.
(d) Discuss the sharpness of the estimate ‖b‖ ≤ ‖A‖ ‖x‖.

Exercise 1.12. Prove that if Ax = b and (A + Δ)x̃ = b, then
    ‖x − x̃‖/‖x̃‖ ≤ C(A) ‖Δ‖/‖A‖,
with C(A) the condition number of A with respect to the norm ‖·‖.

If the n × n matrix A is ill conditioned (large condition number) then small perturbations can lead to large errors in the solution of linear systems with A. Therefore, it seems a good idea to minimise the condition number with some simple manipulations such as scaling the rows, i.e., work with the system DAx = Db instead of Ax = b with D an appropriate diagonal matrix.

Note that row scaling does not affect the sparsity structure of the matrix: the set of (i, j) of non-zero matrix entries is the same for A and DA. Moreover, row scaling does not affect the solution x. 7 Of course the row scaling has to be applied before perturbation (by computational steps) are being introduced. Otherwise the row scaling would only be cosmetic. Scaling such that all rows have equal norm, row equilibration, seems to be the best as we will see in the next exercise. To preserve algebraic structure (symmetry, Hermitian), rows as well as columns have to be scaled. Exercise 1.13. Row scaling. Let be a multiplicative norm. C(A) A 1 A. (a) Prove that A 1 (DA) 1 D for any n n matrix D. (b) Conclude that C(A) C(DA) if D A = DA. (c) If all rows of A have 1-norm equal to 1, A e j 1 = 1 all j, then DA = D for all diagonal D. (d) Show that for an arbitrary n n matrix A the diagonal D with jth diagonal entry 1/ A e j 1 leads to smallest the condition number C (DA) with respect to the norm, smallest w.r.t. to all diagonal scalings. (e) If all rows of A have 2-norm equal to 1, A e j 2 = 1 all j, then DA F = D F for all diagonal D. (f) For an n n matrix A, let D 0 be the n n diagonal matrix with jth diagonal entry equal to 1/ A e j 2. Show that for any n n diagonal matrix D we have that C F (D 0 A) n C F (DA): except for a factor at most n, row equilibration (w.r.t. the 2-norm) leads to the smallest condition number in Frobenius norm (here, denoted by C F ). C Forward and backward error Let x be a solution of (1.20) or of (1.21) with eigenvalue λ. If u is a vector in C n that approximates x (an approximate solution), and, in case of (1.21), ϑ is an approximate eigenvalue, then x u (and λ ϑ) is the error, also called forward error, 8 r A(x u) = b Au is the residual for the linear system and r Au ϑu is the residual for the eigenvalue problem. If there is an n n matrix, a perturbation of A such that (A + )u = b (1.22) in case of the linear system and (A + )u = ϑu (1.23) in case of the eigenvalue problem, then is a backward error. With the backward error, the approximate solution is viewed as an exact solution of a (hopefully slightly) perturbed problem: the error in the solution is trowed back to the problem. The challenge is, given an approximate solution u, to find a perturbation with, or rather / A, as small as possible. The idea here is that if the (scaled) backward error is small, then the numerical method that produced the approximate solution u is stable, even if the error x u happens to be large. In such a case, the problem is unstable: the problem is to blame for the inaccurate solution, rather than the numerical method. For some applications, for maintaining the point of view that the problem is to blame, it is more realistic to require the requested perturbation to have a similar structure as the matrix 7 If we scale the columns, then we have to unscale to solution of the scaled system to find the desired solution. With scaling we tried to minimise effect of perturbations, by unscaling we might reverse this beneficial action. 19

A: rather than having ‖Δ‖/‖A‖ small, it is required to have |Δ| < ε |A| for some small ε. Here the inequality is matrix entry-wise. Or, if A is symmetric, then Δ should be symmetric as well. For the linear system case, the backward error can also be formulated as a perturbation on b.

Note that the error is not readily available. The following theorem shows that a backward error can be expressed in terms of residuals and approximate solutions. Since it is reasonable to assume that the problem is (reasonably) well-conditioned, it makes sense to try to design numerical algorithms that produce approximate solutions with small residuals. Moreover, if we can quantify (in terms of properties of A, ...) how solutions respond to perturbations on the problem, then we can bound the error in terms of the residual (and these properties of A). In summary,
    Design algorithms that lead to small residuals (small backward error).
    Analyse the effect of perturbations on the solution (forward error analysis).

Theorem 1.8 Let Δ be the n × n matrix given by Δ ≡ r u*/(u*u). Equation (1.22) holds if r ≡ b − Au. Equation (1.23) holds if r ≡ −(Au − ϑu). Moreover, Δ has rank 1 and
    ‖Δ‖_2 / ‖A‖_2 = ‖r‖_2 / (‖A‖_2 ‖u‖_2).   (1.24)

From (1.24), we conclude that the size of the residual, scaled by the norm of the matrix and of the approximate solution, appears to be an appropriate measure for the backward error.

Exercise 1.14. Proof of Theorem 1.8.
(a) Prove Theorem 1.8.
The perturbation term Δ for which, say, (1.22) holds is not unique. Put Δ_1 ≡ r u*/(u*u).
(b) Show that Δ = Δ_2, where Δ_2 ≡ r r*/(r*u), also satisfies (1.22), provided that u is not orthogonal to r. Note that this Δ_2 is Hermitian. Show that
    ‖Δ_1‖_2 = ‖r‖_2/‖u‖_2,   ‖Δ_2‖_2 = (1/cos ∠(r,u)) ‖r‖_2/‖u‖_2,   so   ‖Δ_1‖_2 = cos ∠(r,u) ‖Δ_2‖_2.
(c) Consider
    Δ_3 ≡ b b*/(b*u) − Au (Au)*/(u*A*u).
Show that Δ = Δ_3 satisfies (1.22). Discuss rank, symmetry and size of Δ_3 (in terms of ‖r‖_2).
(d) Assume that A is positive definite. Prove that A + Δ_3 is positive definite as well as soon as b*u > 0. Show that b*u > 0 if u is sufficiently close to the solution x of the system Ax = b.

D Perturbed problems

Exercise 1.15. Let A be an n × n matrix.
(a) Show that
    (I − A)^{-1} = I + A + A^2 + A^3 + ...   (1.25)
holds if ‖A‖ < 1 for some multiplicative norm ‖·‖. You can do this by combining Theorem 0.11 and (1.10), but give also an elementary proof. In particular, ‖A‖ < 1 implies that I − A is non-singular.
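The Neumann series (1.25) can be checked numerically by truncating I + A + A^2 + ... and comparing with (I − A)^{-1} for a matrix with ‖A‖_2 < 1. The test matrix and number of terms below are assumptions of this sketch.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 5))
    A *= 0.5 / np.linalg.norm(A, 2)            # rescale so that ||A||_2 = 0.5 < 1

    S = np.zeros((5, 5))
    P = np.eye(5)
    for _ in range(60):                        # S = I + A + A^2 + ... + A^59
        S += P
        P = P @ A

    print(np.linalg.norm(S - np.linalg.inv(np.eye(5) - A), 2))   # tiny: the truncated series approximates (I - A)^{-1}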

Theorem 1.9 Let A and be n n matrices, A is non-singular, a multiplicative norm. Put δ A 1. Then δ A 1. If δ < 1, then A + is non-singular, (A + ) 1 A 1 1 1 δ and A 1 (A + ) 1 A 1 δ 1 δ. (1.26) Exercise 1.16. Proof of Theorem 1.9. (a) Assume δ < 1. Prove that A+ is non-singular and that first estimate in (1.26) is correct. (b) Show that A 1 (A + ) 1 = A 1 (A + ) 1. Derive the second estimate of (1.26). The following theorem can be viewed as a perturbation theorem, where the off-diagonal entries (the matrix E in Exercise 1.17) are the perturbations of a diagonal matrix (D in Exercise 1.17). However, it is also of interest without this interpretation: the theorem gives a simple way of estimating eigenvalues. Moreover, the proof is a simple and nice application of the perturbation Theorem 1.9. Theorem 1.10 (Gershgorin s theorem) Let A = (A ij ) be an n n matrix. A Gershgorin disk is a disk D i in C with centre A ii and radius j,j i A ij : D i {ζ C A ii ζ ρ i } with ρ i A ij (i = 1,..., n). j,j i Each eigenvalues of A is contained in some Gershgorin disk: Λ(A) i D i. Exercise 1.17. Gershgorin s theorem. Let D diag(a) be the diagonal of A and E the outer diagonal (i.e., D ii = A ii all i, D ij = 0 all i, j, i j, E A D). Put ρ i j,j i A ij = E e i 1 (i = 1,..., n). (a) Show that, for λ C, A λi is non-singular if (D λi) 1 E < 1. (b) Show that (D λi) 1 E p (D λi) 1 p E p < 1 if E p < min i A ii λ. (c) Conclude that (Bauer Fike s Theorem holds): A ii λ E p for some i, if λ is an eigenvalue of A. (d) Show that (D λi) 1 E < 1 if and only if ρ i < A ii λ for all i. (e) Conclude that Theorem 1.10 holds. Gershgorin disks indicate how eigenvalues can get perturbed (D is an diagonal matrix of eigenvalues, E is a perturbation matrix). In practice it appears that the effect of the perturbation A ij on the eigenvalue A ii is often better described by A ij A ii A jj than by A ij. (f) Derive a variant of Gershgorin s theorem using the 1-norm rather than the -norm. Note that the theorem does not exclude the possibility that all eigenvalues are contained in the same Gershgorin disk. The following theorem states that the eigenvalues depend continuously on a parameter if the matrix depends continuously on that parameter. Bauer Fike s Theorem can be used to prove this result: however, we will not give further details here. Theorem 1.11 Assume F(τ) is an n n matrix for all τ in some subset I of C. If F depends continuously on τ, then there are continuous complex-valued functions µ 1,...,µ n on I such that µ 1 (τ),..., µ n (τ) are the eigenvalues of F(τ) counted according to multiplicity (τ I). This theorem allows a continuity argument to prove that 21

Theorem 1.12 (Gershgorin's theorem 2) If precisely p Gershgorin disks are connected, then the union of these p disks contains exactly p eigenvalues of A.

Exercise 1.18. Proof of Theorem 1.12. Consider Theorem 1.10. From Theorem 1.10 we know that Λ(A) ⊆ ∪ D_i.
(a) Give an example that shows that not every Gershgorin disk contains at least one eigenvalue of A. (Hint: replace the (n,1) entry of I + εS by 1. Here S is the shift matrix that assigns e_{i−1} to e_i.)
A subset G of C is connected if for all ζ_0 and ζ_1 in G there is a continuous curve in G that connects ζ_0 and ζ_1 (i.e., for some continuous function φ : [0,1] → C we have that φ(0) = ζ_0, φ(1) = ζ_1, φ(t) ∈ G (t ∈ [0,1])).
(b) Suppose there is a subset E of {1,2,...,n} of p numbers such that G ≡ ∪_{i∈E} D_i is connected, while G ∩ D_j = ∅ (j ∉ E). Prove that G contains exactly p eigenvalues of A.

E Rounding errors

Many results and details on effects of rounding errors (specifically in algorithms for dense matrices) can be found in [1].

Convention 1.13 For ease of notation, we follow the following conventions. ξ is a number in [−u, u] with u the relative machine precision (u is 0.5*eps in Matlab). ξ's at different locations can have different values.⁹ Formulae involving a ξ are to be read from left to right. We neglect order u² terms. (As an alternative, replace quantities such as nξ by nξ/(1 − nξ).)

Exercise 1.19.
(a) Following the above conventions, show that the statement ξ = 2ξ is correct while 2ξ = ξ is wrong.
(b) If α and β are scalars then αξ + βξ = (|α| + |β|)ξ. Prove that this formula is sharp, i.e., it is correct and there are ξ_1, ξ_2 ∈ [−u, u] for which αξ_1 + βξ_2 = (|α| + |β|)u.

Notation 1.14 If B is an n × k matrix (k and n can be 1) to be obtained by computation with computational rules (an algorithm) that are clear from the context, then B̂ denotes the quantity as actually computed. We assume that the input values (matrix entries) are machine numbers (real or complex numbers that can be represented in the computer). If B is defined by a (longer) expression, then we will also use the notation fl(B) instead of B̂. We follow the following rules.

Rule 1.15 If α and β are machine numbers, then the result fl(α ∘ β) obtained in the computer by a floating point operation (flop) or basic arithmetic operation, i.e., ∘ represents +, −, ∗, or /, is the exact result α ∘ β with a relative error of at most u:
    fl(α ∘ β) = (α ∘ β)(1 + ξ)   (β ≠ 0 if ∘ is /).
That is, fl(α + β) = (α + β)(1 + ξ), .... Operations with 0 (α = 0 or β = 0) are exact (again β ≠ 0 if ∘ is /).

⁹ We rather write x_1ξ + x_2ξ than x_1ξ_1 + x_2ξ_2.
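Rule 1.15 and Convention 1.13 can be observed directly in IEEE double precision, where u = eps/2 ≈ 1.1e-16. A small Python illustration follows; exact rational arithmetic (fractions) is used only to measure the rounding error of a single flop.

    import numpy as np
    from fractions import Fraction

    eps = np.finfo(float).eps                 # Matlab's eps: the gap between 1 and the next machine number
    u = eps / 2                               # relative machine precision (unit roundoff)
    print(u)                                  # about 1.11e-16

    print(1.0 + u == 1.0)                     # True: a relative change of size u can be lost entirely
    print(1.0 + eps == 1.0)                   # False

    q = 1.0 / 3.0                                               # computed quotient fl(1/3)
    rel_err = abs(Fraction(q) - Fraction(1, 3)) / Fraction(1, 3)
    print(float(rel_err) <= u)                                  # True: a single flop has relative error at most u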