
Lecture Notes for Inf-Mat 3350/4350, 2007

Tom Lyche

August 5, 2007


Contents

Preface

Part I: A Review of Linear Algebra

1 Introduction
  1.1 Notation

2 Vectors
  2.1 Vector Spaces
  2.2 Linear Independence and Bases
  2.3 Operations on Subspaces
    2.3.1 Sums and intersections of subspaces
    2.3.2 The quotient space
  2.4 Norms
  2.5 Convergence of Vectors
    2.5.1 Convergence of series of vectors
  2.6 Inner Products
  2.7 Orthogonality
  2.8 Projections and Orthogonal Complements

3 Matrices
  3.1 Arithmetic Operations and Block Multiplication
  3.2 The Transpose Matrix
  3.3 Linear Systems
  3.4 The Inverse Matrix
  3.5 Rank, Nullity, and the Fundamental Subspaces
  3.6 Linear Transformations and Matrices
  3.7 Orthonormal and Unitary Matrices

4 Determinants
  4.1 Permutations
  4.2 Basic Properties of Determinants
  4.3 The Adjoint Matrix and Cofactor Expansion
  4.4 Computing Determinants
  4.5 Some Useful Determinant Formulas

5 Gaussian Elimination
  5.1 Gaussian Elimination and LU-factorization
    5.1.1 Algorithms
    5.1.2 Operation count
  5.2 Pivoting
    5.2.1 Permutation matrices
    5.2.2 Gaussian elimination works mathematically
    5.2.3 Pivot strategies
  5.3 The PLU-Factorization
  5.4 An Algorithm for Finding the PLU-Factorization

6 Eigenvalues and Eigenvectors
  6.1 The Characteristic Polynomial
    6.1.1 The characteristic equation
  6.2 Similarity Transformations
  6.3 Linear Independence of Eigenvectors
  6.4 Left Eigenvectors

Part II: Some Linear Systems with a Special Structure

7 Examples of Tridiagonal Systems
  7.1 Cubic Spline Interpolation
    7.1.1 Splines
    7.1.2 Cubic spline interpolation
    7.1.3 LU-factorization of a tridiagonal system
  7.2 A Two-Point Boundary Value Problem
    7.2.1 Numerical example
  7.3 An Eigenvalue Problem
    7.3.1 The buckling of a beam
    7.3.2 The eigenpairs of the matrix tridiag(-1, 2, -1)

8 LU Factorization and Positive Definite Matrices
  8.1 Algebraic Properties of Triangular Matrices
  8.2 The LU and LDL^T Factorizations
  8.3 Positive Definite Matrices
    8.3.1 Definitions and examples
    8.3.2 When is a matrix positive definite?
    8.3.3 Algorithms

9 Some Bivariate Problems and the Kronecker Product
  9.1 The Poisson Problem
    9.1.1 Block LU-factorization of a block-triangular matrix
  9.2 The Kronecker Product

10 Fast Direct Solution of a Large Linear System
  10.1 A Fast Poisson Solver based on Diagonalization
  10.2 A Fast Poisson Solver based on the Discrete Sine and Fourier Transforms
    10.2.1 The discrete sine transform (DST)
    10.2.2 The discrete Fourier transform (DFT)
    10.2.3 The fast Fourier transform (FFT)
    10.2.4 A Poisson solver based on the FFT
  10.3 Problems

Part III: Some Matrix Theory

11 Orthonormal Eigenpairs and the Schur Form
  11.1 The Schur Form
  11.2 Hermitian and Normal Matrices
    11.2.1 The Spectral Theorem
  11.3 The Rayleigh Quotient and Minmax Theorems
    11.3.1 The Rayleigh Quotient
    11.3.2 Minmax Theorems
    11.3.3 The Hoffman-Wielandt Theorem
  11.4 Proof of the Real Schur Form

12 The Singular Value Decomposition
  12.1 Singular Values and Singular Vectors
    12.1.1 Singular values
    12.1.2 The singular value decomposition
    12.1.3 Three forms of the SVD
    12.1.4 The SVD of A^H
    12.1.5 Singular vectors
    12.1.6 The SVD of A^H A and A A^H
    12.1.7 Singular values of normal and positive definite matrices
    12.1.8 A geometric interpretation
  12.2 The Pseudo-inverse and Orthogonal Projections
    12.2.1 The pseudo-inverse
    12.2.2 Orthogonal projections
  12.3 The Minmax Theorem for Singular Values and the Hoffman-Wielandt Theorem

13 Matrix Norms
  13.1 Matrix Norms
    13.1.1 Consistent and Subordinate Matrix Norms
    13.1.2 Operator norms
    13.1.3 The p-norms
    13.1.4 Unitary invariant matrix norms
    13.1.5 Absolute and monotone norms
  13.2 The Condition Number with Respect to Inversion
  13.3 Determining the Rank of a Matrix

Part IV: Iterative Methods for Linear Systems

14 The Classical Iterative Methods
  14.1 Convergence and Spectral Radius
    14.1.1 Convergence in R^{m,n} and C^{m,n}
    14.1.2 The spectral radius
  14.2 Classical Iterative Methods; Component form
  14.3 Classical Iterative Methods; Matrix Form
  14.4 Convergence
  14.5 Convergence of SOR
  14.6 Numerical Example

15 The Conjugate Gradient Method
  15.1 The Conjugate Gradient Algorithm
  15.2 Numerical Example
  15.3 Derivation and Basic Properties
  15.4 Convergence

16 Minimization and Preconditioning
  16.1 Minimization
  16.2 Preconditioning
  16.3 Preconditioning Example
    16.3.1 A banded matrix
    16.3.2 Preconditioning

Part V: Orthonormal Transformations and Least Squares

17 Orthonormal Transformations
  17.1 The QR decomposition and QR factorization
    17.1.1 QR and Gram-Schmidt
  17.2 The Householder Transformation
  17.3 Householder Triangulation
  17.4 Givens rotations

18 Least Squares
  18.1 Examples
  18.2 Numerical Solution using the Normal Equations
  18.3 Numerical Solution using the QR-factorization
    18.3.1 QR and linear systems
  18.4 Numerical Solution using the Singular Value Factorization
  18.5 Perturbation Theory for Least Squares
    18.5.1 Perturbing the right hand side
    18.5.2 Perturbing the matrix
  18.6 Perturbation Theory for Singular Values

Part VI: Eigenvalues and Eigenvectors

19 Numerical Eigenvalue Problems
  19.1 Perturbation of Eigenvalues
    19.1.1 Gerschgorin's theorem
  19.2 Unitary Similarity Transformation of a Matrix into Upper Hessenberg or Tridiagonal Form
  19.3 Computing a Selected Eigenvalue of a Symmetric Matrix
    19.3.1 The inertia theorem
    19.3.2 Approximating λ_m
  19.4 Perturbation Proofs

20 The Power and QR Methods
  20.1 The Power Method
    20.1.1 The inverse power method
  20.2 The QR-Algorithm
    20.2.1 The relation to the power method
    20.2.2 A convergence theorem
    20.2.3 The shifted QR-algorithms

Part VII: Appendix

A Computer Arithmetic
  A.1 Absolute and Relative Errors
  A.2 Floating Point Numbers
  A.3 Rounding and Arithmetic Operations
    A.3.1 Rounding
    A.3.2 Arithmetic operations
  A.4 Backward Rounding-Error Analysis
    A.4.1 Computing a sum
    A.4.2 Computing an inner product
    A.4.3 Computing a matrix product

B Differentiation of Vector Functions

C Some Inequalities
  C.1 Convexity
  C.2 Inequalities

D The Jordan Form
  D.1 The Jordan Form
    D.1.1 The minimal polynomial

Bibliography

Index

Preface

These lecture notes are the result of an ongoing project to write a text for a course in matrix analysis and numerical linear algebra given at the advanced undergraduate and beginning graduate level at the University of Oslo. In the first 6 chapters we give a quick review of basic linear algebra. Some of the results in this first part are not found in most introductory linear algebra courses. In particular we discuss algorithmic aspects of Gaussian elimination. The lectures can start with Chapter 7, and each of the remaining Chapters 8-20 should then correspond to one week of lectures.

Oslo, 5 August 2007
Tom Lyche


List of Algorithms

5.5 lufactor
5.6 forwardsolve
5.7 backsolve
5.12 PLU-factorization
5.14 Forward Substitution (column oriented)
5.15 Backward Substitution (column oriented)
7.10 locatesubinterval
7.15 trifactor
7.16 trisolve
8.24 bandcholesky
8.25 bandforwardsolve
8.26 bandbacksolve
10.1 Fast Poisson Solver
10.4 Recursive FFT
14.15 Stationary Iterative Methods
15.4 Conjugate Gradient Iteration
15.5 Testing Conjugate Gradient
16.3 Preconditioned Conjugate Gradient Algorithm
17.12 Generate a Householder transformation
17.20 Upper Hessenberg linear system
18.10 Solving least squares by Householder QR
19.11 Householder reduction to Hessenberg form
19.13 Assemble Householder transformations
20.3 The Power Method
20.5 Rayleigh quotient iteration


List of Exercises

Chapter 2: 2.9-2.12, 2.21-2.24, 2.26, 2.30, 2.35-2.38, 2.43, 2.44, 2.50-2.54, 2.63
Chapter 3: 3.2-3.7, 3.10, 3.17-3.19, 3.24-3.26, 3.35
Chapter 4: 4.1, 4.3, 4.5, 4.10, 4.11, 4.13-4.15
Chapter 5: 5.2, 5.13
Chapter 6: 6.8-6.13
Chapter 7: 7.2, 7.8, 7.9, 7.11, 7.12, 7.21-7.25, 7.27, 7.28
Chapter 8: 8.6, 8.12, 8.27-8.31
Chapter 9: 9.2, 9.6, 9.13
Chapter 10: 10.1-10.12
Chapter 11: 11.3, 11.9, 11.10, 11.12, 11.15, 11.17, 11.19
Chapter 12: 12.9, 12.10, 12.16, 12.19, 12.22-12.31, 12.34-12.36
Chapter 13: 13.6-13.8, 13.10, 13.11, 13.17, 13.18, 13.22-13.29, 13.34, 13.36-13.38
Chapter 14: 14.9, 14.10, 14.13, 14.20-14.24, 14.26
Chapter 15: 15.2, 15.3, 15.8, 15.13-15.15, 15.21
Chapter 16: 16.1, 16.2
Chapter 17: 17.5, 17.6, 17.9, 17.13-17.16, 17.18, 17.19
Chapter 18: 18.6-18.8, 18.16, 18.17, 18.19
Chapter 19: 19.7, 19.9, 19.10, 19.12, 19.14, 19.15, 19.19-19.23, 19.25
Chapter 20: 20.1, 20.12
Appendix A: A.4, A.6, A.8
Appendix B: B.1
Appendix D: D.3, D.4, D.6, D.7, D.9-D.11


Part I

A Review of Linear Algebra

Chapter 1

Introduction

1.1 Notation

The following sets will be used throughout these notes.

1. The sets of natural numbers, integers, rational numbers, real numbers, and complex numbers are denoted by N, Z, Q, R, C, respectively.

2. R^n is the set of n-tuples of real numbers which we will represent as column vectors. Thus x ∈ R^n means
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{where } x_i \in \mathbb{R} \text{ for } i = 1,\dots,n.$$
Row vectors are normally identified using the transpose operation. Thus if x ∈ R^n then x is a column vector and x^T is a row vector.

3. R^{m,n} is the set of m × n matrices with real entries represented as
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
The entry in the ith row and jth column of a matrix A will be denoted by a_{i,j}, a_{ij}, A(i,j) or (A)_{i,j}. We use the notations
$$a_{.j} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}, \qquad a_{i.}^T = [a_{i1}, a_{i2}, \dots, a_{in}], \qquad A = [a_{.1}, a_{.2}, \dots, a_{.n}] = \begin{bmatrix} a_{1.}^T \\ a_{2.}^T \\ \vdots \\ a_{m.}^T \end{bmatrix}$$
for the columns a_{.j} and rows a_{i.}^T of A. We often drop the dots and write a_j and a_i^T when no confusion can arise. If m = 1 then A is a row vector, if n = 1 then A is a column vector, while if m = n then A is a square matrix. In this text we will denote matrices by boldface capital letters A, B, C and vectors most often by boldface lower case letters x, y, z.

4. The imaginary unit √(−1) is denoted by i. The complex conjugate and the modulus of a complex number z are denoted by z̄ and |z|, respectively. Thus if z = x + iy = re^{iφ} = r(cos φ + i sin φ) is a complex number, then z̄ = x − iy = re^{−iφ} = r(cos φ − i sin φ) and |z| = √(x² + y²) = r. Re(z) := x and Im(z) := y denote the real and imaginary part of the complex number z.

5. For matrices and vectors with complex entries we use the notation A ∈ C^{m,n} and x ∈ C^n. We identify complex row vectors using either the transpose T or the Hermitian (conjugate) transpose operation x^H := x̄^T = [x̄_1, ..., x̄_n].

6. The unit vectors in R^n and C^n are denoted by
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},$$
while I_n = I := [δ_{ij}]_{i,j=1}^n, where
$$\delta_{ij} := \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases}$$
is the identity matrix of order n. Both the columns and the transpose of the rows of I are the unit vectors e_1, e_2, ..., e_n.

7. We use the following notations for diagonal and tridiagonal n × n matrices:
$$\mathrm{diag}(d_i) = \mathrm{diag}(d_1,\dots,d_n) := \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix},$$
$$B = \mathrm{tridiag}(a_i, d_i, c_i) = \mathrm{tridiag}(a, d, c) := \begin{bmatrix} d_1 & c_1 & & & \\ a_2 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{n-1} & d_{n-1} & c_{n-1} \\ & & & a_n & d_n \end{bmatrix}.$$
Here b_{ii} = d_i for i = 1,...,n, b_{i+1,i} = a_{i+1}, b_{i,i+1} = c_i for i = 1,...,n−1, and b_{ij} = 0 otherwise.

8. We use the colon equal symbol v := e to indicate that the symbol v is defined by the expression e.
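The diag and tridiag notation of item 7 translates directly into array software. The following minimal sketch assumes Python with NumPy (which the notes themselves do not use); the helper name tridiag is ours, chosen to mirror the notation above.

```python
import numpy as np

def tridiag(a, d, c):
    """Build the n-by-n matrix tridiag(a, d, c) of item 7: d on the main
    diagonal, a on the subdiagonal, c on the superdiagonal.
    Here d has length n while a and c have length n-1."""
    return np.diag(d) + np.diag(a, -1) + np.diag(c, 1)

# Example: the matrix tridiag(-1, 2, -1) of order 4, which reappears in Chapter 7.
n = 4
T = tridiag(-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1))
print(T)

# The unit vectors e_1, ..., e_n of item 6 are the columns of the identity matrix I_n.
I = np.eye(n)
e1 = I[:, 0]
```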

Chapter 2

Vectors

This chapter contains a review of vector space concepts that will be useful in this text. We start by introducing a vector space. To define a vector space we need a field F, a set of vectors V, a way to combine vectors called vector addition, and a way to combine elements of F and V called scalar multiplication. In the first part of this section F will be an arbitrary field, but later the field will be the set of real or complex numbers with the usual arithmetic operations.

2.1 Vector Spaces

Definition 2.1 A field is a set F together with two operations +, · : F × F → F such that for all a, b, c ∈ F the following arithmetic rules hold:

(A0) there exists an element 0 ∈ F such that a + 0 = a.
(Am) there exists an element (−a) ∈ F such that a + (−a) = 0. We define subtraction as a − b := a + (−b).
(Aa) a + (b + c) = (a + b) + c.
(Ac) a + b = b + a.
(M1) there exists an element 1 ∈ F such that a · 1 = a.
(Mi) if a ≠ 0 then there exists an element a^{−1} ∈ F such that a · a^{−1} = 1.
(Ma) a · (b · c) = (a · b) · c.
(Mc) a · b = b · a.
(D) a · (b + c) = a · b + a · c.

The requirements (A0), (Am), (Aa) are the axioms for a group. They state that (F, +) is a group, and since in addition (Ac) holds, (F, +) is by definition an abelian group. The axioms (M1), (Mi), (Ma), (Mc) state that (F\{0}, ·) is an abelian group. Often we drop the dot and write ab for the product a · b. Examples of fields are R or C with ordinary addition and multiplication.

Definition 2.2 A vector space over a field F is a set V together with two operations, vector addition + : V × V → V and scalar multiplication · : F × V → V, such that for all a, b ∈ F and v, w ∈ V the following hold:

(V) (V, +) is an abelian group.
(Va) (a · b) · v = a · (b · v).
(Vd1) (a + b) · v = a · v + b · v.
(Vd2) a · (v + w) = a · v + a · w.
(M1) 1 · v = v.

We denote a vector space by (V, F) or by V if the underlying field is clear from the context.

Definition 2.3 Let (V, F) be a vector space and S a nonempty subset of V. Then (S, F) is a subspace of (V, F) if (S, F) is itself a vector space. It follows that (S, F) is a subspace of (V, F) if S is closed under vector addition and scalar multiplication, i.e. a s_1 + b s_2 ∈ S for all a, b ∈ F and all s_1, s_2 ∈ S.

For any vector space (V, F) the two sets {0}, consisting only of the zero element in V, and V itself are subspaces. They are called the trivial subspaces. Here are some examples of vector spaces.

Example 2.4 (The Vector Spaces R^n and C^n) In the following chapters we will deal almost exclusively with the vector spaces R^n = (R^n, R), C^n = (C^n, C) and their subspaces. Addition and scalar multiplication are defined by
$$v + w = \begin{bmatrix} v_1 + w_1 \\ \vdots \\ v_n + w_n \end{bmatrix}, \qquad a v = \begin{bmatrix} a v_1 \\ \vdots \\ a v_n \end{bmatrix}.$$

Example 2.5 (Subspaces of R^2 and R^3) For a given vector x ∈ R^n let S = {tx : t ∈ R}. Then S is a subspace of R^n; in fact it represents a straight line passing through the origin. For n = 2 it can be shown that all nontrivial subspaces of R^2 are of this form. For n = 3 the nontrivial subspaces are all lines and all planes containing {0}.

Example 2.6 (The Vector Space C(I)) Let F = R and let C(I) be the set of all real valued functions f : I → R which are defined and continuous on an interval I ⊂ R. Here the vectors are functions in C(I). Vector addition and scalar multiplication are defined for all f, g ∈ C(I) and all a ∈ R by
(f + g)(x) := f(x) + g(x), (af)(x) := af(x), for all x ∈ I.
C(I) = (C(I), R) is a vector space: the sum of two continuous functions is continuous, a constant times a continuous function is continuous, and since vector addition and scalar multiplication are defined point-wise, the axioms for a vector space follow from properties of real numbers.

Example 2.7 (The Vector Space Π_n) Let Π_n(I) be the set of all polynomials of degree at most n defined on a subset I of R or of C. We write simply Π_n if I = R or I = C. With pointwise addition and scalar multiplication defined as in Example 2.6 the set (Π_n(I), R) is a subspace of (C(I), R).

Definition 2.8 (Linear Combinations) The sum c_1 v_1 + c_2 v_2 + ... + c_n v_n with c_i ∈ F and v_i ∈ V for i = 1,...,n is called a linear combination of v_1,...,v_n. We say that the linear combination is nontrivial if at least one of the c_i's is nonzero. The set
span{v_1,...,v_n} := {c_1 v_1 + ... + c_n v_n : c_i ∈ F, i = 1,...,n}
spanned by v_1,...,v_n ∈ V is a subspace of (V, F). A vector space V is called finite dimensional if it has a finite spanning set; i.e. there exist n ∈ N and {v_1,...,v_n} in V such that V = span{v_1,...,v_n}.

Exercise 2.9 Show that the 0 of vector addition is unique and that {0} is a subspace.

Exercise 2.10 Show that 0 · x = 0 for any x ∈ V.

Exercise 2.11 Show that span{v_1,...,v_n} is a subspace.

Exercise 2.12 Show that span{v_1,...,v_n} is the smallest subspace containing the vectors v_1,...,v_n.
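As a concrete illustration of Definition 2.8, membership of a vector in span{v_1,...,v_k} ⊂ R^n can be tested numerically by comparing matrix ranks. A minimal sketch, assuming Python with NumPy; the helper in_span and its tolerance are our own, not part of the notes.

```python
import numpy as np

def in_span(w, *vectors, tol=1e-10):
    """Return True if w is a linear combination of the given vectors in R^n,
    i.e. if appending w to the list does not increase the rank."""
    V = np.column_stack(vectors)
    return (np.linalg.matrix_rank(np.column_stack([V, w]), tol=tol)
            == np.linalg.matrix_rank(V, tol=tol))

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
print(in_span(v1 + 2 * v2, v1, v2))                 # True: a linear combination of v1, v2
print(in_span(np.array([0.0, 0.0, 1.0]), v1, v2))   # False: not in span{v1, v2}
```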

2.2 Linear Independence and Bases

Definition 2.13 Let X := {v_1,...,v_n} be a set of vectors in a vector space (V, F). We say that X is linearly dependent if we can find a nontrivial linear combination which is equal to zero. We say that X is linearly independent if it is not linearly dependent. In other words,
c_1 v_1 + ... + c_n v_n = 0 for some c_1,...,c_n ∈ F implies c_1 = ... = c_n = 0.

The elements in a set of linearly independent vectors must all be nonzero, and we have

Lemma 2.14 Suppose v_1,...,v_n span a vector space V and that w_1,...,w_k are linearly independent vectors in V. Then k ≤ n.

Proof. Suppose k > n. Write w_1 as a linear combination of elements from the set X_0 := {v_1,...,v_n}, say w_1 = c_1 v_1 + ... + c_n v_n. Since w_1 ≠ 0 not all the c's are equal to zero. Pick a nonzero c, say c_{i_1}. Then v_{i_1} can be expressed as a linear combination of w_1 and the remaining v's. So the set X_1 := {w_1, v_1,...,v_{i_1−1}, v_{i_1+1},...,v_n} must also be a spanning set for V. We repeat this for w_2 and X_1. In the linear combination w_2 = d_{i_1} w_1 + Σ_{j≠i_1} d_j v_j we must have d_{i_2} ≠ 0 for some i_2. Moreover i_2 ≠ i_1, for otherwise w_2 = d_{i_1} w_1, contradicting the linear independence of the w's. So the set X_2, consisting of the v's with v_{i_1} replaced by w_1 and v_{i_2} replaced by w_2, is again a spanning set for V. Repeating this process n − 2 more times we obtain a spanning set X_n where all the v's have been replaced by w_1,...,w_n. Since k > n we can then write w_k as a linear combination of w_1,...,w_n, contradicting the linear independence of the w's. We conclude that k ≤ n.

Definition 2.15 A finite set of vectors {v_1,...,v_n} in a vector space (V, F) is a basis for (V, F) if
1. span{v_1,...,v_n} = V.
2. {v_1,...,v_n} is linearly independent.

Theorem 2.16 Suppose (V, F) is a vector space and that S := {v_1,...,v_n} is a spanning set for V. Then we can find a subset {v_{i_1},...,v_{i_k}} of S that forms a basis for V.

Proof. If {v_1,...,v_n} is linearly dependent we can express one of the v's as a nontrivial linear combination of the remaining v's and drop that v from the spanning set. Continue this process until the remaining v's are linearly independent. They still span the vector space and therefore form a basis.

Corollary 2.17 A vector space is finite dimensional if and only if it has a basis.

Proof. Let V = span{v_1,...,v_n} be a finite dimensional vector space. By Theorem 2.16, V has a basis. Conversely, if V = span{v_1,...,v_n} and {v_1,...,v_n} is a basis, then it is by definition a finite spanning set.

Theorem 2.18 Every basis for a vector space V has the same number of elements. This number is called the dimension of the vector space and denoted dim V.

Proof. Suppose X = {v_1,...,v_n} and Y = {w_1,...,w_k} are two bases for V. By Lemma 2.14 we have k ≤ n. Using the same lemma with X and Y switched we obtain n ≤ k. We conclude that n = k.

The set of unit vectors {e_1,...,e_n} forms a basis for both R^n and C^n. The dimension of the trivial subspace {0} is defined to be zero.

Theorem 2.19 Every linearly independent set of vectors {v_1,...,v_k} in a finite dimensional vector space V can be enlarged to a basis for V.

Proof. If {v_1,...,v_k} does not span V we can enlarge the set by one vector v_{k+1} which cannot be expressed as a linear combination of {v_1,...,v_k}. The enlarged set is also linearly independent. Continue this process. Since the space is finite dimensional it must stop after a finite number of steps.

It is convenient to introduce a matrix transforming a basis of a subspace into a basis for the space itself.

Lemma 2.20 Suppose S is a subspace of a finite dimensional vector space (V, F) and let {s_1,...,s_n} be a basis for S and {v_1,...,v_m} a basis for V. Then each s_j can be expressed as a linear combination of v_1,...,v_m, say
$$s_j = \sum_{i=1}^m a_{ij} v_i \quad \text{for } j = 1,\dots,n. \qquad (2.1)$$
If x ∈ S then x = Σ_{j=1}^n c_j s_j = Σ_{i=1}^m b_i v_i for some coefficients b := [b_1,...,b_m]^T, c := [c_1,...,c_n]^T. Moreover b = Ac, where A = [a_{ij}] ∈ C^{m,n}. The matrix A has linearly independent columns.

Proof. (2.1) holds since s_j ∈ V and {v_1,...,v_m} spans V. Since {s_1,...,s_n} is a basis for S and {v_1,...,v_m} a basis for V, every x ∈ S can be written x = Σ_{j=1}^n c_j s_j = Σ_{i=1}^m b_i v_i for some scalars (c_j) and (b_i). But then
$$x = \sum_{j=1}^n c_j s_j \overset{(2.1)}{=} \sum_{j=1}^n c_j \Bigl(\sum_{i=1}^m a_{ij} v_i\Bigr) = \sum_{i=1}^m \Bigl(\sum_{j=1}^n a_{ij} c_j\Bigr) v_i = \sum_{i=1}^m b_i v_i.$$
Since {v_1,...,v_m} is linearly independent it follows that b_i = Σ_{j=1}^n a_{ij} c_j for i = 1,...,m, or b = Ac. Finally, to show that A has linearly independent columns, suppose b := Ac = 0 for some c = [c_1,...,c_n]^T. Define x := Σ_{j=1}^n c_j s_j. Then x = Σ_{i=1}^m b_i v_i, and since b = 0 we have x = 0. But since {s_1,...,s_n} is linearly independent we have c = 0.

The matrix A in Lemma 2.20 is called a change of basis matrix.

Exercise 2.21 Show that the elements in a linearly independent set must be nonzero.

Exercise 2.22 Show that the set of unit vectors {e_1,...,e_n} forms a basis both for R^n and for C^n. Why does this show that the dimension of R^n and C^n is n?

2.3 Operations on Subspaces

Let R and S be two subsets of a vector space (V, F) and let a be a scalar. The sum, multiplication by scalar, union, and intersection of R and S are defined by
$$R + S := \{r + s : r \in R \text{ and } s \in S\}, \qquad (2.2)$$
$$aS := \{as : s \in S\}, \qquad (2.3)$$
$$R \cup S := \{x : x \in R \text{ or } x \in S\}, \qquad (2.4)$$
$$R \cap S := \{x : x \in R \text{ and } x \in S\}. \qquad (2.5)$$

Exercise 2.23 Let R = {(x, y) : x^2 + y^2 ≤ 1} be the unit disc in R^2 and set S = {(x, y) : (x − 1/2)^2 + y^2 ≤ 1}. Find R + S, 2S, R ∪ S, and R ∩ S.

2.3.1 Sums and intersections of subspaces

In many cases R and S will be subspaces. Then aS = S, and both the sum and the intersection of two subspaces are subspaces of (V, F). Note however that the union R ∪ S of two subspaces is not necessarily a subspace.

Exercise 2.24 Let R and S be two subspaces of a vector space (V, F). Show that aS = S and that both R + S and R ∩ S are subspaces of (V, F).

Example 2.25 For given vectors x, y ∈ R^n with x and y linearly independent, let R = span{x} and S = span{y}. Then R and S are subspaces of R^n. For n = 2 we have R + S = R^2, while for n = 3 the sum represents a plane passing through the origin. We also see that R ∩ S = {0} and that R ∪ S is not a subspace.

Exercise 2.26 Show the statements made in Example 2.25.

Theorem 2.27 Let R and S be two subspaces of a vector space (V, F). Then
$$\dim(R + S) = \dim(R) + \dim(S) - \dim(R \cap S). \qquad (2.6)$$

Proof. Let {u_1,...,u_p} be a basis for R ∩ S, where {u_1,...,u_p} = ∅, the empty set, in the case R ∩ S = {0}. We use Theorem 2.19 to extend {u_1,...,u_p} to a basis {u_1,...,u_p, r_1,...,r_q} for R and a basis {u_1,...,u_p, s_1,...,s_t} for S. Every x ∈ R + S can be written as a linear combination of {u_1,...,u_p, r_1,...,r_q, s_1,...,s_t}, so these vectors span R + S. We show that they are linearly independent and hence a basis. Suppose u + r + s = 0, where u := Σ_{j=1}^p α_j u_j, r := Σ_{j=1}^q ρ_j r_j, and s := Σ_{j=1}^t σ_j s_j. Now r = −(u + s) belongs to both R and to S and hence r ∈ R ∩ S. Therefore r can be written as a linear combination of u_1,...,u_p, say r = Σ_{j=1}^p β_j u_j, and at the same time as a linear combination of r_1,...,r_q. But then 0 = Σ_{j=1}^p β_j u_j − Σ_{j=1}^q ρ_j r_j, and since {u_1,...,u_p, r_1,...,r_q} is linearly independent we must have β_1 = ... = β_p = ρ_1 = ... = ρ_q = 0 and hence r = 0. We now have u + s = 0, and by linear independence of {u_1,...,u_p, s_1,...,s_t} we obtain α_1 = ... = α_p = σ_1 = ... = σ_t = 0. We have shown that the vectors {u_1,...,u_p, r_1,...,r_q, s_1,...,s_t} constitute a basis for R + S. The result now follows from a simple calculation:
$$\dim(R + S) = p + q + t = (p + q) + (p + t) - p = \dim(R) + \dim(S) - \dim(R \cap S).$$

From this theorem it follows that dim(R + S) = dim(R) + dim(S) provided R ∩ S = {0}.

Definition 2.28 (Direct Sum) Let R and S be two subspaces of a vector space (V, F). If R ∩ S = {0} then the subspace R + S is called a direct sum and denoted R ⊕ S. The subspaces R and S are called complementary in the subspace R ⊕ S.

Theorem 2.29 Let R and S be two subspaces of a vector space (V, F) and assume R ∩ S = {0}. Every x ∈ R ⊕ S can be decomposed uniquely in the form x = r + s, where r ∈ R and s ∈ S. If {r_1,...,r_k} is a basis for R and {s_1,...,s_n} is a basis for S then {r_1,...,r_k, s_1,...,s_n} is a basis for R ⊕ S.

Proof. To show uniqueness, suppose we could write x = r_1 + s_1 = r_2 + s_2 for r_1, r_2 ∈ R and s_1, s_2 ∈ S. Then r_1 − r_2 = s_2 − s_1, and it follows that r_1 − r_2 and s_2 − s_1 belong to both R and S and hence to R ∩ S. But then r_1 − r_2 = s_2 − s_1 = 0, so r_1 = r_2 and s_2 = s_1. Thus uniqueness follows. Suppose {r_1,...,r_k} is a basis for R and {s_1,...,s_n} is a basis for S. Since dim(R + S) = dim(R) + dim(S), the vectors {r_1,...,r_k, s_1,...,s_n} span R + S. To show linear independence suppose Σ_{j=1}^k ρ_j r_j + Σ_{j=1}^n σ_j s_j = 0. The first sum belongs to R and the second to S, and the sum is a decomposition of 0. By uniqueness of the decomposition both sums must be zero. But then ρ_1 = ... = ρ_k = σ_1 = ... = σ_n = 0 and linear independence follows.
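For subspaces of R^n given by spanning columns, the dimension formula (2.6) can be checked numerically with matrix ranks. A small sketch assuming Python with NumPy; the random test matrices and the rank-based computation of dim(R ∩ S) are our own illustration, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((6, 3))                           # columns of R span the subspace R
S = np.hstack([R[:, :1], rng.standard_normal((6, 2))])    # S shares one direction with R

dim_R = np.linalg.matrix_rank(R)
dim_S = np.linalg.matrix_rank(S)
dim_sum = np.linalg.matrix_rank(np.hstack([R, S]))        # dim(R + S)

# dim(R intersect S) equals the dimension of the null space of [R, -S]
# when the columns of R and the columns of S are each linearly independent.
dim_cap = R.shape[1] + S.shape[1] - np.linalg.matrix_rank(np.hstack([R, -S]))

assert dim_sum == dim_R + dim_S - dim_cap   # Theorem 2.27, formula (2.6)
print(dim_R, dim_S, dim_sum, dim_cap)       # generically 3, 3, 5, 1 here
```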

2.3.2 The quotient space

For the sum of two sets we write x + S := {x + s : s ∈ S} when one of the sets is a singleton set {x}. Suppose S is a subspace of a vector space (X, F). Since aS = S we have
a(x + S) + b(y + S) = (ax + by) + S, for all a, b ∈ F and all x, y ∈ X.
The set
$$X/S := \{x + S : x \in X\} \qquad (2.7)$$
is a vector space if we define
a(x + S) + b(y + S) := (ax + by) + S, for all a, b ∈ F and all x, y ∈ X.
The space X/S is called the quotient space of X by S. The zero element in X/S is S itself. Moreover, if x + S = y + S then x − y ∈ S.

Exercise 2.30 Show that X/S is a vector space.

Theorem 2.31 Suppose S is a subspace of a finite dimensional vector space (X, F). Then
$$\dim(S) + \dim(X/S) = \dim(X). \qquad (2.8)$$

Proof. Let n := dim(X), k := dim(S), and let {s_1,...,s_k} be a basis for S. By Theorem 2.19 we can extend it to a basis {s_1,...,s_k, t_{k+1},...,t_n} for X. The result will follow if we can show that {t_{k+1} + S,...,t_n + S} is a basis for X/S. Recall that the zero element in X/S is S. To show linear independence suppose Σ_{j=k+1}^n a_j(t_j + S) = S for some a_{k+1},...,a_n in F. Since Σ_{j=k+1}^n a_j(t_j + S) = (Σ_{j=k+1}^n a_j t_j) + S and the zero element in X/S is unique, we must have Σ_{j=k+1}^n a_j t_j = 0, which implies that a_{k+1} = ... = a_n = 0 by linear independence of the t's. It remains to show that span{t_{k+1} + S,...,t_n + S} = X/S. Suppose x + S ∈ X/S. For some a_1,...,a_n we have x = x_1 + x_2, where x_1 = Σ_{j=1}^k a_j s_j and x_2 = Σ_{j=k+1}^n a_j t_j. Since x_1 + S = S we have x + S = x_2 + S = Σ_{j=k+1}^n a_j t_j + S = Σ_{j=k+1}^n a_j(t_j + S) ∈ X/S.

2.4 Norms

To measure the size of a vector in a vector space (V, F) we use norms.

Definition 2.32 (Norm) A norm in a vector space (V, F), where F = R or F = C, is a function ‖·‖ : V → R that satisfies for all x, y in V and all a in F
1. ‖x‖ ≥ 0 with equality if and only if x = 0. (positivity)
2. ‖ax‖ = |a| ‖x‖. (homogeneity)
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (subadditivity)

The triple (V, F, ‖·‖) is said to be a normed vector space and the inequality 3. is called the triangle inequality. Since ‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖ we obtain ‖x‖ − ‖y‖ ≤ ‖x − y‖. By symmetry ‖y‖ − ‖x‖ ≤ ‖y − x‖ = ‖x − y‖, and we obtain the inverse triangle inequality
$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\|, \quad x, y \in V. \qquad (2.9)$$

Consider now some specific vector norms. For the vector spaces R^n and C^n we define for p ≥ 1 the p-norms by
$$\|x\|_p := \Bigl(\sum_{j=1}^n |x_j|^p\Bigr)^{1/p}, \qquad (2.10)$$
$$\|x\|_\infty := \max_{1 \le j \le n} |x_j|. \qquad (2.11)$$
The most important cases are:
1. ‖x‖_1 = Σ_{j=1}^n |x_j|, (the one-norm or l_1-norm)
2. ‖x‖_2 = (Σ_{j=1}^n |x_j|^2)^{1/2}, (the two-norm, l_2-norm, or Euclidian norm)
3. ‖x‖_∞ = max_{1≤j≤n} |x_j|, (the infinity-norm, l_∞-norm, or max norm.)

The infinity norm is related to the other p-norms by
$$\lim_{p \to \infty} \|x\|_p = \|x\|_\infty \quad \text{for all } x \in \mathbb{C}^n. \qquad (2.12)$$
This clearly holds for x = 0. For x ≠ 0 we write
$$\|x\|_p = \|x\|_\infty \Bigl(\sum_{j=1}^n \bigl(|x_j|/\|x\|_\infty\bigr)^p\Bigr)^{1/p}.$$
Now each term in the sum is not greater than one and at least one term is equal to one, and we obtain
$$\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty, \quad p \ge 1. \qquad (2.13)$$
Since lim_{p→∞} n^{1/p} = 1 for any n ∈ N we see that (2.12) follows.

It can be shown (cf. Appendix C) that the p-norms are norms in R^n and in C^n for any p with 1 ≤ p ≤ ∞. The triangle inequality ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p is called Minkowski's inequality. To prove it one first establishes Hölder's inequality
$$\sum_{j=1}^n |x_j y_j| \le \|x\|_p \|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1. \qquad (2.14)$$
The relation 1/p + 1/q = 1 means that if p = 1 then q = ∞ and if p = 2 then q = 2. (2.13) shows that the infinity norm and any other p-norm can be bounded in terms of each other.
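The bounds (2.13) and the limit (2.12) are easy to check numerically. A small sketch, assuming Python with NumPy; the helper p_norm is ours.

```python
import numpy as np

def p_norm(x, p):
    """The p-norm of (2.10); p = np.inf gives the max norm of (2.11)."""
    x = np.abs(np.asarray(x, dtype=float))
    if np.isinf(p):
        return x.max()
    return (x ** p).sum() ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
n = x.size
for p in [1, 2, 10, 100, np.inf]:
    val = p_norm(x, p)
    # the bounds (2.13): ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf
    assert p_norm(x, np.inf) <= val <= n ** (1.0 / p) * p_norm(x, np.inf) + 1e-12
    print(p, val)   # the values approach ||x||_inf = 4 as p grows, as in (2.12)
```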

We define equivalence of norms as follows.

Definition 2.33 Two norms ‖·‖ and ‖·‖′ in a finite dimensional vector space (V, F) are equivalent if there are positive constants m and M (depending only on the dimension of V) such that for all vectors x ∈ V we have
$$m\|x\| \le \|x\|' \le M\|x\|. \qquad (2.15)$$

The following result is proved in Appendix C.

Theorem 2.34 All vector norms in a finite dimensional vector space are equivalent.

The inverse triangle inequality (2.9) shows that a norm is a continuous function V → R.

Exercise 2.35 Show that ‖·‖_p is a vector norm in R^n for p = 1, p = ∞.

Exercise 2.36 The set S_p = {x ∈ R^n : ‖x‖_p = 1} is called the unit sphere in R^n with respect to ‖·‖_p. Draw S_p for p = 1, 2, ∞ for n = 2.

Exercise 2.37 Let 1 ≤ p ≤ ∞. Produce a vector x_l such that ‖x_l‖_∞ = ‖x_l‖_p and another vector x_u such that ‖x_u‖_p = n^{1/p}‖x_u‖_∞. Thus the inequalities in (2.13) are sharp.

Exercise 2.38 If 1 ≤ q ≤ p then ‖x‖_p ≤ ‖x‖_q ≤ n^{1/q−1/p}‖x‖_p for x ∈ C^n. Hint: For the rightmost inequality use Jensen's inequality, cf. Theorem C.2, with f(z) = z^{p/q} and z_i = |x_i|^q. For the left inequality consider first y_i = x_i/‖x‖_∞, i = 1, 2,..., n.

2.5 Convergence of Vectors

Consider an infinite sequence {x_k} = x_0, x_1, x_2,... of vectors in R^n. This sequence converges to zero if and only if each component sequence x_k(j) converges to zero for j = 1,...,n. In terms of the natural basis we have x_k = Σ_{j=1}^n x_k(j) e_j, and another way of stating convergence to zero is that, in terms of the basis {e_1,...,e_n} for R^n, each coefficient x_k(j) of x_k converges to zero. Consider now a more general vector space.

Definition 2.39 Let {v_1,...,v_n} be a basis for a finite dimensional vector space (V, F), where F = R or F = C, and let {x_k} be an infinite sequence of vectors in V with basis coefficients {c_k}, i.e. x_k = Σ_{j=1}^n c_{kj} v_j for each k. We say that {x_k} converges to zero, or has the limit zero, if lim_{k→∞} c_{kj} = 0 for j = 1,...,n. We say that {x_k} converges to the limit x in V if x_k − x converges to zero. We write this as lim_{k→∞} x_k = x or x_k → x (as k → ∞).

This definition is actually independent of the basis chosen. If {w_1,...,w_n} is another basis for V and x_k = Σ_{j=1}^n b_{kj} w_j for each k, then from Lemma 2.20 b_k = A c_k for some nonsingular matrix A independent of k. Hence c_k → 0 if and only if b_k → 0.

If {a_k} and {b_k} are sequences of scalars and {x_k} and {y_k} are sequences of vectors such that {a_k} → a, {b_k} → b, {x_k} → x, and {y_k} → y, then {a_k x_k + b_k y_k} → ax + by. This shows that scalar multiplication and vector addition are continuous functions with respect to this notion of limit.

Corresponding to a basis {v_1,...,v_n}, we define ‖x‖_c := max_{1≤j≤n} |c_j|, where x = Σ_{j=1}^n c_j v_j. We leave it as an exercise to show that this is a norm on V. Recall that any two norms on V are equivalent. This implies that for any other norm ‖·‖ on V there are positive constants α, β such that any x = Σ_{j=1}^n c_j v_j satisfies
$$\|x\| \le \alpha \max_{1\le j\le n} |c_j| \quad \text{and} \quad |c_j| \le \beta \|x\| \text{ for } j = 1,\dots,n. \qquad (2.16)$$

Suppose now (V, F, ‖·‖) is a normed vector space with F = R or F = C. The notion of limit can then be stated in terms of convergence in norm.

Theorem 2.40 In a normed vector space we have x_k → x if and only if lim_{k→∞} ‖x_k − x‖ = 0.

Proof. Suppose {v_1,...,v_n} is a basis for the vector space and assume x_k, x ∈ V. Then x_k − x = Σ_{j=1}^n c_{kj} v_j for some scalars c_{kj}. By (2.16) we see that
$$\frac{1}{\beta}\max_{1\le j\le n} |c_{kj}| \le \|x_k - x\| \le \alpha \max_{1\le j\le n} |c_{kj}|,$$
and hence ‖x_k − x‖ → 0 if and only if lim_{k→∞} c_{kj} = 0 for each j, i.e. if and only if x_k → x.

Since all vector norms are equivalent, we have convergence in any norm we can define on a finite dimensional vector space.

Definition 2.41 Let (V, F, ‖·‖) be a normed vector space and let {x_k} in V be an infinite sequence.
1. {x_k} is a Cauchy sequence if lim_{k,l→∞} (x_k − x_l) = 0, or equivalently lim_{k,l→∞} ‖x_k − x_l‖ = 0. More precisely, for each ε > 0 there is an integer N ∈ N such that for all k, l ≥ N we have ‖x_k − x_l‖ ≤ ε.
2. The normed vector space is said to be complete if every Cauchy sequence converges to a point in the space.
3. {x_k} is called bounded if there is a positive number M such that ‖x_k‖ ≤ M for all k.
4. {x_{n_k}} is said to be a subsequence of {x_k}_{k≥0} if 0 ≤ n_0 < n_1 < n_2 < ⋯.

Theorem 2.42 In a finite dimensional vector space V the following hold:
1. A sequence in V is convergent if and only if it is a Cauchy sequence.
2. V is complete.
3. Every bounded sequence in V has a convergent subsequence.

Proof. 1. Suppose x_k → x. By the triangle inequality ‖x_k − x_l‖ ≤ ‖x_k − x‖ + ‖x_l − x‖, and hence ‖x_k − x_l‖ → 0. Conversely, let {v_1,...,v_n} be a basis for V and {x_k} a Cauchy sequence with x_k = Σ_{j=1}^n c_{kj} v_j for each k. Then x_k − x_l = Σ_{j=1}^n (c_{kj} − c_{lj}) v_j, and since lim_{k,l→∞}(x_k − x_l) = 0 we have by definition of convergence lim_{k,l→∞}(c_{kj} − c_{lj}) = 0 for j = 1,...,n. Thus for each j we have a Cauchy sequence {c_{kj}} in C, and since C is complete, {c_{kj}} converges to some c_j ∈ C. But then x_k → x := Σ_{j=1}^n c_j v_j ∈ V.
2. V is complete since we just showed that every Cauchy sequence converges to a point in the space.
3. Let {v_1,...,v_n} be a basis for V and {x_k} a bounded sequence with x_k = Σ_{j=1}^n c_{kj} v_j for each k. By (2.16) each coefficient sequence {c_{kj}}_k is a bounded sequence of complex numbers and therefore, by a well known property of complex numbers, has a convergent subsequence. In particular the sequence of v_1 coefficients {c_{k1}} has a convergent subsequence {c_{k_i,1}}. For the second component the sequence {c_{k_i,2}} has a convergent subsequence, say {c_{l_i,2}}. Continuing with j = 3,...,n we obtain integers 0 ≤ m_0 < m_1 < ⋯ such that {c_{m_i,j}} is a convergent subsequence of {c_{kj}} for j = 1,...,n. But then {x_{m_i}} is a convergent subsequence of {x_k}.

2.5.1 Convergence of series of vectors

Consider now an infinite series Σ_{m=0}^∞ y_m of vectors in a vector space (V, F) with F = R or F = C. We say that the series converges if the sequence of partial sums {x_k} given by x_k = Σ_{m=0}^k y_m converges. A sufficient condition for convergence is that Σ_{m=0}^∞ ‖y_m‖ converges for some vector norm; we say that the series converges absolutely if this is the case. Note that ‖Σ_{m=0}^∞ y_m‖ ≤ Σ_{m=0}^∞ ‖y_m‖, and absolute convergence in one norm implies absolute convergence in any norm by Theorem 2.34. In an absolutely convergent series we may change the order of the terms without changing the value of the sum.

Problems for Section 2.5

Exercise 2.43 Show that if {a_k} → a, {b_k} → b, {x_k} → x, and {y_k} → y, then {a_k x_k + b_k y_k} → ax + by.

Exercise 2.44 Show that ‖·‖_c is a norm.
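A small numerical illustration of convergence in norm, assuming Python with NumPy; the particular series is our own example, not taken from the notes. The partial sums x_k = Σ_{m=0}^k 2^{−m} v converge to 2v, and the error tends to zero in every norm, as Theorems 2.34 and 2.40 predict.

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])
limit = 2.0 * v
x = np.zeros_like(v)
for k in range(21):
    x = x + v / 2.0 ** k               # partial sum x_k of the series sum_m 2^{-m} v
err = x - limit
for p in [1, 2, np.inf]:
    print(p, np.linalg.norm(err, p))   # all errors are of order 2^{-20}
```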

2.6 Inner Products

An inner product or scalar product in a vector space (V, F), where F = R or F = C, is a function ⟨·,·⟩ mapping pairs of vectors into a scalar. We consider first the case where F = R.

Definition 2.45 An inner product in a vector space (V, R) is a function ⟨·,·⟩ : V × V → R satisfying for all x, y, z ∈ V and all a, b ∈ R the following conditions:
1. ⟨x, x⟩ ≥ 0 with equality if and only if x = 0. (positivity)
2. ⟨x, y⟩ = ⟨y, x⟩. (symmetry)
3. ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩. (linearity)
The triple (V, R, ⟨·,·⟩) is called a real inner product space.

The standard inner product in V = R^n is given by ⟨x, y⟩ := x^T y. It is clearly an inner product in R^n. When the field of scalars is C the inner product is complex valued and properties 2. and 3. are altered as follows:

Definition 2.46 An inner product in a vector space (V, C) is a function ⟨·,·⟩ : V × V → C satisfying for all x, y, z ∈ V and all a, b ∈ C the following conditions:
1. ⟨x, x⟩ ≥ 0 with equality if and only if x = 0. (positivity)
2. $\langle x, y\rangle = \overline{\langle y, x\rangle}$. (skew symmetry)
3. $\langle ax + by, z\rangle = \bar a\langle x, z\rangle + \bar b\langle y, z\rangle$. (linearity)
The triple (V, C, ⟨·,·⟩) is called a complex inner product space.

Note the complex conjugate in 2. and that (cf. Exercise 2.52)
$$\langle x, ay + bz\rangle = a\langle x, y\rangle + b\langle x, z\rangle. \qquad (2.17)$$
The standard inner product in C^n is given by ⟨x, y⟩ := x^H y = Σ_{j=1}^n x̄_j y_j. It is clearly an inner product in C^n.

Suppose now (V, F, ⟨·,·⟩) is an inner product space with F = R or F = C. We define the inner product norm by ‖x‖ := √⟨x, x⟩ for x ∈ V. For any vectors x, y ∈ V and scalar a ∈ F we have (cf. Exercises 2.51 and 2.52), by linearity and symmetry, the expansion
$$\|x + ay\|^2 = \|x\|^2 + 2a\langle x, y\rangle + a^2\|y\|^2 \quad \text{(real case)}, \qquad (2.18)$$
$$\|x + ay\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, ay\rangle + |a|^2\|y\|^2 \quad \text{(complex case)}, \qquad (2.19)$$
where Re z and Im z denote the real and imaginary part of the complex number z.

In the complex case we can write the inner product of two vectors as a sum of inner product norms. For any x, y ∈ V it follows from (2.19) that
$$4\langle x, y\rangle = \|x + y\|^2 - \|x - y\|^2 + i\|x - iy\|^2 - i\|x + iy\|^2, \qquad (2.20)$$
where i = √−1 and we used that Im(z) = Re(−iz) for any z ∈ C.

To show that the inner product norm is a norm in (V, R) we need the triangle inequality. To show it we start with a famous inequality.

Theorem 2.47 (Cauchy-Schwarz inequality) For any x, y in a real or complex inner product space
$$|\langle x, y\rangle| \le \|x\|\,\|y\|,$$
with equality if and only if x and y are linearly dependent.

Proof. The inequality is trivial if ⟨x, y⟩ = 0, so assume ⟨x, y⟩ ≠ 0. Suppose first ⟨x, y⟩ ∈ R. We define the scalar a := −⟨x, y⟩/‖y‖^2 and use (2.18) to obtain
$$0 \le \|x + ay\|^2 = \|x\|^2 - \bigl(\langle x, y\rangle\bigr)^2/\|y\|^2.$$
Thus the inequality follows in the real case. Suppose next ⟨x, y⟩ is complex valued, say ⟨x, y⟩ = re^{iφ}. We define b := e^{iφ} and observe that $\bar b\langle x, y\rangle = r$ is real valued and |b| = 1. Using the real case of the Cauchy-Schwarz inequality we find
$$|\langle x, y\rangle| = \bar b\langle x, y\rangle = \langle bx, y\rangle \le \|bx\|\,\|y\| = \|x\|\,\|y\|,$$
which proves the inequality also in the complex case. We have equality if and only if x + ay = 0, which means that x and y are linearly dependent.

Theorem 2.48 (Triangle Inequality) For any x, y in a real or complex inner product space, ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. From the Cauchy-Schwarz inequality it follows that Re⟨x, y⟩ ≤ |⟨x, y⟩| ≤ ‖x‖ ‖y‖. Using this on the inner product term in (2.19) with a = 1 we get ‖x + y‖^2 ≤ ‖x‖^2 + 2‖x‖ ‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2. Taking square roots completes the proof.

Theorem 2.49 (Parallelogram Identity) For all x, y in a real or complex inner product space, ‖x + y‖^2 + ‖x − y‖^2 = 2‖x‖^2 + 2‖y‖^2.

Proof. We set a = ±1 in the inner product expansion (2.19) and add the two equations.
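The Cauchy-Schwarz inequality and the parallelogram identity are easy to check numerically for the standard inner product in C^n. A minimal sketch, assuming Python with NumPy; np.vdot(x, y) computes x^H y, matching the standard inner product above, and the random test vectors are our own.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)

ip = np.vdot(x, y)                       # <x, y> = x^H y
nx, ny = np.linalg.norm(x), np.linalg.norm(y)

# Cauchy-Schwarz inequality (Theorem 2.47)
assert abs(ip) <= nx * ny + 1e-12

# Parallelogram identity (Theorem 2.49)
lhs = np.linalg.norm(x + y) ** 2 + np.linalg.norm(x - y) ** 2
rhs = 2 * nx ** 2 + 2 * ny ** 2
assert np.isclose(lhs, rhs)
```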

In the real case the Cauchy-Schwarz inequality implies that −1 ≤ ⟨x, y⟩/(‖x‖ ‖y‖) ≤ 1 for nonzero x and y, so there is a unique angle θ in [0, π] such that
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}. \qquad (2.21)$$
This defines the angle between vectors in a real inner product space.

Exercise 2.50 Suppose A ∈ R^{m,n} has linearly independent columns. Show that ⟨x, y⟩ := x^T A^T A y defines an inner product on R^n.

Exercise 2.51 Show (2.18).

Exercise 2.52 Show (2.17) and (2.19).

Exercise 2.53 Show (2.20).

Exercise 2.54 Show that in the complex case there is a unique angle θ in [0, π/2] such that
$$\cos\theta = \frac{|\langle x, y\rangle|}{\|x\|\,\|y\|}. \qquad (2.22)$$

2.7 Orthogonality

As in the previous section we assume that (V, F, ⟨·,·⟩) is an inner product space with F = R or F = C. Also ‖·‖ denotes the inner product norm.

Definition 2.55 (Orthogonality) Two vectors x, y in a real or complex inner product space are called orthogonal or perpendicular, denoted x ⊥ y, if ⟨x, y⟩ = 0. The vectors are orthonormal if in addition ‖x‖ = ‖y‖ = 1. For orthogonal vectors it follows from (2.19) that the Pythagorean theorem holds:
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2, \quad \text{if } x \perp y.$$

Definition 2.56 (Orthogonal and Orthonormal Bases) A set of vectors {v_1,...,v_k} in a subspace S of a real or complex inner product space is called an orthogonal basis for S if it is a basis for S and ⟨v_i, v_j⟩ = 0 for i ≠ j. It is an orthonormal basis for S if it is a basis for S and ⟨v_i, v_j⟩ = δ_{ij} for all i, j.

A basis for a subspace of an inner product space can be turned into an orthogonal or orthonormal basis for the subspace by the following construction.
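A small numerical check of Definition 2.55, the Pythagorean theorem, and the angle formula (2.21), assuming Python with NumPy; the example vectors are our own.

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 2.0])

assert np.isclose(np.dot(x, y), 0.0)          # x and y are orthogonal: <x, y> = 0

# Pythagorean theorem for orthogonal vectors
assert np.isclose(np.linalg.norm(x + y) ** 2,
                  np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)

# The angle of (2.21) between x and a non-orthogonal vector z
z = np.array([1.0, 0.0, 0.0])
theta = np.arccos(np.dot(x, z) / (np.linalg.norm(x) * np.linalg.norm(z)))
print(np.degrees(theta))                      # 45 degrees
```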