Jim Lambers MAT 610 Summer Session Lecture 1 Notes


Introduction

This course is about numerical linear algebra, which is the study of the approximate solution of fundamental problems from linear algebra by numerical methods that can be implemented on a computer. We begin with a brief discussion of the problems that will be discussed in this course, and then establish some fundamental concepts that will serve as the foundation of our discussion of various numerical methods for solving these problems.

Systems of Linear Equations

One of the most fundamental problems in computational mathematics is to solve a system of $m$ linear equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

for the $n$ unknowns $x_1, x_2, \ldots, x_n$. Many data fitting problems, such as polynomial interpolation or least-squares approximation, involve the solution of such a system. Discretization of partial differential equations often yields systems of linear equations that must be solved. Systems of nonlinear equations are typically solved using iterative methods that solve a system of linear equations during each iteration. In this course, we will study the solution of this type of problem in detail.

The Eigenvalue Problem

As we will see, a system of linear equations can be described in terms of a single vector equation, in which the product of a known matrix of the coefficients $a_{ij}$ and an unknown vector containing the unknowns $x_j$ is equated to a known vector containing the right-hand side values $b_i$. Unlike multiplication of numbers, multiplication of a matrix and a vector is an operation that is difficult to understand intuitively. However, given a matrix $A$, there are certain vectors for which multiplication by $A$ is equivalent to ordinary multiplication. This leads to the eigenvalue problem, which consists of finding a nonzero vector $x$ and a number $\lambda$ such that

$$Ax = \lambda x.$$

The number $\lambda$ is called an eigenvalue of $A$, and the vector $x$ is called an eigenvector corresponding to $\lambda$. This is another problem whose solution we will discuss in this course.
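As a first taste of the numerical methods to come, both problems can be posed and solved with a few lines of NumPy. The following sketch is not part of the original notes; the matrix and right-hand side are arbitrary values chosen for illustration.

```python
import numpy as np

# An arbitrary 3 x 3 system Ax = b (values chosen only for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Solve the linear system Ax = b.
x = np.linalg.solve(A, b)
print("solution x =", x)
print("residual   =", A @ x - b)   # should be (numerically) zero

# Solve the eigenvalue problem Ax = lambda x.
eigenvalues, eigenvectors = np.linalg.eig(A)
lam = eigenvalues[0]
v = eigenvectors[:, 0]
print("check Av - lambda v =", A @ v - lam * v)  # should be near zero
```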

Fundamental Concepts, Algorithms, and Notation

Writing out a system of equations can be quite tedious. Therefore, we instead represent a system of linear equations using a matrix, which is an array of elements, or entries. We say that a matrix $A$ is $m \times n$ if it has $m$ rows and $n$ columns, and we denote the element in row $i$ and column $j$ by $a_{ij}$. We also denote the matrix $A$ by $[a_{ij}]$. With this notation, a general system of $m$ equations with $n$ unknowns can be represented using a matrix $A$ that contains the coefficients of the equations, a vector $x$ that contains the unknowns, and a vector $b$ that contains the quantities on the right-hand sides of the equations. Specifically,

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$

Note that the vectors $x$ and $b$ are represented by column vectors.

Vector Spaces and Linear Transformations

Matrices are much more than notational conveniences for writing systems of linear equations. A matrix $A$ can also be used to represent a linear function $f_A$ whose domain and range are both sets of vectors called vector spaces. A vector space over a field (such as the field of real or complex numbers) is a set of vectors, together with two operations: addition of vectors, and multiplication of a vector by a scalar from the field.

Specifically, if $u$ and $v$ are vectors belonging to a vector space $V$ over a field $F$, then the sum of $u$ and $v$, denoted by $u + v$, is a vector in $V$, and the scalar product of $u$ with a scalar $\alpha$ in $F$, denoted by $\alpha u$, is also a vector in $V$. These operations have the following properties:

- Commutativity: For any vectors $u$ and $v$ in $V$, $u + v = v + u$.
- Associativity: For any vectors $u$, $v$ and $w$ in $V$, $(u + v) + w = u + (v + w)$.
- Identity element for vector addition: There is a vector $0$, known as the zero vector, such that for any vector $u$ in $V$, $u + 0 = 0 + u = u$.

- Additive inverse: For any vector $u$ in $V$, there is a unique vector $-u$ in $V$ such that $u + (-u) = -u + u = 0$.
- Distributivity over vector addition: For any vectors $u$ and $v$ in $V$, and scalar $\alpha$ in $F$, $\alpha(u + v) = \alpha u + \alpha v$.
- Distributivity over scalar multiplication: For any vector $u$ in $V$, and scalars $\alpha$ and $\beta$ in $F$, $(\alpha + \beta)u = \alpha u + \beta u$.
- Associativity of scalar multiplication: For any vector $u$ in $V$ and any scalars $\alpha$ and $\beta$ in $F$, $\alpha(\beta u) = (\alpha\beta)u$.
- Identity element for scalar multiplication: For any vector $u$ in $V$, $1u = u$.

A function $f_A : V \to W$, whose domain $V$ and range $W$ are vector spaces over a field $F$, is a linear transformation if it has the properties

$$f_A(x + y) = f_A(x) + f_A(y), \quad f_A(\alpha x) = \alpha f_A(x),$$

where $x$ and $y$ are vectors in $V$ and $\alpha$ is a scalar from $F$.

Subspaces and Bases

Before we can explain how matrices can be used to easily describe linear transformations, we must introduce some important concepts related to vector spaces. A subspace of a vector space $V$ is a subset of $V$ that is, itself, a vector space. In particular, a subset $S$ of $V$ is also a subspace if it is closed under the operations of vector addition and scalar multiplication. That is, if $u$ and $v$ are vectors in $S$, then the vectors $u + v$ and $\alpha u$, where $\alpha$ is any scalar, must also be in $S$. In particular, $S$ cannot be a subspace unless it includes the zero vector.

Often a vector space or subspace can be characterized as the set of all vectors that can be obtained by adding and/or scaling members of a given set of specific vectors. For example, $\mathbb{R}^2$ can be described as the set of all vectors that can be obtained by adding and/or scaling the vectors

$$e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

These vectors comprise what is known as the standard basis of $\mathbb{R}^2$.

More generally, given a set of vectors $\{v_1, v_2, \ldots, v_k\}$ from a vector space $V$, a vector $v \in V$ is called a linear combination of $v_1, v_2, \ldots, v_k$ if there exist constants $c_1, c_2, \ldots, c_k$ such that

$$v = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \sum_{i=1}^k c_i v_i.$$

We then define the span of $\{v_1, v_2, \ldots, v_k\}$, denoted by $\operatorname{span}\{v_1, v_2, \ldots, v_k\}$, to be the set of all linear combinations of $v_1, v_2, \ldots, v_k$. From the definition of a linear combination, it follows that this set is a subspace of $V$.

When a subspace is defined as the span of a set of vectors, it is helpful to know whether the set includes any vectors that are, in some sense, redundant, for if this is the case, the description of the subspace can be simplified. To that end, we say that a set of vectors $\{v_1, v_2, \ldots, v_k\}$ is linearly independent if the equation

$$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = 0$$

holds if and only if $c_1 = c_2 = \cdots = c_k = 0$. Otherwise, we say that the set is linearly dependent. If the set is linearly independent, then any vector $v$ in the span of the set is a unique linear combination of members of the set; that is, there is only one way to choose the coefficients of a linear combination that is used to obtain $v$.

Given a vector space $V$, if there exists a set of vectors $\{v_1, v_2, \ldots, v_k\}$ such that $V$ is the span of $\{v_1, v_2, \ldots, v_k\}$, and $\{v_1, v_2, \ldots, v_k\}$ is linearly independent, then we say that $\{v_1, v_2, \ldots, v_k\}$ is a basis of $V$. Any basis of $V$ must have the same number of elements, $k$. We call this number the dimension of $V$, which is denoted by $\dim(V)$.

Matrix Representation of Linear Transformations

If $V$ is a vector space of dimension $n$ over a field, such as $\mathbb{R}^n$ or $\mathbb{C}^n$, and $W$ is a vector space of dimension $m$, then a linear function $f_A$ with domain $V$ and range $W$ can be represented by an $m \times n$ matrix $A$ whose entries belong to the field. Suppose that the set of vectors $\{v_1, v_2, \ldots, v_n\}$ is a basis for $V$, and the set $\{w_1, w_2, \ldots, w_m\}$ is a basis for $W$. Then, $a_{ij}$ is the scalar by which $w_i$ is multiplied when applying the function $f_A$ to the vector $v_j$. That is,

$$f_A(v_j) = a_{1j} w_1 + a_{2j} w_2 + \cdots + a_{mj} w_m = \sum_{i=1}^m a_{ij} w_i.$$

In other words, the $j$th column of $A$ describes the image under $f_A$ of the vector $v_j$, in terms of the coefficients of $f_A(v_j)$ in the basis $\{w_1, w_2, \ldots, w_m\}$.
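The notions of span and linear independence defined above can be explored numerically. In the following sketch, which is an addition to the notes, the rank of a matrix whose columns are the given vectors reveals whether they are independent, and also gives the dimension of their span; the vectors are arbitrary values chosen for illustration.

```python
import numpy as np

# Candidate vectors in R^3, stored as the columns of a matrix.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 2.0])   # v3 = v1 + v2, so the set is dependent

V = np.column_stack([v1, v2, v3])

# The columns are linearly independent iff the rank equals the number
# of columns; the rank is also the dimension of their span.
rank = np.linalg.matrix_rank(V)
print("rank =", rank)                      # 2
print("independent?", rank == V.shape[1])  # False
```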

If $V$ and $W$ are spaces of real or complex vectors, then, by convention, the bases $\{v_j\}_{j=1}^n$ and $\{w_i\}_{i=1}^m$ are each chosen to be the standard basis for $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. The $j$th vector in the standard basis is a vector whose components are all zero, except for the $j$th component, which is equal to one. These vectors are called the standard basis vectors of an $n$-dimensional space of real or complex vectors, and are denoted by $e_j$. From this point on, we will generally assume that $V$ is $\mathbb{R}^n$, and that the field is $\mathbb{R}$, for simplicity.

Example. The standard basis for $\mathbb{R}^3$ consists of the vectors

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

To describe the action of $A$ on a general vector $x$ from $V$, we can write

$$x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n = \sum_{j=1}^n x_j e_j.$$

Then, because $A$ represents a linear function,

$$f_A(x) = \sum_{j=1}^n x_j f_A(e_j) = \sum_{j=1}^n x_j a_j,$$

where $a_j$ is the $j$th column of $A$.

We define the vector $y = f_A(x)$ above to be the matrix-vector product of $A$ and $x$, which we denote by $Ax$. Each element of the vector $y = Ax$ is given by

$$y_i = [Ax]_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = \sum_{j=1}^n a_{ij} x_j.$$

From this definition, we see that the $j$th column of $A$ is equal to the matrix-vector product $Ae_j$.

Example. Let

$$A = \begin{bmatrix} 3 & 0 & -1 \\ 1 & -4 & 2 \\ 5 & 1 & -3 \end{bmatrix}, \quad x = \begin{bmatrix} 10 \\ 11 \\ 12 \end{bmatrix}.$$

Then

$$Ax = 10 \begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix} + 11 \begin{bmatrix} 0 \\ -4 \\ 1 \end{bmatrix} + 12 \begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix} = \begin{bmatrix} 18 \\ -10 \\ 25 \end{bmatrix}.$$

We see that $Ax$ is a linear combination of the columns of $A$, with the coefficients of the linear combination obtained from the components of $x$.
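The example above is easy to verify with NumPy; this added sketch computes $Ax$ both directly and as a linear combination of the columns of $A$.

```python
import numpy as np

A = np.array([[3.0,  0.0, -1.0],
              [1.0, -4.0,  2.0],
              [5.0,  1.0, -3.0]])
x = np.array([10.0, 11.0, 12.0])

# Direct matrix-vector product.
print(A @ x)                                 # [ 18. -10.  25.]

# The same product as a linear combination of the columns of A.
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(combo)                                 # [ 18. -10.  25.]
```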

Matrix Multiplication

It follows from this definition that a general system of $m$ linear equations in $n$ unknowns can be described in matrix-vector form by the equation $Ax = b$, where $Ax$ is a matrix-vector product of the $m \times n$ coefficient matrix $A$ and the vector of unknowns $x$, and $b$ is the vector of right-hand side values.

Of course, if $m = n = 1$, the system of equations $Ax = b$ reduces to the scalar linear equation $ax = b$, which has the solution $x = a^{-1}b$, provided that $a \neq 0$. As $a^{-1}$ is the unique number such that $a^{-1}a = aa^{-1} = 1$, it is desirable to generalize the concepts of multiplication and identity element to square matrices, for which $m = n$.

The matrix-vector product can be used to define the composition of linear functions represented by matrices. Let $A$ be an $m \times n$ matrix, and let $B$ be an $n \times p$ matrix. Then, if $x$ is a vector of length $p$, and $y = Bx$, then we have

$$Ay = A(Bx) = (AB)x = Cx,$$

where $C$ is an $m \times p$ matrix with entries

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}.$$

We define the matrix product of $A$ and $B$ to be the matrix $C = AB$ with entries defined in this manner. It should be noted that the product $BA$ is not defined, unless $m = p$. Even if this is the case, in general, $AB \neq BA$. That is, matrix multiplication is not commutative. However, matrix multiplication is associative, meaning that if $A$ is $m \times n$, $B$ is $n \times p$, and $C$ is $p \times k$, then

$$A(BC) = (AB)C.$$

Example. Consider the $2 \times 2$ matrices

$$A = \begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} -5 & 6 \\ 7 & -8 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix} \begin{bmatrix} -5 & 6 \\ 7 & -8 \end{bmatrix} = \begin{bmatrix} 1(-5) - 2(7) & 1(6) - 2(-8) \\ -3(-5) + 4(7) & -3(6) + 4(-8) \end{bmatrix} = \begin{bmatrix} -19 & 22 \\ 43 & -50 \end{bmatrix},$$

whereas

$$BA = \begin{bmatrix} -5 & 6 \\ 7 & -8 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix} = \begin{bmatrix} -5(1) + 6(-3) & -5(-2) + 6(4) \\ 7(1) - 8(-3) & 7(-2) - 8(4) \end{bmatrix} = \begin{bmatrix} -23 & 34 \\ 31 & -46 \end{bmatrix}.$$

We see that $AB \neq BA$.
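This added sketch verifies the example, and also checks associativity against an arbitrary third matrix $C$ (a value chosen only for illustration).

```python
import numpy as np

A = np.array([[ 1.0, -2.0],
              [-3.0,  4.0]])
B = np.array([[-5.0,  6.0],
              [ 7.0, -8.0]])

print(A @ B)   # [[-19.  22.], [ 43. -50.]]
print(B @ A)   # [[-23.  34.], [ 31. -46.]]

# Multiplication is not commutative, but it is associative.
C = np.array([[2.0, 0.0],
              [1.0, 3.0]])
print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True
```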

Vector Operations for Matrices

The set of all matrices of size $m \times n$, for fixed $m$ and $n$, is itself a vector space of dimension $mn$. The operations of vector addition and scalar multiplication for matrices are defined as follows: If $A$ and $B$ are $m \times n$ matrices, then the sum of $A$ and $B$, denoted by $A + B$, is the $m \times n$ matrix $C$ with entries $c_{ij} = a_{ij} + b_{ij}$. If $\alpha$ is a scalar, then the product of $\alpha$ and an $m \times n$ matrix $A$, denoted by $\alpha A$, is the $m \times n$ matrix $B$ with entries $b_{ij} = \alpha a_{ij}$. It is natural to identify $m \times n$ matrices with vectors of length $mn$, in the context of these operations.

Matrix addition and scalar multiplication have properties analogous to those of vector addition and scalar multiplication. In addition, matrix multiplication has the following properties related to these operations. We assume that $A$ is an $m \times n$ matrix, $B$ and $D$ are $n \times k$ matrices, and $\alpha$ is a scalar.

- Distributivity: $A(B + D) = AB + AD$
- Commutativity of scalar multiplication: $\alpha(AB) = (\alpha A)B = A(\alpha B)$

The Transpose of a Matrix

An $n \times n$ matrix $A$ is said to be symmetric if $a_{ij} = a_{ji}$ for $i, j = 1, 2, \ldots, n$. The $n \times n$ matrix $B$ whose entries are defined by $b_{ij} = a_{ji}$ is called the transpose of $A$, which we denote by $A^T$. Therefore, $A$ is symmetric if $A = A^T$. More generally, if $A$ is an $m \times n$ matrix, then $A^T$ is the $n \times m$ matrix $B$ whose entries are defined by $b_{ij} = a_{ji}$. The transpose has the following properties:

- $(A^T)^T = A$
- $(A + B)^T = A^T + B^T$
- $(AB)^T = B^T A^T$

Example. Let $A$ be the matrix from a previous example,

$$A = \begin{bmatrix} 3 & 0 & -1 \\ 1 & -4 & 2 \\ 5 & 1 & -3 \end{bmatrix}.$$

Then

$$A^T = \begin{bmatrix} 3 & 1 & 5 \\ 0 & -4 & 1 \\ -1 & 2 & -3 \end{bmatrix}.$$

It follows that

$$A + A^T = \begin{bmatrix} 3+3 & 0+1 & -1+5 \\ 1+0 & -4-4 & 2+1 \\ 5-1 & 1+2 & -3-3 \end{bmatrix} = \begin{bmatrix} 6 & 1 & 4 \\ 1 & -8 & 3 \\ 4 & 3 & -6 \end{bmatrix}.$$

This matrix is symmetric. This can also be seen by the properties of the transpose, since

$$(A + A^T)^T = A^T + (A^T)^T = A^T + A = A + A^T.$$
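As a quick numerical check, not part of the original notes, the transpose properties listed above can be verified for this $A$ and a random $B$:

```python
import numpy as np

A = np.array([[3.0,  0.0, -1.0],
              [1.0, -4.0,  2.0],
              [5.0,  1.0, -3.0]])
B = np.random.rand(3, 3)

print(np.allclose(A.T.T, A))               # (A^T)^T = A
print(np.allclose((A + B).T, A.T + B.T))   # (A + B)^T = A^T + B^T
print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T
```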

Other Fundamental Matrix Computations

We now define several other operations on matrices and vectors that will be useful in our study of numerical linear algebra. For simplicity, we work with real vectors and matrices.

Given two vectors $x$ and $y$ in $\mathbb{R}^n$, the dot product, or inner product, of $x$ and $y$ is the scalar

$$x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^n x_i y_i,$$

where

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$

Note that $x$ and $y$ must both be defined to be column vectors, and they must have the same length. If $x^T y = 0$, then we say that $x$ and $y$ are orthogonal.

Let $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, where $m$ and $n$ are not necessarily equal. The term "inner product" suggests the existence of another operation called the outer product, which is defined by

$$xy^T = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & & & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}.$$

Note that whereas the inner product is a scalar, the outer product is an $m \times n$ matrix.

If $x, y \in \mathbb{R}^n$, the Hadamard product, or componentwise product, of $x$ and $y$, denoted by $x \circ y$ or $x .* y$, is the vector $z$ obtained by multiplying corresponding components of $x$ and $y$. That is, if $z = x \circ y$, then $z_i = x_i y_i$, for $i = 1, 2, \ldots, n$.

Example. If $x = \begin{bmatrix} 1 & -2 \end{bmatrix}^T$ and $y = \begin{bmatrix} -3 & 4 \end{bmatrix}^T$, then

$$x^T y = 1(-3) + (-2)(4) = -11, \quad xy^T = \begin{bmatrix} 1(-3) & 1(4) \\ -2(-3) & -2(4) \end{bmatrix} = \begin{bmatrix} -3 & 4 \\ 6 & -8 \end{bmatrix},$$

and

$$x \circ y = \begin{bmatrix} 1(-3) \\ -2(4) \end{bmatrix} = \begin{bmatrix} -3 \\ -8 \end{bmatrix}.$$
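These three products correspond directly to NumPy operations; this added sketch reproduces the example.

```python
import numpy as np

x = np.array([1.0, -2.0])
y = np.array([-3.0, 4.0])

print(x @ y)           # inner product: -11.0
print(np.outer(x, y))  # outer product: [[-3.  4.], [ 6. -8.]]
print(x * y)           # Hadamard (componentwise) product: [-3. -8.]
```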

It is useful to describe matrices as collections of row or column vectors. Specifically, a row partition of an $m \times n$ matrix $A$ is a description of $A$ as a "stack" of row vectors $r_1^T, r_2^T, \ldots, r_m^T$. That is,

$$A = \begin{bmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_m^T \end{bmatrix}.$$

On the other hand, we can view $A$ as a concatenation of column vectors $c_1, c_2, \ldots, c_n$:

$$A = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}.$$

This description of $A$ is called a column partition.

Understanding Matrix-Matrix Multiplication

The fundamental operation of matrix-matrix multiplication can be understood in three different ways, based on other operations that can be performed on matrices and vectors. Let $A$ be an $m \times n$ matrix, and $B$ be an $n \times p$ matrix, in which case $C = AB$ is an $m \times p$ matrix. We can then view the computation of $C$ in the following ways (all three views are demonstrated in the sketch that follows):

- Dot product: each entry $c_{ij}$ is the dot product of the $i$th row of $A$ and the $j$th column of $B$.
- Matrix-vector multiplication: the $j$th column of $C$ is a linear combination of the columns of $A$, where the coefficients are obtained from the $j$th column of $B$. That is, if

$$C = \begin{bmatrix} c_1 & c_2 & \cdots & c_p \end{bmatrix}, \quad B = \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}$$

are column partitions of $C$ and $B$, then $c_j = Ab_j$, for $j = 1, 2, \ldots, p$.
- Outer product: given the partitions

$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}, \quad B = \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{bmatrix},$$

we can write

$$C = a_1 b_1^T + a_2 b_2^T + \cdots + a_n b_n^T = \sum_{i=1}^n a_i b_i^T.$$

That is, $C$ is a sum of outer product updates.
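The following sketch is an addition to the notes; it checks the three views numerically against NumPy's built-in product, using small random matrices chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
C = A @ B

# Dot-product view: c_ij is the dot product of row i of A and column j of B.
C_dot = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])

# Column view: the jth column of C is A times the jth column of B.
C_col = np.column_stack([A @ B[:, j] for j in range(p)])

# Outer-product view: C is a sum of rank-one updates.
C_outer = sum(np.outer(A[:, i], B[i, :]) for i in range(n))

print(np.allclose(C, C_dot), np.allclose(C, C_col), np.allclose(C, C_outer))
```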

Other Essential Ideas from Linear Algebra

We now introduce other important concepts from linear algebra that will prove to be useful in our analysis of algorithms from numerical linear algebra.

Special Subspaces

Let $A$ be an $m \times n$ matrix. Then the range of $A$, denoted by $\operatorname{ran}(A)$, is the set of all vectors of the form $y = Ax$, where $x \in \mathbb{R}^n$. It follows that $\operatorname{ran}(A)$ is the span of the columns of $A$, which is also called the column space of $A$. The dimension of $\operatorname{ran}(A)$ is called the column rank of $A$. Similarly, the dimension of the row space of $A$ is called the row rank of $A$, which is also the column rank of $A^T$. It can be shown that the row rank and column rank are equal; this common value is simply called the rank of $A$, and is denoted by $\operatorname{rank}(A)$. We say that $A$ is rank-deficient if $\operatorname{rank}(A) < \min\{m, n\}$; otherwise, we say that $A$ has full rank. It is interesting to note that any outer product of vectors has rank one.

The null space of $A$, denoted by $\operatorname{null}(A)$, is the set of all vectors $x \in \mathbb{R}^n$ such that $Ax = 0$. Its dimension is called the nullity of $A$. It can be shown that for an $m \times n$ matrix $A$,

$$\dim(\operatorname{null}(A)) + \operatorname{rank}(A) = n.$$

The Identity Matrix

When $n = 1$, the identity element of $1 \times 1$ matrices, the number $1$, is the unique number such that $a(1) = 1(a) = a$ for any number $a$. To determine the identity element for $n \times n$ matrices, we seek a matrix $I$ such that $AI = IA = A$ for any $n \times n$ matrix $A$. That is, we must have

$$\sum_{k=1}^n a_{ik} I_{kj} = a_{ij}, \quad i, j = 1, \ldots, n.$$

This can only be guaranteed for any matrix $A$ if $I_{jj} = 1$ for $j = 1, 2, \ldots, n$, and $I_{ij} = 0$ when $i \neq j$. We call this matrix the identity matrix:

$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}.$$

Note that the $j$th column of $I$ is the standard basis vector $e_j$.
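The rank-nullity relation is easy to observe numerically. In this added sketch, the third row of an illustrative matrix is the sum of the first two, so the matrix is rank-deficient.

```python
import numpy as np

# A rank-deficient 3 x 4 matrix: the third row is the sum of the first two.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 3.0, 1.0, 3.0]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)
nullity = n - rank   # by the relation dim(null(A)) + rank(A) = n

print("rank =", rank, " nullity =", nullity)  # rank = 2  nullity = 2
print(np.allclose(A @ np.eye(n), A))          # AI = A
```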

The Inverse of a Matrix

Given an $n \times n$ matrix $A$, it is now natural to ask whether it is possible to find an $n \times n$ matrix $B$ such that $AB = BA = I$. Such a matrix, if it exists, would then serve as the inverse of $A$, in the sense of matrix multiplication. We denote this matrix by $A^{-1}$, just as we denote the multiplicative inverse of a nonzero number $a$ by $a^{-1}$. If the inverse of $A$ exists, we say that $A$ is invertible or nonsingular; otherwise, we say that $A$ is singular.

If $A^{-1}$ exists, then we can use it to describe the solution of the system of linear equations $Ax = b$, for

$$A^{-1}Ax = (A^{-1}A)x = Ix = x = A^{-1}b,$$

which generalizes the solution $x = a^{-1}b$ of a single linear equation in one unknown.

However, just as we can use the inverse to describe the solution to a system of linear equations, we can use systems of linear equations to characterize the inverse. Because $A^{-1}$ satisfies $AA^{-1} = I$, it follows from multiplication of both sides of this equation by the $j$th standard basis vector $e_j$ that

$$Ab_j = e_j, \quad j = 1, 2, \ldots, n,$$

where $b_j = A^{-1}e_j$ is the $j$th column of $B = A^{-1}$. That is, we can compute $A^{-1}$ by solving $n$ systems of linear equations of the form $Ab_j = e_j$, using a method such as Gaussian elimination and back substitution. If Gaussian elimination fails due to the inability to obtain a nonzero pivot element for each column, then $A^{-1}$ does not exist, and we conclude that $A$ is singular.

The inverse of a nonsingular matrix $A$ has the following properties:

- $A^{-1}$ is unique.
- $A^{-1}$ is nonsingular, and $(A^{-1})^{-1} = A$.
- If $B$ is also a nonsingular $n \times n$ matrix, then $(AB)^{-1} = B^{-1}A^{-1}$.
- $(A^{-1})^T = (A^T)^{-1}$. It is common practice to denote the transpose of $A^{-1}$ by $A^{-T}$.

Because the set of all nonsingular $n \times n$ matrices has an identity element, matrix multiplication is associative, and each nonsingular $n \times n$ matrix has a unique inverse with respect to matrix multiplication that is also an $n \times n$ nonsingular matrix, this set forms a group, which is denoted by $GL(n)$, the general linear group.
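This added sketch computes the inverse column by column by solving $Ab_j = e_j$, exactly as described above, with a library solver standing in for Gaussian elimination and back substitution.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
n = A.shape[0]
I = np.eye(n)

# Build A^{-1} one column at a time by solving A b_j = e_j.
Ainv = np.column_stack([np.linalg.solve(A, I[:, j]) for j in range(n)])

print(Ainv)
print(np.allclose(A @ Ainv, I))              # True
print(np.allclose(Ainv, np.linalg.inv(A)))   # matches the library inverse
```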

The Sherman-Morrison-Woodbury Formula

Suppose a matrix $B$ has the form

$$B = A + uv^T,$$

where $A$ is nonsingular, and $u$ and $v$ are given vectors. This modification of $A$ to obtain $B$ is called a rank-one update, since $uv^T$, an outer product, has rank one, due to every vector in the range of $uv^T$ being a scalar multiple of $u$.

To obtain $B^{-1}$ from $A^{-1}$, we note that if

$$Ax = u,$$

then

$$Bx = (A + uv^T)x = (1 + v^T x)u,$$

which yields

$$B^{-1}u = \frac{1}{1 + v^T A^{-1} u} A^{-1} u.$$

On the other hand, if $x$ is such that $v^T A^{-1} x = 0$, then

$$BA^{-1}x = (A + uv^T)A^{-1}x = x,$$

which yields

$$B^{-1}x = A^{-1}x.$$

This takes us to the following more general problem: given a matrix $C$, we wish to construct a matrix $D$ such that the following conditions are satisfied:

- $Dw = z$, for given vectors $w$ and $z$.
- $Dy = Cy$, if $y$ is orthogonal to a given vector $g$.

In our application, $C = A^{-1}$, $D = B^{-1}$, $w = u$, $z = \frac{1}{1 + v^T A^{-1} u} A^{-1} u$, and $g = A^{-T} v$.

To solve this problem, we set

$$D = C + \frac{(z - Cw)g^T}{g^T w}.$$

Then, if $g^T y = 0$, the second term in the definition of $D$ vanishes, and we obtain $Dy = Cy$, but in computing $Dw$, we obtain factors of $g^T w$ in the numerator and denominator that cancel, which yields

$$Dw = Cw + (z - Cw) = z.$$

Applying this definition of $D$, we obtain

$$B^{-1} = A^{-1} + \frac{\left( \frac{1}{1 + v^T A^{-1} u} A^{-1} u - A^{-1} u \right) v^T A^{-1}}{v^T A^{-1} u} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}.$$

This formula for the inverse of a rank-one update is known as the Sherman-Morrison Formula. It is a special case of the Sherman-Morrison-Woodbury Formula,

$$(A + UV^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1},$$

where $U$ and $V$ are $n \times k$ matrices, which means that $A + UV^T$ is a rank-$k$ update of $A$.
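The Sherman-Morrison formula can be sanity-checked numerically; this added sketch compares it against a direct inversion of the updated matrix, with random matrices and vectors chosen only for illustration (the identity shift keeps $A$ safely nonsingular).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular for this demo
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)

# Sherman-Morrison: inverse of the rank-one update B = A + u v^T.
Au = Ainv @ u                                      # A^{-1} u
Binv = Ainv - np.outer(Au, v @ Ainv) / (1.0 + v @ Au)

B = A + np.outer(u, v)
print(np.allclose(Binv, np.linalg.inv(B)))         # True
```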

The Determinant of a Matrix

We previously learned that a $2 \times 2$ matrix $A$ is invertible if and only if the quantity $a_{11}a_{22} - a_{12}a_{21}$ is nonzero. This generalizes the fact that a $1 \times 1$ matrix $A$ is invertible if and only if its single entry, $a_{11} = a$, is nonzero. We now discuss the generalization of this determination of invertibility to general square matrices.

The determinant of an $n \times n$ matrix $A$, denoted by $\det(A)$ or $|A|$, is defined as follows:

- If $n = 1$, then $\det(A) = a_{11}$.
- If $n > 1$, then $\det(A)$ is recursively defined by

$$\det(A) = \sum_{j=1}^n a_{ij} (-1)^{i+j} \det(M_{ij}), \quad 1 \le i \le n,$$

where $M_{ij}$, called a minor of $A$, is the matrix obtained by removing row $i$ and column $j$ of $A$. Alternatively,

$$\det(A) = \sum_{i=1}^n a_{ij} (-1)^{i+j} \det(M_{ij}), \quad 1 \le j \le n.$$

- The matrix $A_{ij} = (-1)^{i+j} M_{ij}$ is called a cofactor of $A$.

This definition of the determinant, however, does not lead directly to a practical algorithm for its computation, because it requires $O(n!)$ floating-point operations, whereas typical algorithms for matrix computations run in polynomial time. The computational effort can be reduced by choosing from the multiple formulas for $\det(A)$ above; by consistently choosing the row or column with the most zeros, the number of operations can be minimized. However, more practical methods for computing the determinant can be obtained by using its properties:

- If any row or column of $A$ has only zero entries, then $\det(A) = 0$.
- If any two rows or columns of $A$ are the same, then $\det(A) = 0$.
- If $B$ is an $n \times n$ matrix, then $\det(AB) = \det(A)\det(B)$.
- $\det(A^T) = \det(A)$.
- If $A$ is nonsingular, then $\det(A^{-1}) = (\det(A))^{-1}$.

The best-known application of the determinant is the fact that it indicates whether a matrix $A$ is nonsingular, or invertible.
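The multiplicative properties of the determinant are easy to confirm numerically; this sketch is an addition to the notes, using random matrices chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # det(AB) = det(A)det(B)
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # det(A^T) = det(A)
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1.0 / np.linalg.det(A)))                # det(A^{-1}) = 1/det(A)
```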

The following statements are all equivalent:

- $\det(A) \neq 0$.
- $A$ is nonsingular.
- $A^{-1}$ exists.
- The system $Ax = b$ has a unique solution for any $n$-vector $b$.
- The system $Ax = 0$ has only the trivial solution $x = 0$.

The determinant has other interesting applications. The determinant of a $3 \times 3$ matrix is equal to the volume of a parallelepiped defined by the vectors that are the rows (or columns) of the matrix. This is a special case of the fact that the determinant of an $n \times n$ matrix is equal to the product of its eigenvalues.

Differentiation of Matrices

Suppose that $A(t)$ is an $m \times n$ matrix in which each entry is a function of a parameter $t$. Then the matrix $A'(t)$, or $dA/dt$, is the $m \times n$ matrix obtained by differentiating each entry with respect to $t$. That is,

$$\left[ \frac{dA(t)}{dt} \right]_{ij} = \frac{d}{dt}\left[ a_{ij}(t) \right].$$

Matrices obey differentiation rules that are analogous to differentiation rules for functions, but the rules of matrix-matrix multiplication must be taken into account. For example, if $A(t)$ is an $m \times n$ matrix and $B(t)$ is an $n \times p$ matrix, then

$$\frac{d}{dt}[A(t)B(t)] = \frac{d}{dt}[A(t)]\,B(t) + A(t)\,\frac{d}{dt}[B(t)],$$

and if $A(t)$ is a nonsingular $n \times n$ matrix, then

$$\frac{d}{dt}\left[A(t)^{-1}\right] = -A(t)^{-1}\,\frac{d}{dt}[A(t)]\,A(t)^{-1}.$$

It is also useful to know how to differentiate functions of vectors with respect to their components. Let $A$ be an $n \times n$ matrix and $x \in \mathbb{R}^n$. Then, we have

$$\nabla\left(x^T A x\right) = (A + A^T)x, \quad \nabla(x^T b) = b.$$

These formulas are useful in problems involving minimization of functions of $x$, such as the least-squares problem, which entails approximately solving a system of $m$ equations in $n$ unknowns, where typically $m > n$.
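As a closing illustration, not from the original notes, the gradient formula $\nabla(x^T A x) = (A + A^T)x$ can be checked against a central-difference approximation on arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda x: x @ A @ x   # the quadratic form x^T A x

# Central-difference approximation of the gradient of f at x.
h = 1e-6
grad_fd = np.array([
    (f(x + h * e) - f(x - h * e)) / (2 * h)
    for e in np.eye(n)   # rows of I are the standard basis vectors e_j
])

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-6))   # True
```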