Basic Elements of Linear Algebra

A Basic Review of Linear Algebra
Nick West (nickwest@stanford.edu)
September 16, 2010

Part I: Basic Elements of Linear Algebra

Although the subject of linear algebra is much broader than just vectors and matrices, most people's first experience and day-to-day interaction with the subject is through these objects. Furthermore, and perhaps a stronger argument as to why matrices and vectors are a fundamental place to start such a review, much of modern computation is done with matrices and vectors, either as a direct representation of a model problem or through some type of discretization. Thus, a strong background and familiarity with these objects is necessary to successfully solve many problems.

1 Scalars

The most basic element in linear algebra is the scalar; a scalar is just a number, and scalars are typically denoted with Greek letters (α, β, γ, ε, δ, etc.). For the purpose of this review, and of most applications, scalars are taken from the field of real numbers, denoted R. On occasion, people may use the complex numbers C as the field, or indeed any other field.

2 Vectors

A vector is a finite, ordered set of numbers (from the same field as the scalars we are considering). These n numbers are indexed and may be denoted x = {ν_i}, i = 1, ..., n. Vectors are typically denoted with lower-case Roman letters (a, b, x, y, z, etc.) and are referred to as elements of R^n (e.g. x ∈ R^n or y ∈ R^m). There are two orientations of vectors, the more common column vector and the row vector,

    x = (ν_1, ν_2, ..., ν_n)^T    (column vector),        x = [ ν_1  ν_2  ...  ν_n ]    (row vector);

both are elements of R^n. Unless otherwise stated, all vectors are assumed to be column vectors. The transpose of a vector x, denoted x^T, changes a vector from a column vector to a row vector (or vice versa).

There are many examples of vectors. One common example is a point in R^2, where ν_1 is the x coordinate and ν_2 is the y coordinate. Vectors can also denote a direction and magnitude (see below), such as a force. Another common example of a vector is a set of probabilities; here each component, say π_i, gives the probability of the ith event.

Typically the components of these vectors sum to 1 (and are non-negative); in such cases these vectors are called stochastic.

2.1 Arithmetic Operations with Vectors

2.1.1 The Scalar Product

Let α ∈ R and x = {ν_i : 1 ≤ i ≤ n} ∈ R^n. Then the scalar product is

    αx = {αν_i : 1 ≤ i ≤ n},

which is also in R^n.

2.1.2 Vector Addition

Let a = {α_i : 1 ≤ i ≤ n} and b = {β_i : 1 ≤ i ≤ n} be vectors in R^n. The vector sum is defined as

    a + b = {α_i + β_i : 1 ≤ i ≤ n} = (α_1 + β_1, α_2 + β_2, ..., α_n + β_n)^T.

The vector a + b is in R^n as well. Thus R^n is closed under addition and scalar multiplication.

2.1.3 The Dot Product

Let a and b be two column vectors in R^n. The dot product is denoted

    a · b = a^T b = Σ_{i=1}^n α_i β_i,

which is a scalar.
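
These operations map directly onto array arithmetic in numerical software. Here is a minimal NumPy sketch (the vectors and the scalar are made-up values for illustration, not part of the original notes):

    import numpy as np

    alpha = 2.5
    a = np.array([1.0, -2.0, 3.0])   # a vector in R^3 (NumPy does not distinguish orientation)
    b = np.array([4.0, 0.0, -1.0])

    scaled = alpha * a               # the scalar product {alpha * nu_i}
    vec_sum = a + b                  # componentwise vector addition
    dot = a @ b                      # the dot product a^T b = sum_i alpha_i beta_i

    print(scaled, vec_sum, dot)      # dot equals 4 + 0 - 3 = 1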

2.2 The Vector Norm

A norm is a mapping f : R^n → R_+ (here R_+ is the set of non-negative real numbers) with the following properties:

1. f(x) = 0 if and only if x = 0
2. f(αx) = |α| f(x) for α ∈ R and x ∈ R^n
3. f(x + y) ≤ f(x) + f(y) for x, y ∈ R^n

A norm f(x) is often denoted by ‖x‖. Let x = {ν_i : 1 ≤ i ≤ n}. There are three commonly used vector norms in linear algebra:

1. The 2-norm. This is probably the most commonly used norm:

    f(x) = ‖x‖_2 = √( Σ_{i=1}^n ν_i² ).

2. The 1-norm:

    f(x) = ‖x‖_1 = Σ_{i=1}^n |ν_i|.

This is sometimes referred to as the Manhattan norm, as it can be viewed as counting the number of blocks you would have to walk on an n-dimensional grid.

3. The ∞-norm. This gives the component with largest magnitude:

    f(x) = ‖x‖_∞ = max_{1 ≤ i ≤ n} |ν_i|.

In all cases, the norm is a measure of length; the 2-norm corresponds to the Euclidean length of the vector.

2.3 Inequalities

Let a = {α_i : 1 ≤ i ≤ n} and b = {β_i : 1 ≤ i ≤ n} be elements of R^n. We say

    a ≤ b  if and only if  α_i ≤ β_i for all 1 ≤ i ≤ n.

The Cauchy-Schwarz inequality states that

    |x^T y| ≤ ‖x‖_2 ‖y‖_2.

2.4 Special Vectors

There are two special notations used commonly in linear algebra. The first is the vector e = {ε_i = 1 : 1 ≤ i ≤ n}, that is, the vector of all ones:

    e = (1, 1, ..., 1)^T.

The second is the vector e_i, which is the vector of all zeros but with a 1 in the ith position: e_i = {ε_j : 1 ≤ j ≤ n}, where

    ε_j = 1 if i = j,   ε_j = 0 if i ≠ j.
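
The three norms and the Cauchy-Schwarz inequality can be checked numerically. A small NumPy sketch with arbitrary example vectors:

    import numpy as np

    x = np.array([3.0, -4.0, 0.0])
    y = np.array([1.0, 2.0, 2.0])

    norm2 = np.linalg.norm(x, 2)          # Euclidean length: sqrt(9 + 16) = 5
    norm1 = np.linalg.norm(x, 1)          # sum of absolute values: 7
    norm_inf = np.linalg.norm(x, np.inf)  # largest magnitude component: 4

    # Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
    lhs = abs(x @ y)
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    print(norm2, norm1, norm_inf, lhs <= rhs)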

3 Matrices

Matrices are two-dimensional objects, a collection of mn numbers; they are typically denoted by capital letters (A, B, Q, M, etc.) and are referred to as elements of R^{m×n}. They are often specified in the following way: A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n},

    A = [ α_11  α_12  ...  α_1n ]
        [ α_21  α_22  ...  α_2n ]
        [   ⋮     ⋮           ⋮ ]
        [ α_m1  α_m2  ...  α_mn ]

where m is the number of rows and n is the number of columns. Alternatively, a matrix may be thought of as a collection of n column vectors {a_1, a_2, ..., a_n}, where each a_i ∈ R^m:

    A = [ a_1  a_2  ...  a_n ];

or as a collection of m row vectors {b_1, b_2, ..., b_m}, where each b_i ∈ R^n:

    A = [ b_1 ]
        [ b_2 ]
        [  ⋮  ]
        [ b_m ]

Examples of matrices can be found throughout engineering. As we will see shortly, matrices are a concise way to express linear systems or linear constraints. Finite difference approximations can often be conveniently represented as matrices, as can transition probabilities for many classes of discrete stochastic processes. A more concrete example of a matrix is the 2 × 2 rotation matrix: suppose we wish to rotate a point in R^2 by θ degrees about the origin. The matrix

    Q = [ cos θ  -sin θ ]
        [ sin θ   cos θ ]

when applied to a vector in R^2 will rotate the vector by θ degrees. Finally, vectors themselves are special cases of matrices: column vectors in R^n are matrices in R^{n×1}; row vectors are matrices in R^{1×n}.

3.1 The Matrix Transpose

Although the transpose of a matrix has much deeper mathematical meaning than what we will get into here, it is convenient to introduce it now as a symbolic manipulation. Let A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}. Then the transpose of the matrix, denoted A^T = B = {β_ij : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, has elements given by β_ij = α_ji. Thus if A ∈ R^{m×n} then B ∈ R^{n×m}. The transpose has the following properties:

    (A^T)^T = A
    (A + B)^T = A^T + B^T
    (AB)^T = B^T A^T
    (αA)^T = α A^T

3.2 Arithmetic Operations with Matrices

Let γ ∈ R and let A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} and B = {β_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} be elements of R^{m×n}.

3.2.1 Scalar Multiplication

Multiplying a matrix A by the scalar γ scales the elements of A by γ:

    γA = {γ α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.

3.2.2 Matrix Addition

For two matrices in the same space R^{m×n}, like A and B defined above, matrix addition is defined by A + B = {α_ij + β_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, or, written out entrywise,

    A + B = [ α_11 + β_11  ...  α_1n + β_1n ]
            [      ⋮                  ⋮     ]
            [ α_m1 + β_m1  ...  α_mn + β_mn ]
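
To make the rotation-matrix example concrete, here is a short NumPy sketch (the angle and the point are arbitrary choices for illustration):

    import numpy as np

    theta = np.deg2rad(90)                      # rotate by 90 degrees
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    p = np.array([1.0, 0.0])                    # a point on the x-axis
    print(np.round(Q @ p, 12))                  # [0, 1]: the point is rotated onto the y-axis
    print(np.allclose(Q.T @ Q, np.eye(2)))      # rotation matrices are orthogonal (see Section 8)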

3.2.3 Matrix-Matrix Products

Let A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} be in R^{m×n} and B = {β_ij : 1 ≤ i ≤ p, 1 ≤ j ≤ m} be in R^{p×m}. The matrix product between a matrix in R^{p×m} and a matrix in R^{m×n}, denoted BA = C = {γ_ij : 1 ≤ i ≤ p, 1 ≤ j ≤ n}, has elements given by

    γ_ij = Σ_{k=1}^m β_ik α_kj,    1 ≤ i ≤ p, 1 ≤ j ≤ n;

that is, γ_ij is the dot product of the ith row of B with the jth column of A. C is an element of R^{p×n}. The number of columns of B is called its inner dimension, and the number of rows of A is called its inner dimension; the matrix product is only defined when the inner dimensions agree.

3.2.4 The Outer Product

Let x ∈ R^n and y ∈ R^m be column vectors. Their outer product, yx^T = A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, is an element of R^{m×n} with elements given by α_ij = y_i x_j. It can be seen that the outer product is a special case of the matrix-matrix product.

3.2.5 Matrix-Vector Products

The matrix-vector product, too, can be viewed as a special case of the matrix-matrix product; however, its ubiquitous presence in linear algebra makes it worthy of special mention. Let A = {α_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} = (a_1 a_2 ... a_n) = (b_1 b_2 ... b_m)^T be an element of R^{m×n}, where the a_i ∈ R^m and b_i ∈ R^n are column vectors; let x ∈ R^n be a column vector. The vectors {a_i} are the columns of A and the {b_i^T} are the rows of A. The product Ax = c = {γ_i : 1 ≤ i ≤ m} is a column vector in R^m. So

    Ax = c = (γ_1, γ_2, ..., γ_m)^T,   where   γ_i = Σ_{j=1}^n α_ij ν_j = b_i^T x,

and equivalently

    Ax = ν_1 a_1 + ν_2 a_2 + ... + ν_n a_n.

Thus we see two different interpretations of the matrix-vector product. The first says that the components of c, the γ_i's, are specific combinations of the components of x, where the weights are given by the elements in the ith row of A. The second is that the product Ax is a weighted combination of the columns of A, where the weights are given by the components of x.
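
The two interpretations of the matrix-vector product can be verified directly. A NumPy sketch with a small made-up matrix:

    import numpy as np

    A = np.array([[1.0,  2.0, 0.0],
                  [3.0, -1.0, 4.0]])        # A in R^{2x3}
    x = np.array([2.0, 1.0, -1.0])

    Ax = A @ x                               # the matrix-vector product

    rowwise = np.array([row @ x for row in A])         # gamma_i = b_i^T x
    colwise = sum(x[j] * A[:, j] for j in range(3))    # nu_1 a_1 + nu_2 a_2 + nu_3 a_3

    print(Ax, rowwise, colwise)              # all three agree: [4, 1]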

3.3 Matrix Norms

Let A and B be elements of R^{m×n} and γ ∈ R. A matrix norm is a function f : R^{m×n} → R_+ with the following properties:

1. f(A) = 0 if and only if A = 0
2. f(γA) = |γ| f(A)
3. f(A + B) ≤ f(A) + f(B)

Furthermore, when n = m, if the norm has the property that f(AB) ≤ f(A) f(B), we call the norm sub-multiplicative. All the norms we will consider are sub-multiplicative. Like vector norms, matrix norms are often denoted by f(A) = ‖A‖.

3.3.1 Induced Norms

A vector norm induces a matrix norm in the following way:

    ‖A‖ = sup_{‖x‖_n = 1} ‖Ax‖_m

(where the subscripts on the norms specify the space the norm is defined on, either R^n or R^m). For the three vector norms discussed above, the induced matrix norm has an explicit form:

1. The 2-norm: ‖A‖_2 = σ_1(A), where σ_1(A) is the largest singular value of A. (Singular values will be discussed later.)

2. The 1-norm:

    ‖A‖_1 = max_{1 ≤ j ≤ n} Σ_{i=1}^m |α_ij|,

or the maximal column sum.

3. The ∞-norm:

    ‖A‖_∞ = max_{1 ≤ i ≤ m} Σ_{j=1}^n |α_ij|,

or the maximal row sum.

3.3.2 Element-wise Norms

Several norms are defined directly in terms of the elements of the matrix A. The most used of these norms is the Frobenius norm, defined as follows:

    ‖A‖_F = √( Σ_{i=1}^m Σ_{j=1}^n α_ij² ).

The trace of a square matrix is defined as follows:

    tr(A) = Σ_{i=1}^n α_ii.

Thus the Frobenius norm can be written as (note that A^T A is square)

    ‖A‖_F = √( tr(A^T A) ).
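
These matrix norms are all available through np.linalg.norm; the sketch below also checks the trace identity for the Frobenius norm (the matrix is an arbitrary example):

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [3.0,  4.0]])

    one_norm = np.linalg.norm(A, 1)         # max column sum of |a_ij| = max(4, 6) = 6
    inf_norm = np.linalg.norm(A, np.inf)    # max row sum = max(3, 7) = 7
    two_norm = np.linalg.norm(A, 2)         # largest singular value
    fro_norm = np.linalg.norm(A, 'fro')     # sqrt(1 + 4 + 9 + 16) = sqrt(30)

    # Frobenius norm equals sqrt(trace(A^T A))
    print(np.isclose(fro_norm, np.sqrt(np.trace(A.T @ A))))
    print(one_norm, inf_norm, two_norm, fro_norm)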

3.4 Structured Matrices

There are several types of matrices with special structure that recur in linear algebra. The first, which we have already seen, is the square matrix, a matrix with the same number of rows and columns, so that m = n. A diagonal matrix is a square matrix whose only non-zero entries are the α_ii's. More specifically, D = {δ_ij : 1 ≤ i, j ≤ n} is diagonal if

    δ_ij = 0 for i ≠ j,   1 ≤ i, j ≤ n.

The identity matrix is a diagonal matrix with δ_ii = 1; it is often denoted by I and has the property that Ix = x. A symmetric matrix A = {α_ij : 1 ≤ i, j ≤ n} is a square matrix with the property that α_ij = α_ji. Diagonal matrices are symmetric.

Part II: Sets of Vectors

A set S ⊆ R^n is a subspace if the following holds for all x, y ∈ S and α ∈ R: x + αy ∈ S.

A linear combination of p vectors {s_1, s_2, ..., s_p} is specified by any p real numbers {α_1, α_2, ..., α_p} and is given by α_1 s_1 + α_2 s_2 + ... + α_p s_p. If the vectors {s_i} are made to be the columns of a matrix S = (s_1 s_2 ... s_p), and the vector a has components α_i, then, as we have seen, the linear combination can be expressed as the matrix-vector product Sa.

Let S = {s_1, s_2, ..., s_p} be a set of p vectors in R^n. The span of S, denoted span(S),

    span(S) = {y = Sx : x ∈ R^p},

is the set of all linear combinations of the vectors in S.

A set of vectors S = {s_1, s_2, ..., s_p} is linearly independent if the only linear combination of S that equals the zero vector is the trivial one. In other words, the only way that α_1 s_1 + α_2 s_2 + ... + α_p s_p = 0 is if all of the α_i's are zero. For example, the vectors

    e_1 = (1, 0)^T   and   e_2 = (0, 1)^T

are linearly independent, and span({e_1, e_2}) = R^2.

The dimension of a subspace, denoted dim(S), is the largest number of vectors in the subspace that are linearly independent. A basis is any set of linearly independent vectors that spans the subspace. The number of vectors in a basis for S is dim(S). The basis is not a unique set of vectors.

Some facts about subspaces:

- If S ⊆ R^m is a subspace, then its dimension is at most m.
- Every vector in a subspace can be written as a unique linear combination of the basis vectors.
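
One practical way to test linear independence numerically is to stack the vectors as columns and look at the rank of the resulting matrix, as in the sketch below (the vectors are made-up, and matrix_rank is just one convenient tool for this):

    import numpy as np

    s1 = np.array([1.0, 0.0, 1.0])
    s2 = np.array([0.0, 1.0, 1.0])
    s3 = s1 + 2 * s2                       # deliberately a linear combination of s1 and s2

    S = np.column_stack([s1, s2, s3])
    rank = np.linalg.matrix_rank(S)

    # rank < number of columns means the set is linearly dependent
    print(rank, rank == S.shape[1])        # 2, False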

Part III: Linear Functions, Rank, Null Space, Linear Systems and Inverses

4 Linear Functions

A function f : A → B is said to be linear if for x, y ∈ A and α ∈ R,

    f(x + αy) = f(x) + α f(y).

The function f : R^n → R^m defined by f(x) = Ax, where A ∈ R^{m×n}, is a linear function.

5 Range Space, Null Space and Rank

Recall that for a function f : R^n → R^m to be one-to-one, each x ∈ R^n must map to a unique y ∈ R^m. Thus for x ↦ Ax to be one-to-one, it is necessary that n ≤ m and that the columns of A be linearly independent. If A does not have linearly independent columns, then there is a vector z ∈ R^n such that Az = 0, and thus, due to linearity, there are an infinite number of vectors that map to zero. This set of vectors is called the null space of A and is denoted

    N(A) = {x ∈ R^n : Ax = 0}.

Note that, due to the linearity of the mapping x ↦ Ax, N(A) is a subspace.

Now consider the range of the function Ax. The range,

    R(A) = {Ax : x ∈ R^n},

is the set of vectors mapped to in R^m by x ↦ Ax. For an arbitrary x ∈ R^n, y = Ax is a linear combination of the columns of A. Thus we see that the range of Ax is the span of the columns of A:

    R(A) = span({a_1, a_2, ..., a_n}).

The column rank is the dimension of R(A), and the row rank is the dimension of R(A^T). The column rank of a matrix is equal to its row rank, and this common value is called the rank of the matrix. A matrix is said to have full rank if rank(A) = min(m, n).

The function defined by x ↦ Ax is one-to-one if m ≥ n and A has full rank. This yields the useful property that if A is full rank and m ≥ n,

    Ax = Ay  ⟹  x = y

(which allows these terms to be cancelled from algebraic equations). For the mapping x ↦ Ax to be onto, the column rank must be m (however, the mapping may not be one-to-one if n > m). Thus a square matrix is full rank if and only if the mapping is both one-to-one and onto. Such a matrix is called non-singular; a square matrix with rank less than n is called singular. Furthermore, since a non-singular matrix defines a bijection (both onto and one-to-one), a unique inverse exists (this will be useful later).

5.1 The Fundamental Theorem of Linear Algebra

The fundamental theorem of linear algebra states that for a matrix A ∈ R^{m×n},

    dim(R(A)) + dim(N(A)) = n.
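
The rank, the null space, and the fundamental theorem can be illustrated numerically; the sketch below extracts a null-space basis from the SVD (one of several possible approaches), using a deliberately rank-deficient example:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])           # rank-1: the second row is twice the first

    rank = np.linalg.matrix_rank(A)

    # One way to get a basis for N(A): right singular vectors with zero singular value
    _, s, Vt = np.linalg.svd(A)
    null_basis = Vt[rank:].T                   # columns span the null space
    dim_null = null_basis.shape[1]

    print(rank + dim_null == A.shape[1])       # rank-nullity: dim R(A) + dim N(A) = n
    print(np.allclose(A @ null_basis, 0))      # A maps the null-space basis to zero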

6 Linear Systems and the Matrix Inverse

Linear systems arise from problems of the following form: given a matrix A ∈ R^{m×n} and a vector b ∈ R^m, what vector x ∈ R^n solves

    Ax = b?

This is a central problem in linear algebra. One example of where this problem arises is the following: suppose we have n foods and m nutrients, and food j provides α_ij units of nutrient i per kilogram. Our goal is to determine what allocation of food matches a given nutrient prescription (the elements of b, the β_i, correspond to the amount of nutrient i needed by a person). This can be posed as finding the solution to a linear system of the above form.

The solution to the above problem need not exist, or may not be unique. However, when A is square (n = m) and of full rank, there is a unique solution. A solution does not exist if b ∉ R(A) (then there is no linear combination of the columns of A equal to b). The solution may be non-unique when the system is not full rank (suppose x is a solution and z is a vector in the null space of A; then x + δz is a solution to the problem as well).

6.1 The Matrix Inverse

As stated above, when the system is square (m = n) and full rank, the mapping x ↦ Ax is a bijection, and thus an inverse is uniquely defined, denoted A^{-1}; the solution is given by x = A^{-1} b. This inverse satisfies the following property:

    A^{-1} A = A A^{-1} = I.

The inverse is only defined for square, full-rank matrices. Suppose A, B ∈ R^{n×n}; if the inverses of A and B, A^{-1} and B^{-1} respectively, exist, then the inverse of AB is given by

    (AB)^{-1} = B^{-1} A^{-1}.

As an example, consider the matrix

    A = [ α  β ]
        [ γ  δ ]

Then, provided αδ - βγ ≠ 0 (and thus the matrix has full rank), the inverse is given by

    A^{-1} = (1 / (αδ - βγ)) [  δ  -β ]
                             [ -γ   α ]

6.2 A Simple Example

Consider

    A = [ 1  1 ],   b = [ 1 ],   c = [ 0 ]
        [ 2  2 ]        [ 2 ]        [ 1 ]

(Note that the formula for the matrix inverse doesn't work here, since αδ - βγ = 0.) There are an infinite number of solutions to the problem Ax = b, and any solution takes the form

    x = [  2 ] + α [ -1 ]
        [ -1 ]     [  1 ]

(note that (-1, 1)^T is in the null space of A). Secondly, the problem Ax = c has no solution, as (0, 1)^T ∉ R(A).
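
A small NumPy sketch of a full-rank system and the 2 × 2 inverse formula (the matrix and right-hand side are arbitrary examples; in practice one calls a solver rather than forming the inverse explicitly):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([9.0, 8.0])

    x = np.linalg.solve(A, b)
    print(np.allclose(A @ x, b))                  # True; x = [2, 3]

    # The 2x2 inverse formula: A^{-1} = 1/(ad - bc) * [[d, -b], [-c, a]]
    a_, b_, c_, d_ = A.ravel()
    A_inv = np.array([[d_, -b_], [-c_, a_]]) / (a_ * d_ - b_ * c_)
    print(np.allclose(A_inv, np.linalg.inv(A)))   # True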

Part IV: Inner Products, Orthogonality and Least Squares

7 The Inner Product

Let x, y ∈ R^n and A ∈ R^{n×n}, where A = A^T and z^T A z > 0 for all z ≠ 0. Then the inner product, with respect to A, of x and y is given by

    ⟨x, y⟩_A = x^T A y = Σ_{i=1}^n Σ_{j=1}^n x_i y_j a_ij.

The inner product has the following properties:

- ⟨x, x⟩_A ≥ 0
- ⟨x, x⟩_A = 0 if and only if x = 0
- ⟨x, y + w⟩_A = ⟨x, y⟩_A + ⟨x, w⟩_A
- ⟨x, αy⟩_A = α ⟨x, y⟩_A
- ⟨x, y⟩_A = ⟨y, x⟩_A

The dot product can be defined as the inner product with respect to the identity matrix. The inner product can be used to define a norm,

    ‖x‖_A = √(⟨x, x⟩_A),

called the A-norm. When A = I, this is just the 2-norm. When A = D, a diagonal matrix, the components of D weight the contribution of the components of x to the norm. Let D = {δ_ij}, where δ_ij = 0 when i ≠ j and δ_ii > 0; then the weighted norm is given by

    ‖x‖_D = √( Σ_{i=1}^n δ_ii x_i² ).

This is often used in statistics to decrease the effect of noisy or uncertain data points when estimating parameters.

The angle θ ∈ [0, π] between two vectors x and y can be defined by

    cos θ = x^T y / (‖x‖_2 ‖y‖_2).

This can be thought of as telling us how much of the vector x lies in the direction of y; it is closely related to the projection of x onto y.
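
A short sketch of the A-inner product, the A-norm, and the angle formula, using an arbitrary symmetric positive definite matrix:

    import numpy as np

    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])               # symmetric positive definite (example values)
    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])

    inner_A = x @ A @ y                      # <x, y>_A = x^T A y
    norm_A = np.sqrt(x @ A @ x)              # the A-norm of x

    # Angle between x and y under the ordinary dot product
    cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    print(inner_A, norm_A, np.degrees(theta))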

8 Orthogonality

Two vectors x and y are said to be orthogonal (or A-orthogonal) if x^T y = 0 (or ⟨x, y⟩_A = 0). A set of vectors {u_1, ..., u_m} is called an orthonormal set if ‖u_i‖ = 1 for all i and

    u_i^T u_j = 1 if i = j,   u_i^T u_j = 0 if i ≠ j.

Every orthonormal set is linearly independent, and is therefore a basis for the space it spans. An example of orthonormal vectors is the set of columns of the identity matrix. A second example, in R^2, is the pair

    (1/√2) [ 1 ],   (1/√2) [ -1 ]
           [ 1 ]           [  1 ]

A matrix Q is said to be orthogonal if its columns form an orthonormal set. When Q ∈ R^{n×n} the following property holds:

    Q^T Q = I = Q Q^T,

and thus Q^T is the inverse of Q. Orthogonal matrices are often referred to as reflections or rotations. This is because they do not change the Euclidean length of a vector; specifically,

    ‖Qx‖_2 = ‖x‖_2.

8.1 The Gram-Schmidt Orthogonalization Process

Given a set of linearly independent vectors {y_1, y_2, ..., y_n}, the vectors can be used to construct an orthonormal basis for span({y_1, ..., y_n}). The process is as follows. The first vector, q_1, is obtained by normalizing the first vector in the linearly independent set:

    q_1 = y_1 / ‖y_1‖_2.

To form the kth orthonormal vector from the kth basis vector, project y_k onto each of the existing orthonormal vectors {q_1, ..., q_{k-1}}, giving

    α_ki = ⟨y_k, q_i⟩,   1 ≤ i ≤ k - 1,

and remove the component along each q_i from y_k by subtracting α_ki q_i from y_k:

    q̂_k = y_k - Σ_{i=1}^{k-1} α_ki q_i = y_k - Σ_{i=1}^{k-1} ⟨y_k, q_i⟩ q_i.

The kth orthonormal vector is then computed by normalizing q̂_k:

    q_k = q̂_k / ‖q̂_k‖_2.
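
A direct transcription of this process into NumPy might look like the following; it is the classical Gram-Schmidt procedure, written for clarity rather than numerical robustness (which is exactly the issue that motivates the QR factorization in the next section):

    import numpy as np

    def gram_schmidt(Y):
        """Columns of Y are assumed linearly independent; returns Q with orthonormal
        columns spanning the same space (classical Gram-Schmidt, for illustration only)."""
        n = Y.shape[1]
        Q = np.zeros_like(Y, dtype=float)
        for k in range(n):
            q_hat = Y[:, k].copy()
            for i in range(k):                       # subtract projections onto earlier q_i
                q_hat -= (Y[:, k] @ Q[:, i]) * Q[:, i]
            Q[:, k] = q_hat / np.linalg.norm(q_hat)  # normalize
        return Q

    Y = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    Q = gram_schmidt(Y)
    print(np.allclose(Q.T @ Q, np.eye(2)))           # the columns are orthonormal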

8.2 The QR Matrix Factorization

The Gram-Schmidt orthogonalization process is computationally unstable (after computing a few vectors, the set {q_1, ..., q_n} stops being orthogonal); an alternative is the QR factorization. Let A ∈ R^{m×n}, where m ≥ n. Then there exists a factorization A = QR such that Q is an orthogonal matrix and R is upper triangular. (A matrix R = {ρ_ij} is upper triangular if ρ_ij = 0 when i > j; in other words, all the elements below the diagonal are zero.) The matrix Q is m × m, and the first r columns of Q form a basis for R(A) (where r is the rank of A). Thus, to obtain an orthonormal basis for span({y_1, ..., y_n}), we can define the matrix

    A = [ y_1  y_2  ...  y_n ]

and compute the QR factorization of A; the leading columns of Q then form an orthonormal basis for span({y_1, ..., y_n}). The QR factorization is implemented in MATLAB as qr.

To further motivate why a QR factorization is useful, consider the problem of solving the linear system Ax = b, where b ∈ R(A) and A is a full-rank n × n matrix. Substituting QR for A gives

    Ax = b  ⟺  QRx = b  ⟺  Rx = Q^T b = b̂,

where the last equality is due to the fact that Q^{-1} = Q^T. Now we need only solve the system Rx = b̂, which is easy. Consider the last row of the equation, which reads

    [ 0  0  ...  0  r_nn ] x = r_nn x_n = b̂_n   ⟹   x_n = b̂_n / r_nn.

Now consider the second-to-last row, n - 1:

    [ 0  ...  0  r_{n-1,n-1}  r_{n-1,n} ] x = r_{n-1,n-1} x_{n-1} + r_{n-1,n} x_n = b̂_{n-1}
        ⟹   x_{n-1} = (b̂_{n-1} - r_{n-1,n} x_n) / r_{n-1,n-1}.

However, we already know the value of x_n from the previous row. This process can be continued up the matrix, solving row i only after row i + 1; done in this way, there is only one unknown being solved for in each row, and we can easily compute the solution vector x.

8.3 The Orthogonal Complement

Let S be a subset of R^n. Define S^⊥, the orthogonal complement of S, as

    S^⊥ = {x ∈ R^n : y^T x = 0 for all y ∈ S}.

S^⊥ is a subspace of R^n. (Proof: suppose x, y ∈ S^⊥ and take z ∈ S; then z^T(x + αy) = z^T x + α z^T y = 0, since x and y are, by the definition of S^⊥, orthogonal to any z ∈ S.) When S is a subspace, every vector z ∈ R^n can be uniquely decomposed as z = x + y with x ∈ S and y ∈ S^⊥. Thus the inner product of such a vector with itself decomposes as follows:

    z^T z = (x + y)^T (x + y) = x^T x + 2 x^T y + y^T y = x^T x + y^T y,

as x and y are orthogonal.
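
The back-substitution idea can be written out in a few lines. The sketch below uses NumPy's qr (a stand-in for the MATLAB qr mentioned above) and then solves Rx = Q^T b from the bottom row up; the matrix and right-hand side are made-up:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0],
                  [0.0, 1.0]])
    # Reduced QR: Q has orthonormal columns spanning R(A), R is upper triangular
    Q, R = np.linalg.qr(A)

    b = A @ np.array([1.0, 2.0])             # choose b in R(A) so the system is consistent
    b_hat = Q.T @ b
    n = R.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):           # back substitution, bottom row first
        x[i] = (b_hat[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]

    print(x)                                  # recovers [1, 2]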

9 Least Squares Problems

Recall that for an arbitrary matrix A ∈ R^{m×n}, a solution to the problem Ax = b does not necessarily exist; b may not be in R(A). Now suppose we wish to find the x that makes Ax as close to b as possible. In the example given at the end of the section on linear systems, we would be looking for an allocation of foods that comes as close as possible to matching the nutrient requirements. To formalize this, define the residual vector r as

    r = Ax - b,

and seek to minimize ‖r‖_2. Note that the minimizer of ‖r‖_2 is the same as the minimizer of ‖r‖_2², which is often a more convenient form to work with. We now decompose b into the part that lies in R(A), say b_1, and the part that lies in R(A)^⊥, say b_2. Plugging this in, and using the orthogonality of the two parts, we see that

    ‖r‖_2² = ‖(b_1 - Ax) + b_2‖_2² = ‖b_1 - Ax‖_2² + ‖b_2‖_2².

Thus ‖r‖_2² is minimized when Ax = b_1, and the minimal value is ‖b_2‖_2². So the residual is minimized exactly when

    r = b - Ax ∈ N(A^T).

To see how we might actually solve such a problem, r ∈ N(A^T) means that A^T r = 0, or

    A^T (Ax - b) = 0   ⟺   A^T A x = A^T b.

These are called the normal equations. When A has full column rank, A^T A has full rank and we can solve the above linear system. When A does not have full column rank, there are an infinite number of solutions to the above problem. As with many of the techniques discussed in these notes, solving the normal equations directly is numerically unstable; using the backslash command in MATLAB will compute the least-squares solution to min ‖Ax - b‖ in a stable way.
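
A least-squares sketch comparing the normal equations with a stable solver (np.linalg.lstsq plays the role of MATLAB's backslash here; the data are invented):

    import numpy as np

    # An overdetermined system (more equations than unknowns)
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    b = np.array([1.1, 1.9, 3.2, 3.9])

    # Normal equations (fine here, but can be numerically unstable in general)
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # The stable route
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(np.allclose(x_normal, x_lstsq))
    # At the minimizer the residual is orthogonal to R(A): A^T (b - Ax) = 0
    print(np.allclose(A.T @ (b - A @ x_lstsq), 0))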

Part V: Eigenvectors, Eigenvalues and the SVD

10 Eigenvectors and Eigenvalues

Let A be an n × n matrix. A vector x and a complex number λ are called an eigenvector / eigenvalue pair if x ≠ 0 and

    Ax = λx.

For example, the matrix

    [ 1  2 ]
    [ 2  1 ]

has 3 as an eigenvalue with associated eigenvector [1 1]^T:

    [ 1  2 ] [ 1 ]   [ 3 ]     [ 1 ]
    [ 2  1 ] [ 1 ] = [ 3 ] = 3 [ 1 ]

A given eigenvalue may have more than one eigenvector associated with it.

From the definition, we see that if λ is an eigenvalue of A with associated eigenvector x, then

    Ax = λx  ⟺  (A - λI)x = 0  ⟺  x ∈ N(A - λI).

The space N(A - λI) is called the eigenspace associated with the eigenvalue λ. It is a subspace, and is denoted E_λ. The geometric multiplicity of an eigenvalue λ is the dimension of the associated null space, dim(N(A - λI)). This subspace is closed under multiplication by A, so that if x ∈ E_λ, then Ax ∈ E_λ. This can be generalized to say that if x ∈ span(∪_{λ ∈ Λ} E_λ), then so is the product Ax (where Λ is a collection of eigenvalues).

10.1 The Determinant

Let A be an n × n matrix. Define the (n-1) × (n-1) matrix Ã_ij as the matrix A but with row i and column j removed. The determinant of a matrix A, det(A), is computed by

    det [ α  β ] = αδ - βγ
        [ γ  δ ]

for n = 2, and, when n > 2, by

    det(A) = Σ_{j=1}^n (-1)^{i+j} α_ij det(Ã_ij)

for any i. The determinant of an upper triangular matrix is the product of its diagonal elements (and, as a special case, so is the determinant of a diagonal matrix). The determinant has the following properties:

- If two columns (or rows) are the same, then det(A) = 0.
- det(AB) = det(A) det(B)
- det(A^T) = det(A)
- det(Q) = ±1 when Q is an orthogonal matrix
- det(A^{-1}) = det(A)^{-1}, when A^{-1} exists
- det(A) = 0 if and only if A is singular

Thus if λ is an eigenvalue of A, det(A - λI) = 0.

As an example, take the matrix

    A = [ 1   0  -2 ]
        [ 2  -1   3 ]
        [ 1   2   4 ]

We will take i = 1. Then

    Ã_11 = [ -1  3 ],   Ã_12 = [ 2  3 ],   Ã_13 = [ 2  -1 ]
           [  2  4 ]           [ 1  4 ]           [ 1   2 ]

and

    det(Ã_11) = (-1)(4) - (3)(2) = -10,
    det(Ã_12) = (2)(4) - (3)(1) = 5,
    det(Ã_13) = (2)(2) - (-1)(1) = 5.

So

    det(A) = Σ_{j=1}^3 (-1)^{1+j} α_1j det(Ã_1j)
           = (-1)^{1+1} α_11 det(Ã_11) + (-1)^{1+2} α_12 det(Ã_12) + (-1)^{1+3} α_13 det(Ã_13)
           = 1·1·(-10) + (-1)·0·5 + 1·(-2)·5 = -20.
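
The cofactor expansion above can be checked numerically; the sketch below recomputes det(A) by expanding along the first row and compares it with a library routine:

    import numpy as np

    A = np.array([[1.0,  0.0, -2.0],
                  [2.0, -1.0,  3.0],
                  [1.0,  2.0,  4.0]])

    def minor(M, i, j):
        # the matrix M with row i and column j removed
        return np.delete(np.delete(M, i, axis=0), j, axis=1)

    # Cofactor expansion along the first row (i = 0 in zero-based indexing)
    det_cofactor = sum((-1) ** (0 + j) * A[0, j] * np.linalg.det(minor(A, 0, j))
                       for j in range(3))

    print(round(det_cofactor, 10), round(np.linalg.det(A), 10))   # both -20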

10.2 The Characteristic Polynomial and Computing Eigenvalues

The function φ(λ) = det(A - λI) is called the characteristic function of the matrix A; λ is an eigenvalue of A if φ(λ) = 0, or, in other words, if λ is a root of φ(λ). The characteristic function is a polynomial of degree n in λ, and thus there are at most n distinct eigenvalues. Observe that the roots of this polynomial may be complex numbers; when there are complex roots, they appear in conjugate pairs. The algebraic multiplicity of λ is the number of times λ is a root of φ(λ). The geometric multiplicity is always less than or equal to the algebraic multiplicity.

To see how this is used to compute eigenvalues, let us return to the matrix at the beginning of this section:

    A = [ 1  2 ]   ⟹   A - λI = [ 1-λ   2  ]
        [ 2  1 ]                 [  2   1-λ ]

so that

    det(A - λI) = (1 - λ)² - 4 = λ² - 2λ - 3 = φ(λ).

Setting det(A - λI) = 0, we have a quadratic equation to solve in order to determine the eigenvalues. In this case we see that the eigenvalues are λ_1 = 3 and λ_2 = -1. We can now solve for the eigenvector associated with λ = -1:

    [ 1  2 ] [ x_1 ] = -1 [ x_1 ]   ⟹   x_1 + 2x_2 = -x_1   ⟹   x_1 = -x_2,
    [ 2  1 ] [ x_2 ]      [ x_2 ]

where the implication comes from the first row of the equation. Plugging this into the second row, we have 2x_1 + x_2 = -x_2, i.e. -2x_2 + x_2 = -x_2, which holds for all x_2, so we take x_2 = 1 and x_1 = -1. Thus the eigenvector associated with λ = -1 is [-1, 1]^T. Note that the two eigenvectors are orthogonal to each other.

10.3 Diagonalizability

Two square matrices A and B are similar if B = W A W^{-1} for some non-singular matrix W. Similar matrices have the same eigenvalues:

    Ax = λx  ⟺  W^{-1} B W x = λx  ⟺  B(Wx) = λ(Wx),

so that B has eigenvalue λ with eigenvector Wx.

A matrix is diagonalizable whenever it is similar to a diagonal matrix, so that A = W D W^{-1}. When this is so, the columns of W are the eigenvectors of A and the diagonal elements of D are the eigenvalues. The diagonal matrix is often denoted by Λ, so that A = W Λ W^{-1} is the eigenvalue decomposition of A.

When A is a symmetric matrix (A = A^T), all of the eigenvalues are real and the eigenvectors can be chosen to be orthonormal. This was seen with the example matrix given before: the eigenvectors we computed were orthogonal to each other. In this case, the eigenvalue decomposition of A is

    A = Q Λ Q^T,

where Q is an orthogonal matrix and Λ is a diagonal matrix whose elements are the eigenvalues of A. If A^{-1} exists, its eigenvalues are the reciprocals of the eigenvalues of A, and the eigenvectors are the same:

    A = Q Λ Q^T  ⟹  A^{-1} = Q Λ^{-1} Q^T,   (Λ^{-1})_ii = 1 / Λ_ii.

A^{-1} exists if and only if all of the eigenvalues of A are non-zero.
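
A sketch of the eigenvalue decomposition for the symmetric example matrix used above (eigh is appropriate here because A is symmetric):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

    # For symmetric matrices, eigh returns real eigenvalues and orthonormal eigenvectors
    eigvals, Q = np.linalg.eigh(A)
    print(eigvals)                                   # [-1, 3], as derived above

    Lam = np.diag(eigvals)
    print(np.allclose(A, Q @ Lam @ Q.T))             # A = Q Lambda Q^T
    print(np.allclose(np.linalg.inv(A),
                      Q @ np.diag(1.0 / eigvals) @ Q.T))   # inverse has reciprocal eigenvalues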

10.4 Definiteness

Let A be a symmetric matrix. Then:

- If all of the eigenvalues of A are positive, λ_i > 0, the matrix is called positive definite. This is equivalent to x^T A x > 0 for all x ≠ 0.
- If all of the eigenvalues of A are non-negative, λ_i ≥ 0, the matrix is called positive semi-definite.
- If all of the eigenvalues of A are negative, λ_i < 0, the matrix is called negative definite. This is equivalent to x^T A x < 0 for all x ≠ 0.
- If the eigenvalues of A are of mixed sign, the matrix is called indefinite.

10.5 The Gershgorin Circle Theorem

Let A be a square matrix. Define

    r_i = Σ_{j=1, j≠i}^n |a_ij|

and define the sets

    S_i = {x ∈ C : |a_ii - x| ≤ r_i}.

The eigenvalues of A lie in the union of the sets S_i:

    λ(A) ⊆ ∪_{i=1}^n S_i.

As an example, consider the matrix

    A = [ 1     0.5   0.25 ]
        [ 0.5   2     1    ]
        [ 0.25  1     3    ]

Note that A is symmetric, so the eigenvalues are real; thus our sets will be intervals on the real line. Then

    r_1 = |a_12| + |a_13| = 0.5 + 0.25 = 0.75
    r_2 = |a_21| + |a_23| = 0.5 + 1 = 1.5
    r_3 = |a_31| + |a_32| = 0.25 + 1 = 1.25

and the sets are defined as

    S_1 = [a_11 - r_1, a_11 + r_1] = [1 - 0.75, 1 + 0.75] = [0.25, 1.75]
    S_2 = [a_22 - r_2, a_22 + r_2] = [2 - 1.5, 2 + 1.5] = [0.5, 3.5]
    S_3 = [a_33 - r_3, a_33 + r_3] = [3 - 1.25, 3 + 1.25] = [1.75, 4.25]

The union of these sets is ∪_{i=1}^3 S_i = [0.25, 4.25]. Thus the eigenvalues of A lie in the interval [0.25, 4.25]. As this region is strictly positive, the eigenvalues of A must be strictly positive, and the matrix is therefore positive definite.
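
The Gershgorin intervals for the example matrix can be computed in a few lines and compared against the actual eigenvalues:

    import numpy as np

    A = np.array([[1.0,  0.5,  0.25],
                  [0.5,  2.0,  1.0],
                  [0.25, 1.0,  3.0]])

    # Gershgorin intervals [a_ii - r_i, a_ii + r_i], with r_i the off-diagonal row sum
    r = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    intervals = list(zip(np.diag(A) - r, np.diag(A) + r))
    print(intervals)                          # [(0.25, 1.75), (0.5, 3.5), (1.75, 4.25)]

    eigvals = np.linalg.eigvalsh(A)           # A is symmetric, so the eigenvalues are real
    print(eigvals)                             # all lie inside [0.25, 4.25] and are positive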

11 The Singular Value Decomposition

Let A ∈ R^{m×n} be an arbitrary matrix. The singular value decomposition, or SVD, is a generalization of the eigenvalue decomposition to non-square matrices. Every matrix A can be decomposed as follows:

    A = U Σ V^T,

where U is an orthogonal m × m matrix, V is an orthogonal n × n matrix, and Σ is an m × n diagonal matrix. The elements of Σ are all zero except for the σ_ii, which are non-negative and are typically denoted σ_i. The σ_i are called the singular values of A and are ordered so that

    σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_{min(m,n)} = 0.

The rank of A is the number of non-zero singular values, and

    R(A) = span({u_1, ..., u_r}),    N(A) = span({v_{r+1}, ..., v_n}).

For a symmetric matrix, the singular values are the absolute values of the eigenvalues; in general, they are related to the eigenvalues of A^T A by

    σ_i²(A) = λ_i(A^T A).

To see this, observe that A^T A is symmetric and thus has an eigenvalue decomposition Q Λ Q^T. But A^T A is also equal to

    (U Σ V^T)^T (U Σ V^T) = V Σ^T U^T U Σ V^T = V (Σ^T Σ) V^T,

and Σ^T Σ is a diagonal matrix with entries given by σ_i². Thus, since the eigenvalues of A^T A are uniquely determined, σ_i² = λ_i(A^T A). The SVD decomposes the matrix into a rotation (or reflection), followed by a scaling, and then another rotation.
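
Finally, a short SVD sketch on an arbitrary rank-1 example, checking the rank count and the relation σ_i² = λ_i(A^T A):

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [2.0, 4.0, 0.0]])            # a rank-1 matrix

    U, s, Vt = np.linalg.svd(A)

    rank = np.sum(s > 1e-12)                   # number of non-zero singular values
    print(rank)                                 # 1

    # Singular values squared equal the (largest) eigenvalues of A^T A
    eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    print(np.allclose(s**2, eigvals[:len(s)]))

    # Reconstruct A from the decomposition
    Sigma = np.zeros(A.shape)
    np.fill_diagonal(Sigma, s)
    print(np.allclose(A, U @ Sigma @ Vt))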