Linear Algebra
Carleton DeTar
detar@physics.utah.edu
February 27, 2017

This document provides some background for various course topics in linear algebra: solving linear systems, determinants, and finding eigenvalues and eigenvectors.

1 Gaussian elimination

Gaussian elimination is a systematic strategy for solving a set of linear equations. It can also be used to construct the inverse of a matrix and to factor a matrix into the product of lower and upper triangular matrices. We start by solving the linear system

    x + 2y     = 2
   2x +  y + z = 3
   3x +  y + z = 4

Basically, the objective of Gaussian elimination is to do transformations on the equations that do not change the solution, but systematically zero out (eliminate) the off-diagonal coefficients, leaving a set of equations from which we can read off the answers. We express the problem in terms of a set of equations, and, side by side, we express it in terms of an equivalent matrix product. We do this to show how the manipulations of the matrix track the manipulations of the equations, where it is easier to see that we are not changing the solution. The method has two parts: first triangulation, then back substitution.

1.1 Triangulation
We start from the equations

    x + 2y + 0z = 2
   2x +  y +  z = 3
   3x +  y +  z = 4

and, side by side, the equivalent matrix-vector equation

   [ 1  2  0 ] [ x ]   [ 2 ]
   [ 2  1  1 ] [ y ] = [ 3 ]
   [ 3  1  1 ] [ z ]   [ 4 ]

First step: examine the coefficients of x. Swap equations (1) and (3) so the largest coefficient is in the first equation (first row). This coefficient is called the pivot element. In the matrix form, swap the first and third rows of the matrix and the first and third elements of the vector on the right side. Note that in the matrix equation we don't interchange the unknowns x and z.

   3x +  y +  z = 4
   2x +  y +  z = 3
    x + 2y + 0z = 2

   [ 3  1  1 ] [ x ]   [ 4 ]
   [ 2  1  1 ] [ y ] = [ 3 ]
   [ 1  2  0 ] [ z ]   [ 2 ]

Next step: divide the first equation by the coefficient 3 to make the pivot element equal to 1.

    x + y/3 + z/3 = 4/3
   2x +  y  +  z  = 3
    x + 2y  + 0z  = 2

   [ 1  1/3  1/3 ] [ x ]   [ 4/3 ]
   [ 2   1    1  ] [ y ] = [  3  ]
   [ 1   2    0  ] [ z ]   [  2  ]

Next step: multiply the first equation by 2 and subtract it from the second equation, putting the result in the second equation. This eliminates the coefficient of x in the second equation.

    x +  y/3 + z/3 = 4/3
    0 +  y/3 + z/3 = 1/3
    x +  2y  + 0z  = 2

Next step: eliminate the coefficient of x in the third equation by subtracting the first equation from the third, putting the result into the third equation.

    x +  y/3 + z/3 = 4/3
    0 +  y/3 + z/3 = 1/3
    0 + 5y/3 - z/3 = 2/3

   [ 1  1/3   1/3 ] [ x ]   [ 4/3 ]
   [ 0  1/3   1/3 ] [ y ] = [ 1/3 ]
   [ 0  5/3  -1/3 ] [ z ]   [ 2/3 ]
Next step: now work on the second column (coefficients of y). We want the largest coefficient in the second equation (diagonal element in the matrix), so swap the second and third equations.

    x +  y/3 + z/3 = 4/3
    0 + 5y/3 - z/3 = 2/3
    0 +  y/3 + z/3 = 1/3

   [ 1  1/3   1/3 ] [ x ]   [ 4/3 ]
   [ 0  5/3  -1/3 ] [ y ] = [ 2/3 ]
   [ 0  1/3   1/3 ] [ z ]   [ 1/3 ]

Now divide the second equation by 5/3, the pivot element in the second column.

    x +  y/3 + z/3 = 4/3
    0 +   y  - z/5 = 2/5
    0 +  y/3 + z/3 = 1/3

Now eliminate the coefficient of y in the third equation by multiplying the second equation by 1/3, subtracting the result from the third equation, and putting the result in the third equation.

    x + y/3 +  z/3 = 4/3
    0 +  y  -  z/5 = 2/5
    0 +  0  + 2z/5 = 1/5

To complete the triangulation step we divide the third equation by the coefficient of z, namely 2/5.

    x + y/3 + z/3 = 4/3
    0 +  y  - z/5 = 2/5
    0 +  0  +  z  = 1/2

   [ 1  1/3   1/3 ] [ x ]   [ 4/3 ]
   [ 0   1   -1/5 ] [ y ] = [ 2/5 ]
   [ 0   0     1  ] [ z ]   [ 1/2 ]

Notice that the matrix is now in upper triangular form: all elements below the diagonal are zero.

1.2 Back substitution
Next, we do back substitution. We start by noticing that the last equation gives us the solution for z. Then we work our way up the third column, eliminating the coefficients of z. This is just Gaussian elimination backwards! But it amounts to the same thing as plugging the solution for z into the other two equations and moving the resulting constant to the rhs of the equation. So multiply the third equation by -1/5 and subtract it from equation two, leaving the result in equation two.

    x + y/3 + z/3 = 4/3
    0 +  y  +  0  = 1/2
    0 +  0  +  z  = 1/2

   [ 1  1/3  1/3 ] [ x ]   [ 4/3 ]
   [ 0   1    0  ] [ y ] = [ 1/2 ]
   [ 0   0    1  ] [ z ]   [ 1/2 ]

Continuing on the third column, eliminate the coefficient of z in equation 1 by subtracting 1/3 times the third equation.

    x + y/3 + 0 = 7/6
    0 +  y  + 0 = 1/2
    0 +  0  + z = 1/2

   [ 1  1/3  0 ] [ x ]   [ 7/6 ]
   [ 0   1   0 ] [ y ] = [ 1/2 ]
   [ 0   0   1 ] [ z ]   [ 1/2 ]

Next, work on the second column. We have only the coefficient of y in the first equation to eliminate. We then get the answer. Notice that we now have a unit matrix, so the solution can be read off.

    x + 0 + 0 = 1
    0 + y + 0 = 1/2
    0 + 0 + z = 1/2

   [ 1  0  0 ] [ x ]   [  1  ]
   [ 0  1  0 ] [ y ] = [ 1/2 ]
   [ 0  0  1 ] [ z ]   [ 1/2 ]

The last step, of course, is to check the solution by plugging it into the original system of equations.

    1   + 2(1/2)      = 2
   2(1) + 1/2 + 1/2   = 3
   3(1) + 1/2 + 1/2   = 4
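The whole procedure, triangulation with partial pivoting followed by back substitution, can be sketched in a short program. This is an illustrative sketch, not part of the original notes: the function name `gaussian_solve` is our own, and it assumes the matrix is nonsingular. It reproduces the worked example above.

```python
def gaussian_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of n coefficients; b is the right-hand side.
    Both are modified in place, mirroring the worked example above.
    Assumes the matrix is nonsingular (no zero pivots after pivoting).
    """
    n = len(a)
    # Triangulation: for each column, pivot, normalize, eliminate below.
    for col in range(n):
        # Partial pivoting: bring the row with the largest coefficient up.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Divide the pivot row so the pivot element equals 1.
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        b[col] /= p
        # Eliminate the coefficients below the pivot.
        for r in range(col + 1, n):
            f = a[r][col]
            a[r] = [v - f * w for v, w in zip(a[r], a[col])]
            b[r] -= f * b[col]
    # Back substitution: work upward, eliminating above the diagonal.
    for col in range(n - 1, -1, -1):
        for r in range(col):
            b[r] -= a[r][col] * b[col]
            a[r][col] = 0.0
    return b

# The system from the text: solution (1, 1/2, 1/2).
x = gaussian_solve([[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 1.0, 1.0]],
                   [2.0, 3.0, 4.0])
```

Running this performs exactly the row swaps and eliminations shown above and returns the solution vector.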
2 Determinants

Many properties of a matrix are based on its determinant. To review how determinants are calculated, let's start with a simple 3 x 3 matrix

       [ a_11  a_12  a_13 ]
   A = [ a_21  a_22  a_23 ]
       [ a_31  a_32  a_33 ]

We learn in high school that the rule for calculating its determinant is to start by multiplying along the main diagonal: a_11 a_22 a_33, then along the parallel super diagonal (wrapping around): a_12 a_23 a_31, then along the parallel sub diagonal (again wrapping around): a_21 a_32 a_13. We add these three terms. Then we switch to the (let's call it) "antidiagonal": a_13 a_22 a_31 and its parallel super and sub antidiagonals: a_12 a_21 a_33 and a_11 a_23 a_32. These last three products are subtracted from the sum of the first three. So the full result is

   det A = a_11 a_22 a_33 + a_12 a_23 a_31 + a_13 a_21 a_32
         - a_13 a_22 a_31 - a_12 a_21 a_33 - a_11 a_23 a_32.

This method works only for 3 x 3 matrices. But we can generalize it by recognizing that it has the compact form

   det A = sum_P (-)^P a_{1,P1} a_{2,P2} a_{3,P3}

where the sum is over all permutations P of the columns 123. We note that there are six such permutations: 123, 231, 312, 321, 213, 132, which matches the number of terms in our standard form. We use the shorthand notation P1, P2, P3 to specify one of these six permutations. Any permutation can be achieved by swapping enough pairs of members. The first three permutations in this list are called even, because they are produced by an even number (including 0) of pairwise exchanges, and the last three are called odd, because they require an odd number. The shorthand notation (-)^P means plus for an even and minus for an odd permutation. Expressing the determinant in terms of permutations allows us to generalize to any size matrix:

   det A = sum_P (-)^P a_{1,P1} a_{2,P2} a_{3,P3} ... a_{n,Pn}

The number of terms in the sum is n!. There are other ways you may have learned to calculate the determinant of a matrix. We won't go into details here.
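The permutation sum can be transcribed directly into code. The sketch below is our own illustration: `perm_sign` counts inversions to decide whether a permutation is even or odd, and `det_permutation_sum` works for any n, at the price of summing n! terms.

```python
import math
from itertools import permutations

def perm_sign(p):
    """+1 for an even permutation, -1 for an odd one (count inversions)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else +1

def det_permutation_sum(a):
    """det A = sum over permutations P of (-)^P a[0][P0] a[1][P1] ..."""
    n = len(a)
    return sum(perm_sign(p) * math.prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

For a 3 x 3 matrix, the six permutations generated here reproduce the six-term high-school formula exactly.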
One way is to use the cofactor method: Pick a row of the matrix. Let's do it with the first row a_{1,k}. Work your way across the row, visiting each column once. At each step, construct an (n-1) x (n-1) matrix by eliminating the row you are working with and the column that you are visiting at that step. Calculate the determinant of the smaller matrix. This is called the cofactor. Call it A_{1,k}. Proceed to the end of the row. At that point you have a cofactor for each of the elements in the row. Then the determinant is given by the rule

   det A = sum_k (-)^{k+1} a_{1,k} A_{1,k}
You can use any row you like. If you use row j, then you get

   det A = sum_k (-)^{k+j} a_{j,k} A_{j,k}

You can also do it by columns. The cofactor method is, in fact, algebraically the same as the method of summing over permutations. It is just another way of rearranging the terms in the sum.

We list some important properties of determinants without proof.

The determinant of a product of matrices is the product of the determinants:

   det(AB) = det(A) det(B)

If the inverse of a matrix exists, its determinant is the inverse of the determinant of the original matrix:

   det(A^{-1}) = 1/det(A)

Taking the transpose does not change the determinant:

   det(Ã) = det(A)

The determinant of a triangular matrix is the product of its diagonal elements. The rule is the same for upper and lower triangular matrices:

   det(U) = u_11 u_22 ... u_nn

This one is easy to show using the permutation sum. If a permutation puts a zero matrix element in the product of terms, then that permutation doesn't contribute anything to the determinant. The only permutation P1, P2, ..., Pn that doesn't involve at least one zero matrix element is the identity permutation 12...n. And that identity permutation gives you the product of the diagonal elements.

What is an efficient way to calculate a determinant? "Efficient" means calculating it with the least number of floating point operations, since that is what usually costs computing effort. The sum over permutations method (or the equivalent cofactor method) is terribly inefficient, because it requires computing the product of n factors in each of n! terms. Including the summation, the number of floating point operations is of order n * n!, which grows extremely rapidly with increasing n. A more efficient way to calculate the determinant is to factor the matrix into the product of a lower triangular and an upper triangular matrix, A = LU. Then

   det A = det L det U

which is just the product of the diagonal elements of each factor.
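These properties are easy to spot-check numerically. Here is a minimal sketch for 2 x 2 matrices, where det [[a, b], [c, d]] = ad - bc; the helper names and the sample matrices are our own.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]] = a*d - b*c."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    """Transpose of a 2x2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

a = [[2.0, 1.0], [5.0, 3.0]]        # det = 1, so the inverse exists
b = [[4.0, 7.0], [2.0, 6.0]]        # det = 10
ainv = [[3.0, -1.0], [-5.0, 2.0]]   # inverse of a, det = 1
```

With these definitions, det(ab) = det(a) det(b), det of the transpose equals det(a), and det(ainv) = 1/det(a), matching the three rules quoted above.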
The cost of doing this using the Crout reduction method is the same as the cost of doing Gaussian elimination. It grows as n^3 as the matrix size grows. This is still costly, but for large n it is vastly cheaper than the sum over permutations.
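The n^3 scaling is easy to see in code: triangulate with three nested loops and multiply the diagonal elements, flipping the sign of the result once per row swap. This sketch (our own) uses plain Gaussian elimination rather than the Crout reduction proper, but the operation count scales the same way.

```python
def det_by_elimination(a):
    """Determinant via triangulation: O(n^3) instead of n * n! operations.

    Reduces a copy of the matrix to upper triangular form with partial
    pivoting, then multiplies the diagonal.  Each row swap flips the sign
    of the determinant.
    """
    a = [row[:] for row in a]          # work on a copy
    n = len(a)
    sign = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0                 # singular matrix: determinant is zero
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    det = sign
    for i in range(n):
        det *= a[i][i]                 # product of the diagonal elements
    return det
```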
3 Eigenvalues and eigenvectors

A great many matrices (more generally linear operators) are characterized by their eigenvalues and eigenvectors. They play a crucial role in all branches of science and engineering. Most of the time, finding them requires resorting to numerical methods. So we discuss some simpler methods.

3.1 Characteristic Polynomial

Generally speaking, eigenvalues of a square matrix A are roots of the so-called characteristic polynomial:

   det(A - λI) = P(λ) = 0

That is, start with the matrix A and modify it by subtracting the same variable λ from each diagonal element. Then calculate the determinant of the resulting matrix and you get a polynomial. Here is how it works for a 3 x 3 matrix:

       [ 1/2  3/2  0 ]
   A = [ 3/2  1/2  0 ]
       [  0    0   1 ]

                 [ 1/2-λ   3/2     0  ]
   det(A - λI) = [  3/2   1/2-λ    0  ] = -2 + λ + 2λ^2 - λ^3 = (-1-λ)(1-λ)(2-λ)
                 [   0      0     1-λ ]

The three zeros of this cubic polynomial are (-1, 1, 2), so this matrix has three distinct eigenvalues.

For an n x n matrix we get a polynomial of degree n. Why? It is easy to see if we remember from the previous section that the determinant is a sum over products of matrix elements. One of those products runs down the diagonal. Since each diagonal element has a λ in it, the diagonals alone give you a polynomial of degree n. The other products have fewer diagonal elements, so they can't increase the degree of the polynomial beyond n.

A polynomial of degree n has n roots, although some of them may appear more than once. Call the zeros of the characteristic polynomial, i.e., the eigenvalues, λ_i. If we factor the polynomial in terms of its roots, we get

   P(λ) = (λ_1 - λ)(λ_2 - λ) ... (λ_n - λ).

Notice that the determinant of the matrix itself is the value of the characteristic polynomial at λ = 0. Plugging λ = 0 into the factored expression above leads to the result that the determinant of the matrix is the product of its eigenvalues.

   det A = P(0) = λ_1 λ_2 ... λ_n
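We can check this example numerically: evaluate det(A - λI) at the claimed roots and at λ = 0. This is a small sketch of our own; `det3` hard-codes the 3 x 3 diagonal rule from the previous section.

```python
def det3(m):
    """3x3 determinant by the diagonal rule from the determinants section."""
    return (m[0][0] * m[1][1] * m[2][2] + m[0][1] * m[1][2] * m[2][0]
            + m[0][2] * m[1][0] * m[2][1] - m[0][2] * m[1][1] * m[2][0]
            - m[0][1] * m[1][0] * m[2][2] - m[0][0] * m[1][2] * m[2][1])

def char_poly(a, lam):
    """P(lambda) = det(A - lambda I) for a 3x3 matrix a."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return det3(shifted)

# The example matrix from the text, with eigenvalues (-1, 1, 2).
A = [[0.5, 1.5, 0.0], [1.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
```

Evaluating `char_poly(A, lam)` at lam = -1, 1, 2 gives zero, and `char_poly(A, 0.0)` gives det A = (-1)(1)(2) = -2, the product of the eigenvalues.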
3.2 Eigenvalue equation

When det(A - λI) = 0, we can find a nontrivial (column vector) solution v to the equation

   (A - λI)v = 0   or   Av = λv

This is the standard equation for eigenvalue λ and eigenvector v. There can be as many as n linearly independent solutions to this equation, as follows:

   A v_i = λ_i v_i.

Notice that the eigenvector is not unique. We can multiply both sides of the equation by a constant c to see that if v_i is a solution for eigenvalue λ_i, so is c v_i.

Often we deal with real symmetric matrices (the transpose of the matrix is equal to the matrix itself). In that case the eigenvectors form a complete set of orthogonal vectors. They can be used to define the directions of coordinate axes, so we can write any n-dimensional vector x as a linear combination

   x = α_1 v_1 + α_2 v_2 + ... + α_n v_n

where the coefficient α_i is the component of the vector in the direction v_i. More generally, if there are n linearly independent eigenvectors v_i, this is also possible. Then we have a simple method for finding the eigenvalue with the largest magnitude, namely the power method.

3.3 Power method

The power method originates from the general statement that we can use the eigenvectors of a matrix to represent any vector x:

   x = α_1 v_1 + α_2 v_2 + ... + α_n v_n

We multiply by A and get

   Ax = α_1 A v_1 + α_2 A v_2 + ... + α_n A v_n
      = α_1 λ_1 v_1 + α_2 λ_2 v_2 + ... + α_n λ_n v_n

So we get a new vector whose coefficients are each multiplied by the corresponding eigenvalue: α_i λ_i. The coefficients with the larger eigenvalues get bigger compared with the coefficients with smaller eigenvalues. So let's say we have sorted the eigenvalues so the one with smallest magnitude is λ_1, and the one with the largest magnitude is λ_n. If we multiply by A m times, the coefficients become α_i λ_i^m. If we keep going, the nth term (corresponding to the largest eigenvalue) will eventually swamp all the others and we get (for very large m)

   A^m x ≈ λ_n^m α_n v_n = y
So we get an eigenvector corresponding to the largest eigenvalue. Another way of saying this is that when we hit the vector x with the matrix A we get a new vector that tends to point more in the direction of the leading eigenvector v_n. The more factors of A we pile on, the more precisely it points in that direction.

So how do we get the eigenvalue if we have an eigenvector y pointing in the direction of v_n? If it points in that direction, we must have y = c v_n. Then remember that eigenvectors satisfy

   Ay = A c v_n = λ_n c v_n = λ_n y.

That is, the vector Ay has every component of y multiplied by the eigenvalue λ_n. We can use any of the components to read off the answer.

To turn this process into a practical algorithm, we normalize the vector after each multiplication by A. Normalization simply means multiplying by a constant to put the vector in our favorite standard form. A common normalization is to divide by the Cartesian norm so we get a vector of unit length. The normalization we will use here is dividing the whole vector by the component with the largest magnitude. If we take the absolute values of the components, the largest one is called the infinity norm. So we divide by the infinity norm if that component is positive and by minus the infinity norm if it is negative.

Here is an example. Suppose we have y = (3, 2, -1). The infinity norm is 3. We divide by 3 to normalize, getting x = (1, 2/3, -1/3). Let's call the component with the largest magnitude the leading component. In the example, the leading component in y is the first one, and it is positive. If it were -3, instead, we'd divide by -3. The goal is to get a vector proportional to the original vector, but with one component equal to +1 and with the rest of the components no larger in magnitude than 1. The reason we pick this way of normalizing the vector is that we can then easily read off the eigenvalue by looking at what happens to the leading component when we multiply by A.
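In code, this normalization takes one line for the leading component and one for the division. A sketch (the function name is our own):

```python
def normalize_leading(y):
    """Divide by the signed leading component, so that component becomes +1."""
    lead = max(y, key=abs)   # component with the largest magnitude, keeping its sign
    return [v / lead for v in y]

# The example from the text: y = (3, 2, -1) normalizes to (1, 2/3, -1/3).
x = normalize_leading([3.0, 2.0, -1.0])
```

Because `max` with `key=abs` returns the signed value, a negative leading component automatically divides the vector by minus the infinity norm, as described above.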
Also, the infinity norm is cheaper to compute, since it doesn't require any arithmetic, just comparisons.

Suppose, after normalizing y to get x = (1, 2/3, -1/3), we multiply by A one more time and get Ax = (2, 4/3, -2/3). We can read off the eigenvalue from the leading component: it is 2. Of course, we could check every component to see that each one got multiplied by 2.

So here is the power algorithm. Start with any arbitrary vector y. Repeat steps 1-3 until convergence.

Step 1: Normalize y to get x. The leading component is now 1.
Step 2: Compute y = Ax.
Step 3: The approximate eigenvalue is the new leading component.
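The three steps translate directly into code. A sketch of our own, tested on the matrix from the characteristic-polynomial example, whose eigenvalues are (-1, 1, 2):

```python
def matvec(a, x):
    """Matrix-vector product."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

def power_method(a, x, iters=100):
    """Power method with infinity-norm normalization.

    Returns (eigenvalue estimate, normalized eigenvector estimate).
    """
    lam = 0.0
    for _ in range(iters):
        # Step 1: normalize by the signed leading component.
        lead = max(x, key=abs)
        x = [v / lead for v in x]
        # Step 2: multiply by A.
        y = matvec(a, x)
        # Step 3: the new leading component approximates the eigenvalue,
        # since the leading component of x is now exactly 1.
        lam = max(y, key=abs)
        x = y
    lead = max(x, key=abs)
    return lam, [v / lead for v in x]

# Eigenvalues (-1, 1, 2); the power method should find 2,
# with eigenvector proportional to (1, 1, 0).
A = [[0.5, 1.5, 0.0], [1.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
lam, v = power_method(A, [1.0, 0.5, 0.25])
```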
Convergence means that when you normalize y, the new x is close enough to the old x to declare victory. You choose what is close enough to suit the requirements of your problem.

Notice that with the power algorithm, you can start with any arbitrary vector and the method will converge to the same result. Well, almost any starting vector. You might be unlucky and pick a starting vector that has a zero component α_n along the leading eigenvector v_n. But numerical calculations are usually subject to roundoff error, so even if you unwittingly started with α_n = 0, chances are very good that after a few hits with the matrix A, you develop a tiny nonzero α_n, and then it is just a matter of time before its coefficient grows to dominate the iteration.

3.4 Inverse power method

The inverse power method works with the inverse A^{-1}, assuming it exists. It is easy to check that the eigenvectors of the matrix A are also eigenvectors of its inverse, but the eigenvalues are the algebraic inverses:

   A^{-1} v_i = μ_i v_i   where   μ_i = 1/λ_i.

So now the eigenvalue μ_i with the largest magnitude corresponds to the eigenvalue λ_i with the smallest magnitude.

So we can get the largest and smallest eigenvalues. How do we get the ones in between? For a matrix whose eigenvalues are all real, we can do this by generalizing the inverse power method. We take the inverse of the shifted matrix (A - qI), where q is any number we like. (We intend to vary q.) The eigenvectors of this matrix are still the same as the eigenvectors of A:

   (A - qI)^{-1} v_i = μ_i v_i   where, now,   μ_i = 1/(λ_i - q).

Which is the largest μ_i? It depends on q. If q is close to one of the λ_i's, then μ_i is maximum for that i. So if we hold that q fixed and run the power method, we eventually get the eigenvector v_i. Then we change q and rerun the power method. It's like tuning a radio dial. As q gets close to a new eigenvalue, we get the next broadcast station, i.e., the next eigenvector. If we keep going, eventually we get them all.
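Here is a sketch of the shifted inverse power method (our own names, reusing a small Gaussian-elimination solver). Each iteration solves (A - qI) y = x instead of forming the inverse explicitly, normalizes by the signed leading component μ, and the eigenvalue nearest the shift is recovered as λ ≈ q + 1/μ.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [v - f * w for v, w in zip(a[r], a[col])]
            b[r] -= f * b[col]
    for col in range(n - 1, -1, -1):
        b[col] = (b[col] - sum(a[col][j] * b[j]
                               for j in range(col + 1, n))) / a[col][col]
    return b

def inverse_power_method(a, q, x, iters=50):
    """Estimate the eigenvalue of a closest to the shift q.

    Each step applies (A - qI)^(-1) by solving a linear system, then
    normalizes by the signed leading component mu, so lambda ~ q + 1/mu.
    """
    n = len(a)
    shifted = [[a[i][j] - (q if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    mu = 1.0
    for _ in range(iters):
        y = solve(shifted, x)
        mu = max(y, key=abs)
        x = [v / mu for v in y]
    return q + 1.0 / mu

# Same example matrix as before, eigenvalues (-1, 1, 2): a shift q = 0.8
# "tunes in" the eigenvalue 1.
A = [[0.5, 1.5, 0.0], [1.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
```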
Clearly, if any of the eigenvalues are complex, we would have a lot of searching to do, because we'd need to search the entire complex plane, and not just the real line interval between λ_1 and λ_n. There are better methods.

3.5 Other methods: QR algorithm

The power and inverse power methods are simple and very easy to implement, but if you want all the eigenvalues, those methods are very inefficient. There are other, much more sophisticated and efficient methods, though. Here we describe in broad terms the Householder/QR algorithm for real symmetric matrices. For details, please see standard texts in numerical methods.

A real symmetric matrix A can be put into diagonal form by a real orthogonal similarity transform. In other words, there exists a real orthogonal matrix Q such that the product
(similarity transform)

   Λ = Q̃ A Q   (1)

is a diagonal matrix Λ. (An orthogonal matrix Q is one whose transpose Q̃ is its inverse: Q̃Q = QQ̃ = 1.) This solves the problem, because the eigenvalues of the matrix A are the diagonal values in Λ, and the eigenvectors are the column vectors of Q. We say that the transform Q diagonalizes the matrix. Of course, finding the transform Q is a challenge. With the Householder/QR algorithm it is done through an iterative process that eventually converges to the answer.

The first step in the process, the Householder step, is to find an orthogonal similarity transform that puts the matrix in tridiagonal form. This can be done exactly in a finite number of steps. The Householder method finds a matrix P that is not only orthogonal, it is symmetric (P̃ = P):

   Â = P A P.   (2)

The matrix Â is tridiagonal (and real and symmetric).

In the next phase, the QR phase, we apply a succession of orthogonal similarity transforms Q^(i) to the tridiagonal matrix that make the off-diagonal values smaller. Eventually they become small enough that we can say it is diagonal for all intents and purposes. The first similarity transform is applied to the tridiagonal matrix Â:

   Â^(1) = Q̃^(1) Â Q^(1).   (3)

The transform is constructed so the resulting matrix Â^(1) is still tridiagonal, but the off-diagonal elements are smaller. Then we apply the second similarity transform to the result above:

   Â^(2) = Q̃^(2) Â^(1) Q^(2).   (4)

We keep going until eventually Â^(n), for large n, is close enough to a diagonal matrix that we can call it our Λ. Putting all the transforms together, we get

   Λ = lim_{n→∞} Q̃^(n) ... Q̃^(2) Q̃^(1) P A P Q^(1) Q^(2) ... Q^(n)   (5)

It is easy to show that the product of real orthogonal matrices is also real orthogonal. So the product

   Q = P Q^(1) Q^(2) ... Q^(n)   (6)

is the orthogonal matrix that diagonalizes A. That is what we wanted.

So where does the R in the QR algorithm come in?
At each step the tridiagonal matrix Â^(i) is factored into a product of an orthogonal matrix Q^(i) and an upper triangular matrix R^(i):

   Â^(i) = Q^(i) R^(i).   (7)

Hence the name QR. This factorization can be done exactly with a finite number of steps. Then one can show that the combination

   Â^(i+1) = Q̃^(i) Â^(i) Q^(i) = R^(i) Q^(i)   (8)
is tridiagonal. That gives us the next matrix in the sequence. The same factorization is done on it, and the process continues.

As we can see, finding the eigenvalues this way takes some work. It turns out that the rate of convergence depends on the spacing of the eigenvalues. If two eigenvalues are very close to each other (compared with their average spacing), more iterations are needed to get convergence. If they are well separated, fewer iterations are required.

The QR algorithm also works with (complex) Hermitian matrices (A† = A). This covers a vast number of cases in science and engineering. The eigenvalues are still real, but the similarity transform is unitary (Q†Q = QQ† = 1). Here the dagger means the complex conjugate transpose.
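To make the QR phase concrete, here is a toy version, entirely our own sketch and not the production algorithm: it factors A = QR by classical Gram-Schmidt on the columns and iterates A ← RQ, which equals Q̃AQ and so preserves the eigenvalues. It skips the Householder tridiagonalization and the shifts that real implementations use, and it assumes a real symmetric matrix with eigenvalues of distinct magnitude.

```python
def qr_gram_schmidt(a):
    """Factor a = Q R, with orthonormal columns in Q (classical Gram-Schmidt)."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of a
    q = []                                 # orthonormal columns, built in turn
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Project out the directions already in Q.
            r[k][j] = sum(q[k][i] * cols[j][i] for i in range(n))
            v = [vi - r[k][j] * qi for vi, qi in zip(v, q[k])]
        r[j][j] = sum(vi * vi for vi in v) ** 0.5
        q.append([vi / r[j][j] for vi in v])
    # Repack Q as a matrix whose columns are the q vectors.
    qmat = [[q[j][i] for j in range(n)] for i in range(n)]
    return qmat, r

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_eigenvalues(a, iters=200):
    """Unshifted QR iteration: A <- R Q drives A toward diagonal form."""
    for _ in range(iters):
        qmat, r = qr_gram_schmidt(a)
        a = matmul(r, qmat)
    return [a[i][i] for i in range(len(a))]

# A sample symmetric matrix (our own); its eigenvalues are 3 and 3 ± sqrt(3).
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
eigs = qr_eigenvalues(A)
```

Because each step is a similarity transform, the trace (sum of eigenvalues) and determinant (product of eigenvalues) are preserved while the off-diagonal elements shrink.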