Diagonalization

MATH 1502 Calculus II Notes
November 4, 2008

We want to understand all linear transformations $L : \mathbb{R}^n \to \mathbb{R}^m$. The basic case is the one in which $n = m$, that is to say, the case in which the domain and the target are the same vector space. We understand a few such transformations: rotations and dilations of $\mathbb{R}^2$, as well as diagonal transformations of $\mathbb{R}^2$ which dilate by different factors in the two standard coordinate directions. In this set of notes, we generalize this last case to greatly expand our understanding of linear transformations.

Let us begin with a linear transformation $L : \mathbb{R}^2 \to \mathbb{R}^2$ and work out the details of the diagonalization algorithm. We will then present the theory behind what we have done, including mini-lectures on determinants, the relation between column vectors and rank, inverses of matrices, and change of basis.

The particular transformation we will start with is defined by $L(x, y) = (-x + y, x - y)$. If necessary, the first thing to do is find the matrix associated with $L$. In this case, that is the matrix
\[ A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}. \]
From there, one computes the determinant of the matrix $A - \lambda I$ and sets the result equal to zero to get an equation for the eigenvalues $\lambda$:
\[ \begin{vmatrix} -1-\lambda & 1 \\ 1 & -1-\lambda \end{vmatrix} = (-1-\lambda)(-1-\lambda) - 1 = \lambda^2 + 2\lambda = 0. \]
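For the record, here is that determinant written out term by term, using the $2 \times 2$ rule $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$, so you can see where the quadratic comes from:
\[ \det(A - \lambda I) = (-1-\lambda)(-1-\lambda) - (1)(1) = \lambda^2 + 2\lambda + 1 - 1 = \lambda^2 + 2\lambda. \]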
In this case, we get a quadratic equation $\lambda(\lambda + 2) = 0$ with two roots $0$ and $-2$. These are the eigenvalues. Next, we return to the eigenvalue equation $L(v) = \lambda v$, using each of the possible eigenvalues, in order to find the eigenvectors. It may also be preferable to use the eigenvalue equation in the form $(A - \lambda I)v = 0$. In any case, we find for $\lambda = -2$ that
\[ -v_1 + v_2 = -2v_1 \quad\text{or}\quad v_2 = -v_1. \]
(You can easily check that the other equation is the same as this one.) This tells us that the eigenvectors corresponding to the eigenvalue $\lambda = -2$ are the nonzero vectors of the form $(v_1, -v_1) = v_1(1, -1)$. It is enough to pick any one of them, so we'll just take $(1, -1)$.

Turning to the eigenvalue $\lambda = 0$, we find $-v_1 + v_2 = 0$ or $v_2 = v_1$. In this case, we can take $(1, 1)$ as a representative eigenvector. (You can and should check at this point that each of the eigenvectors we have found is indeed an eigenvector corresponding to the specified eigenvalue.) Let us denote the eigenvectors as follows:
\[ v = (1, -1); \qquad w = (1, 1). \]
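Here is the check just suggested, written out in full for both eigenvectors:
\[ Av = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix} = -2v, \qquad Aw = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0\,w. \]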
We will now change basis in order to visualize the linear transformation $L$. This is a somewhat tricky procedure, just because you need to keep track of where things are going and how the matrix multiplication works; there's also the computation of an inverse transformation thrown in there as well. But if you don't see all the details, that is OK; we'll get to those. The basic idea is to find a second (diagonal) linear transformation $\tilde{L}$ which is easy to understand, and then realize that $\tilde{L}$ does essentially the same thing that $L$ does. Thus, we will understand $L$ because we understand $\tilde{L}$.

Enough talk, here is the algorithm. Form the inverse of the change of basis matrix by putting the eigenvectors in its columns:
\[ M^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]
At the risk of introducing some confusion, let me mention what we are doing here. Notice that the vectors $v$ and $w$ can be used to form a basis for $\mathbb{R}^2$. This is where the terminology "change of basis" comes from. We want to exchange the standard basis $\{e_1, e_2\}$ for the new basis $\{v, w\}$ as a means of understanding $L$. Put another way, we want to represent the original transformation $L$ with respect to this basis, rather than the standard basis in which the matrix $A$ is expressed. (Here comes the confusing part.) This means that, in effect, we want to think about $L$ by considering the first eigenvector $v$ as the first standard basis vector and the second eigenvector $w$ as the second standard basis vector, in a new and different copy of $\mathbb{R}^2$. Accordingly, we introduce an auxiliary bookkeeping transformation $M : \mathbb{R}^2 \to \mathbb{R}^2$ (from the original copy of $\mathbb{R}^2$ to the new copy of $\mathbb{R}^2$) which assigns $v$ to $e_1$ and $w$ to $e_2$. As usual, we want to specify $M$ by specifying its matrix, but then we need to know $Me_1$ and $Me_2$; remember that those are the columns. Of course, it's still possible to find $M$ directly using the fact that $Mv = e_1$ and $Mw = e_2$; it turns out, however, that it is easier to write down the matrix for the inverse transformation $M^{-1}$, since we already know its columns are the eigenvectors. Then there is an algorithm, which we will learn, to find the inverse of a matrix, and we can apply it to find the inverse of $M^{-1}$ which, of course, is $M$.

The actual change of basis matrix is the matrix $M$ so that $MM^{-1} = I$. This matrix turns out to be
\[ M = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \]
You can check it yourself.

Exercise

1. Check that $MM^{-1} = I = M^{-1}M$.

The matrix of the diagonalized transformation is
\[ \tilde{A} = MAM^{-1} = \begin{pmatrix} -2 & 0 \\ 0 & 0 \end{pmatrix}. \]
Notice that this is a diagonal matrix; a remark after the exercises below records what this buys us.

Exercises

2. Check that $MAM^{-1}$ is the diagonal matrix given above.
3. Make a drawing which illustrates the transformation $L$.
4. Find the matrix $M$ directly using the specification that $Mv = e_1$ and $Mw = e_2$.
5. Find the eigenvalues and eigenvectors associated with the following matrices:
\[ \begin{pmatrix} 4 & 2 \\ 1 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 3 & 2 \\ 2 & 5 \end{pmatrix}. \]
6. Find the change of basis matrices for each of the matrices in the last exercise.
7. Diagonalize these matrices.
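Before we turn to the theory, it is worth recording explicitly what the computation above achieves. Multiplying the relation $\tilde{A} = MAM^{-1}$ on the left by $M^{-1}$ and on the right by $M$ gives
\[ A = M^{-1}\tilde{A}M, \]
so applying $A$ to a vector $x$ amounts to: change to eigenvector coordinates (apply $M$), scale the two coordinates by the eigenvalues $-2$ and $0$ (apply $\tilde{A}$), and change back (apply $M^{-1}$). This is the sense in which $\tilde{L}$ does essentially the same thing that $L$ does.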
Finding Eigenvalues

We also need to understand what is behind these computations. This discussion applies to the general case of a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^n$ with corresponding matrix $A$. We first wish to consider the eigenvalue/eigenvector problem for the transformation $L$. You will recall that this means we are looking for nonzero vectors $v$ and scalars $\lambda$ such that $L(v) = \lambda v$. The existence of such an eigenvector/eigenvalue pair leads us to look at the matrix $A - \lambda I$.

Exercises

8. What is the eigenvector/eigenvalue equation associated with $L$?
9. Why do we look at the matrix $A - \lambda I$?

The matrix $A - \lambda I$ corresponds to a linear transformation $\Lambda : \mathbb{R}^n \to \mathbb{R}^n$. Since we know $v$ is in the kernel of $\Lambda$, the following general fact becomes important:

A linear transformation has nontrivial kernel if and only if the columns of the associated matrix are linearly dependent.

To see why this is true, you will need to know/review the definition of linear dependence. Let $b_1, \ldots, b_n$ be the column vectors of the matrix $B$ associated with the linear transformation in question. Given a vector $x$ in the domain $\mathbb{R}^n$ of the transformation, a fundamental relation between the product $Bx$ and the columns is the following:
\[ Bx = \sum_{j=1}^{n} x_j b_j \tag{1} \]
where $x_1, \ldots, x_n$ are the entries in $x$.
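To make identity (1) concrete, here is what it says for a generic $2 \times 2$ matrix (the entries $a, b, c, d$ and the vector $(x_1, x_2)$ are arbitrary):
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{pmatrix} = x_1\begin{pmatrix} a \\ c \end{pmatrix} + x_2\begin{pmatrix} b \\ d \end{pmatrix}. \]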
From identity (1), we see that if $x$ is a nonzero vector in the kernel, so that $Bx = 0$, then there is a linear combination of the columns $\sum x_j b_j = 0$ with not all coefficients $x_j$ vanishing. This is one definition of linear dependence. Conversely, if the columns $b_1, \ldots, b_n$ form a linearly dependent set, then there are coefficients $x_1, \ldots, x_n$ which are not all zero and, hence, can be arranged into a nonzero vector $x$, such that $Bx = 0$. Thus, $x$ is a nonzero vector in the kernel.

Exercises

10. Verify the identity (1).
11. What vector is contained in the kernel of every linear transformation?
12. Show that if a nonempty subset $\Sigma$ of a vector space is closed under addition of vectors, and for each vector $v \in \Sigma$ the additive inverse $-v$ of $v$ is also in $\Sigma$, then $\Sigma$ contains the zero vector. Is such a set $\Sigma$ necessarily a subspace?
13. Show that any nonempty subset of a vector space that is closed under scalar multiplication must contain the zero vector. Is such a subset always a subspace?
14. Show that any nonempty subset of a vector space that is closed under addition of vectors and scalar multiplication contains the zero vector.

Determinants

For a square matrix (or, equivalently, a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$) we can go one step further and use another fundamental fact:

A square matrix has linearly dependent columns if and only if it has zero determinant.

This one is a little more tricky to see in general, but it is not too hard to see completely why it is true up to dimension 2.
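For instance (a $2 \times 2$ matrix made up just for illustration), in
\[ B = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \]
the second column is twice the first, so the columns are linearly dependent, and indeed $\det(B) = 1 \cdot 4 - 2 \cdot 2 = 0$.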
Let us begin with what I hope is a review of how to compute the determinant of a matrix. If the matrix is a $1 \times 1$ matrix, then it looks like $(a)$ where $a$ is a scalar, and its determinant is simply $a$. If the matrix is a $2 \times 2$ matrix, then
\[ \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21}. \]
A very fundamental fact, one that you can check for $2 \times 2$ matrices but which is also true in general, is the product formula for determinants, which says
\[ \det(AB) = \det(A)\det(B) \]
where $A$ and $B$ are square matrices of the same size.

Exercise

15. Check the product formula for determinants when $A$ and $B$ are $2 \times 2$ matrices.

In the case of $2 \times 2$ matrices, the absolute value of the determinant is the area of the parallelogram determined by the column vectors, i.e.,
\[ \{ s a_1 + t a_2 : 0 \le s, t \le 1 \}, \]
where $a_1$ and $a_2$ are the columns of the matrix. In order to see this, first check it for a rotation matrix, and then rotate so that the image of the first column vector lies along the positive $x$-axis. Then you can use the product formula for determinants as follows:
\[ \det(A) = \det(R)\det(A) = \det(RA), \]
where $R$ is the rotation matrix with $Ra_1 = |a_1|\,e_1$ (here we use that a rotation matrix has determinant $1$). Notice that the matrix $RA$ has a zero entry in the bottom left corner. Therefore, its determinant is simply the product of the upper left $(1,1)$ entry with the bottom right $(2,2)$ entry. The first factor is the length of the base of the parallelogram determined by the columns of $RA$, and the second factor is (plus or minus) the height of the same parallelogram. That is to say, it is clear that $|\det(RA)|$ is the area of the parallelogram determined by the columns of $RA$. Since this parallelogram is just a rotation of the parallelogram determined by the columns of $A$, it has the same area, and we have verified our assertion.
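Here is a concrete instance of the area interpretation, with columns chosen just for the example. Take $a_1 = (3, 0)$ and $a_2 = (1, 2)$, so that
\[ A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \det(A) = 3 \cdot 2 - 1 \cdot 0 = 6. \]
The parallelogram spanned by $a_1$ and $a_2$ has base $3$ and height $2$, hence area $6$; no rotation is even needed here, since $a_1$ already lies along the positive $x$-axis.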
Exercises

16. Explain in your own words why the area of the parallelogram determined by the columns of a $2 \times 2$ matrix is the absolute value of its determinant.
17. Explain how the area relation described above can be used to show the assertion that the columns of a $2 \times 2$ matrix are linearly dependent if and only if the determinant of the matrix is zero.

For higher dimensional determinants we want to point out two things. First, from the point of view of computation, we should know how to compute the determinant, as summarized in the following formulas for row and column expansion. The expansion of $\det(A)$ along the $i$-th row is given by
\[ \det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(\mathrm{cof}(i,j)). \]
In this formula, $\mathrm{cof}(i,j)$ is the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$-th row and the $j$-th column from the matrix $A$. Similarly, the expansion of $\det(A)$ along the $j$-th column is given by
\[ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(\mathrm{cof}(i,j)). \]
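As a worked illustration of the row expansion, with a matrix chosen here just for the example, expand along the first row:
\[ \det\begin{pmatrix} 1 & 2 & 0 \\ 3 & 1 & 4 \\ 0 & 2 & 1 \end{pmatrix} = 1 \cdot \det\begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix} - 2 \cdot \det\begin{pmatrix} 3 & 4 \\ 0 & 1 \end{pmatrix} + 0 \cdot \det\begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix} = (1 - 8) - 2(3 - 0) + 0 = -13. \]
Notice how the signs alternate $+, -, +$ across the row, following the factor $(-1)^{i+j}$.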
Exercises

18. Compute the determinant of the matrix
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \]
19. Compute the determinants of the following matrices:
\[ \begin{pmatrix} 1 & 2 & 2 & 0 \\ 2 & 3 & 4 & 1 \\ 1 & 2 & 0 & 2 \\ 0 & 2 & 5 & 3 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \]
20. Calculate the determinant of the matrix
\[ \begin{pmatrix} 1-\lambda & 1 & 1 \\ 1 & 1-\lambda & 1 \\ 1 & 1 & 1-\lambda \end{pmatrix}. \]
21. Can you give a geometric interpretation for the determinant of a $3 \times 3$ matrix?
22. What is the determinant of the $n \times n$ identity matrix?

Second, from the point of view of knowing what to expect from determinants, it is exceedingly useful to think of them as functions on the column vectors (or the row vectors) of a matrix. In particular, if we write the columns of a matrix $A$ as $a_1, \ldots, a_n$, so that $\det(A) = \det(a_1, \ldots, a_n)$, then we find:

- If all but one of the columns are held fixed, then $\det$ is linear in the remaining column.
- If two column vectors are the same, then $\det(A) = 0$.
- Subtracting any multiple of one column from another column produces a matrix whose determinant is the same as the determinant of $A$.
- If $A$ has a column of all zeros, then $\det(A) = 0$.

Exercises

23. Use the column expansion formula to verify the first property above.
24. Use the column expansion formula to verify the second property of determinants.
25. Use the first and second properties to verify the third property.
26. Use the properties of determinants to verify that the determinant of a matrix is zero if and only if its columns are linearly dependent.
27. Formulate four properties of determinants analogous to those above using rows instead of columns.

In fact, the four properties using rows are also true. Given a matrix $A$, the transpose of $A$, which is denoted by $A^T$, is the matrix whose $k$-th column is the $k$-th row of $A$. For a square matrix, this means that $A^T$ is obtained from $A$ by flipping about the diagonal. It turns out that $\det(A^T) = \det(A)$, and this explains why the row properties hold as well as the column properties.
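In the $2 \times 2$ case, the transpose identity is immediate from the formula, since $ad - bc = ad - cb$:
\[ \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc = \det\begin{pmatrix} a & c \\ b & d \end{pmatrix} = \det\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\!T}\right). \]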
Inverse Matrices

Finally, let us state a formula for the inverse of a matrix. We denote by $\mathrm{cof}(A)$ the matrix obtained from the square matrix $A$ by replacing each entry $a_{ij}$ in $A$ by $(-1)^{i+j}\det(\mathrm{cof}(i,j))$. That is, if $\mathrm{cof}(A) = C = (c_{ij})$, then
\[ c_{ij} = (-1)^{i+j}\det(\mathrm{cof}(i,j)). \]
We can then verify that
\[ A^{-1} = \frac{1}{\det(A)}\,(\mathrm{cof}(A))^T. \]
In words, the inverse of $A$ is one over the determinant of $A$ times the matrix $A$ first cofactored and then transposed. A worked $2 \times 2$ example appears after the exercises below.

Exercises

28. Not every square matrix has an inverse. Which ones don't?
29. How does the invertibility of a square matrix relate to the kernel of the associated linear transformation?
30. Find the inverses of the following matrices:
\[ \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}. \]
31. A matrix is called symmetric if $A^T = A$. Make up five symmetric matrices, and diagonalize all of them.
32. Make up five $2 \times 2$ matrices and (try to) diagonalize them. What interesting things do you find? How do your findings relate to our endeavor to understand linear transformations?
33. What happens if you try to diagonalize a $2 \times 2$ rotation matrix?
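As a closing illustration, tying back to the diagonalization example at the start of these notes, apply the formula to the matrix
\[ M^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]
Its determinant is $1 \cdot 1 - 1 \cdot (-1) = 2$, and its cofactor matrix is
\[ \mathrm{cof}(M^{-1}) = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \]
so the formula gives
\[ (M^{-1})^{-1} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = M, \]
in agreement with the change of basis matrix $M$ found earlier.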