Bare-bones outline of eigenvalue theory and the Jordan canonical form

April 3, 2007

N.B.: You should also consult the text/class notes for worked examples.

Let $F$ be a field, let $V$ be a finite-dimensional vector space over $F$, and let $T: V \to V$ be a linear operator.

1. Definition of eigenvalues and eigenvectors. An element $\lambda \in F$ is called an eigenvalue of $T$ if there exists a nonzero vector $x \in V$ such that $Tx = \lambda x$; such a vector is called an eigenvector (or $\lambda$-eigenvector) of $T$. Clearly, we have the equivalences
$$\lambda \text{ is an eigenvalue of } T \iff \ker(T - \lambda I) \neq 0 \iff T - \lambda I \text{ is not injective} \iff \det(T - \lambda I) = 0,$$
because a linear operator on a finite-dimensional space has nonzero kernel if and only if it is not invertible, if and only if it has zero determinant. In particular, $0$ is an eigenvalue of $T$ if and only if $T$ is not invertible. The $\lambda$-eigenspace of $T$ is defined to be the set of $\lambda$-eigenvectors of $T$ together with $0$, i.e., $\ker(T - \lambda I)$; it is a subspace of $V$.

2. Eigenvalues exist when $F$ is algebraically closed. First, recall that $F$ is algebraically closed if every nonconstant polynomial $p(X) = a_0 + a_1 X + \cdots + a_n X^n$, with $a_i \in F$, has a zero in $F$, i.e., there exists $c \in F$ such that $p(c) = 0$. If the ground field $F$ is algebraically closed (and $V \neq 0$), then we can prove the existence of an eigenvalue of $T$, as follows.

First proof. Since $F$ is algebraically closed, the polynomial $\chi(X) = \det(T - XI)$, which has degree $\dim V \geq 1$, has a root, which is necessarily an eigenvalue of $T$ (see §1).

Second proof (longer, but more elementary). Let $x \in V$ be any nonzero vector. Because $\dim V < \infty$, there is a smallest $d \geq 1$ such that the vectors $x, Tx, \dots, T^d x$ are linearly dependent. Thus there are scalars $a_0, \dots, a_d$, not all zero, such that
$$0 = a_0 x + a_1 Tx + \cdots + a_d T^d x = p(T)x, \tag{1}$$
where $p(X) = a_0 + a_1 X + \cdots + a_d X^d$. The minimality property defining $d$ implies that $a_d \neq 0$. Since $F$ is algebraically closed, we can factor $p(X)$ into linear factors, say $p(X) = a_d (X - c_d) \cdots (X - c_1)$ with $c_i \in F$. So from Eq. (1) we get $0 = (T - c_d)(T - c_{d-1}) \cdots (T - c_1)x$.
From this equality it follows that one of the $c_i$ must be an eigenvalue of $T$: if $(T - c_1)x = 0$, then $c_1$ is an eigenvalue of $T$; otherwise there must be some $i > 1$ for which $v = (T - c_{i-1}) \cdots (T - c_1)x \neq 0$ but $(T - c_i)v = 0$, so that $c_i$ is an eigenvalue with eigenvector $v$.
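The second proof is constructive up to the root-finding step: from any nonzero $x$ it produces a monic polynomial $p(X)$ with $p(T)x = 0$, at least one of whose roots is an eigenvalue. The following minimal sketch computes that polynomial over $\mathbb{Q}$ using exact arithmetic from Python's `fractions` module; the helper names `rref`, `matvec`, and `krylov_polynomial` are our own, and factoring $p$ is deliberately left out, since its roots need not be rational.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, list of pivot columns)."""
    R = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    ncols = len(R[0]) if R else 0
    for c in range(ncols):
        # find a row at or below r with a nonzero entry in column c
        piv = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if piv is None:
            continue
        R[r], R[piv] = R[piv], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def matvec(T, x):
    return [sum(Fraction(a) * b for a, b in zip(row, x)) for row in T]


def krylov_polynomial(T, x):
    """Monic p of least degree d >= 1 with p(T)x = 0, as coefficients [c_0, ..., c_d]."""
    n = len(x)
    v = [Fraction(t) for t in x]
    if all(t == 0 for t in v):
        raise ValueError("x must be nonzero")
    vecs = [v]
    for d in range(1, n + 1):
        vecs.append(matvec(T, vecs[-1]))  # append T^d x
        # columns: x, Tx, ..., T^(d-1)x, then T^d x
        rows = [[vecs[j][i] for j in range(d + 1)] for i in range(n)]
        R, pivots = rref(rows)
        if d not in pivots:
            # T^d x = sum_i R[i][d] T^i x, so p(X) = X^d - sum_i R[i][d] X^i
            return [-R[i][d] for i in range(d)] + [Fraction(1)]
    raise RuntimeError("unreachable: n + 1 vectors in F^n are dependent")
```

For example, `krylov_polynomial([[2, 0], [0, 3]], [1, 1])` returns coefficients equal to `[6, -5, 1]`, i.e. $X^2 - 5X + 6 = (X - 2)(X - 3)$. For the rotation matrix `[[0, -1], [1, 0]]` and `x = [1, 0]` it returns `[1, 0, 1]`, i.e. $X^2 + 1$, which has no real root.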

Thus, in principle, the problem of finding eigenvalues is straightforward: write down the polynomial $\det(T - XI)$ and find its roots. In practice, however, the best one can usually do is to determine the roots approximately. There is also the matter of computing the polynomial $\det(T - XI)$ itself, which is a computationally expensive task.

3. Dimension bounds the number of eigenvalues. If $\dim V = n$, then $T$ has at most $n$ distinct eigenvalues. More precisely: if $\lambda_1, \dots, \lambda_k$ are distinct eigenvalues of $T$, then any corresponding set of eigenvectors $x_1, \dots, x_k$ is linearly independent.

Proof. We, of course, start by writing $a_1 x_1 + \cdots + a_k x_k = 0$ for scalars $a_i \in F$, and try to show that all the $a_i$ are necessarily zero. We will show that $a_1 = 0$ by finding an operator $S$ such that $Sx_1 \neq 0$ but $Sx_i = 0$ for $i > 1$. Applying $S$ to the zero-sum $a_1 x_1 + \cdots + a_k x_k$ will then give $0 = a_1 Sx_1$, thereby forcing $a_1 = 0$. The vanishing of the other coefficients follows from the same technique (for different choices of the operator $S$). Note this simple technique carefully, for we will use it several times in these notes, and it is fairly standard in linear algebra.

Let $S$ be the linear operator $(T - \lambda_2) \cdots (T - \lambda_k)$. The factors of $S$ commute, so the equality $(T - \lambda_2)x_2 = 0$ implies that $Sx_2 = 0$, and the same line of reasoning shows that $Sx_i = 0$ for all $i \neq 1$. On the other hand, upon applying $S$ to $x_1$, we get $Sx_1 = \prod_{i \neq 1} (\lambda_1 - \lambda_i)\, x_1$, which is not zero because the eigenvalues $\lambda_i$ are pairwise distinct and $x_1 \neq 0$. We deduce that $a_1 = 0$, as explained in the previous paragraph. To complete the proof, it remains to apply the same technique to $x_2, x_3$, etc., in the obvious way.

4. Counterexamples. Eigenvalues might not exist if $F$ fails to be algebraically closed. For example, the field $F = \mathbb{R}$ is not algebraically closed, and any rotation of $\mathbb{R}^2$ about the origin through an angle that is not an integer multiple of $\pi$ is a real-linear transformation with no (real) eigenvalues, as no nonzero vector of $\mathbb{R}^2$ is mapped to a scalar multiple of itself under such a rotation.
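The claim about rotations can be checked directly. The following worked computation uses the characteristic polynomial $\det(XI - T)$ (cf. §12) of the rotation $R_\theta$ through angle $\theta$, together with the quadratic formula:

$$\begin{aligned}
R_\theta &= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \\
\det(XI - R_\theta) &= (X - \cos\theta)^2 + \sin^2\theta = X^2 - 2\cos\theta\, X + 1, \\
\text{discriminant} &= 4\cos^2\theta - 4 = -4\sin^2\theta .
\end{aligned}$$

The discriminant is negative, so there is no real root, exactly when $\sin\theta \neq 0$, i.e., when $\theta$ is not an integer multiple of $\pi$. (For $\theta = \pi$ we get $R_\pi = -I$, which does have the real eigenvalue $-1$.)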
The definition of an eigenvalue makes sense when $V$ is infinite-dimensional, but then eigenvalues need not exist, even if $F$ is algebraically closed. For example, on the infinite-dimensional vector space $V = \mathbb{C}[X]$ of complex polynomials, the operator $p(X) \mapsto Xp(X)$ has no eigenvector, since $\deg(Xp) = \deg(p) + 1$ for every nonzero $p$. Similarly, $d/dX : V \to V$ has no eigenvector with a nonzero eigenvalue $\lambda$: such an eigenvector would have to be a multiple of the exponential function $e^{\lambda X}$, which is an infinite power series and is therefore not an element of $V$. (The only eigenvectors of $d/dX$ are the nonzero constants, with eigenvalue $0$.)

5. Diagonalizability. The operator $T$ is said to be diagonalizable if there is a basis of $V$ consisting of eigenvectors of $T$; with respect to such a basis, the matrix of $T$ is a diagonal matrix whose diagonal entries are the eigenvalues of $T$. Not every operator (or, equivalently, matrix) is diagonalizable. For example, the rational matrix
$$T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
is not diagonalizable: the only eigenvalue of $T$ is $1$, and clearly the $1$-eigenvectors of $T$ cannot span $\mathbb{Q}^2$ (as $T \neq I$).

The main question that we will be concerned with in these notes is: when is $T$ diagonalizable, and how can one tell? If $T$ is diagonalizable, then certainly $T$ possesses sufficiently many eigenvalues. In §2 we saw that the existence of eigenvalues is guaranteed when $F$ is algebraically closed (and don't forget our standing assumption that $\dim V < \infty$), so we shall often assume this. Under this assumption, our question will be resolved by the existence of the so-called Jordan canonical form of $T$ (§20).
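By §3, eigenvectors belonging to distinct eigenvalues are linearly independent, so $T$ is diagonalizable exactly when the dimensions of its eigenspaces add up to $n = \dim V$; and $\dim \ker(T - \lambda I) = n - \operatorname{rank}(T - \lambda I)$. Here is a minimal Python sketch of that test over $\mathbb{Q}$, assuming the caller already knows the complete list of distinct eigenvalues (finding them is the hard part, see §2). The names `rank` and `is_diagonalizable` are hypothetical helpers, not from any library.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix over Q by Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


def is_diagonalizable(T, eigenvalues):
    """Assumes `eigenvalues` lists every distinct eigenvalue of T."""
    n = len(T)
    total = 0
    for lam in eigenvalues:
        shifted = [[T[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        total += n - rank(shifted)  # dimension of the lam-eigenspace
    return total == n
```

For the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ above, the $1$-eigenspace has dimension $2 - 1 = 1 < 2$, so `is_diagonalizable([[1, 1], [0, 1]], [1])` returns `False`.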

6. Generalized eigenvectors, multiplicity of eigenvalues. Let $\lambda \in F$. A vector $x \in V$ is called a generalized ($\lambda$-)eigenvector if $(T - \lambda)^k x = 0$ for some $k \geq 1$. The set of generalized $\lambda$-eigenvectors forms a subspace of $V$, which we denote
$$V_T(\lambda) = \{ x \in V : (T - \lambda)^k x = 0 \text{ for some } k \geq 1 \}.$$
Clearly, this space is nonzero if and only if $\lambda$ is an eigenvalue of $T$. The multiplicity of $\lambda$ is defined to be $\dim V_T(\lambda)$. [Note: In the literature, generalized eigenvectors are sometimes called root vectors.]

The impossibility of diagonalizing a general operator $T$ is an expression of the fact that $V$ is not, in general, spanned by the eigenspaces of $T$ (e.g., $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, as an operator on $\mathbb{Q}^2$, has only one eigenspace, and it is $1$-dimensional). This deficiency of eigenvectors is corrected by the notion of a generalized eigenvector.

7. Main theorem 1: Generalized eigenspace decomposition. If $F$ is algebraically closed, then $V = \bigoplus_{\lambda \in F} V_T(\lambda)$. More specifically, if $\lambda_1, \dots, \lambda_k$ are the (distinct) eigenvalues of $T$, then
$$V = V_T(\lambda_1) \oplus \cdots \oplus V_T(\lambda_k) \tag{GED}$$
(as $V_T(\lambda) = 0$ whenever $\lambda$ is not an eigenvalue of $T$). In other words, every $v \in V$ can be expressed uniquely as a sum $v = v_1 + \cdots + v_k$ with $v_i \in V_T(\lambda_i)$. Moreover, there is a basis of $V$ with respect to which the matrix of $T$ is block-diagonal of the form
$$\begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix}, \qquad \text{where } B_i = \begin{pmatrix} \lambda_i & & * \\ & \ddots & \\ 0 & & \lambda_i \end{pmatrix}$$
is upper-triangular (zeros below the diagonal) with $\lambda_i$'s along the diagonal, and the size of the block $B_i$ is the eigenvalue multiplicity $\dim V_T(\lambda_i)$. This theorem, which is also known as the primary decomposition theorem (in the case when $F$ is algebraically closed), is basic to all that follows. The proof will come after we establish a few basic facts about generalized eigenvectors.

8. Basic facts about generalized eigenvectors.

(a) Nonzero generalized eigenvectors $v_1, \dots, v_k$ corresponding, respectively, to distinct eigenvalues $\lambda_1, \dots, \lambda_k$ are linearly independent.

Proof.
The identical technique of §3 applies here, too, but for slightly more complicated operators $S$ (in fact, the complication is essentially notational). Suppose $a_1 v_1 + \cdots + a_k v_k = 0$ with $a_i \in F$. Let $N$ be a positive integer so large that $(T - \lambda_i)^N v_i = 0$ for $i = 1, \dots, k$, and let $m$ be the smallest non-negative integer such that $(T - \lambda_1)^{m+1} v_1 = 0$, i.e., such that $(T - \lambda_1)^m v_1 \neq 0$ is a $\lambda_1$-eigenvector. To show that $a_1 = 0$, we let
$$S = (T - \lambda_1)^m (T - \lambda_2)^N \cdots (T - \lambda_k)^N.$$
We have (as in §3) $Sv_2 = \cdots = Sv_k = 0$. To compute $Sv_1$, we should express $S$ in a form that exhibits the nilpotency (§15) of $T - \lambda_1$ on $v_1$. Thus, we write
$$S = (T - \lambda_1)^m \prod_{i \neq 1} \big( (T - \lambda_1) + (\lambda_1 - \lambda_i) \big)^N.$$

In this form, we can see that $Sv_1 = \prod_{i \neq 1} (\lambda_1 - \lambda_i)^N \, (T - \lambda_1)^m v_1 \neq 0$, as follows. Expanding each factor $\big((T - \lambda_1) + (\lambda_1 - \lambda_i)\big)^N$ according to the binomial theorem, we see that the whole product $S$ acting on $v_1$ evaluates to a sum of the form
$$Sv_1 = \prod_{i \neq 1} (\lambda_1 - \lambda_i)^N \, (T - \lambda_1)^m v_1 + \text{terms of the form (scalar)} \cdot (T - \lambda_1)^l v_1 \text{ with } l > m,$$
and each term of the second form is zero, because already $(T - \lambda_1)^{m+1} v_1 = 0$. The remaining term is nonzero, because the $\lambda_i$ are distinct and $(T - \lambda_1)^m v_1 \neq 0$. Now, applying $S$ to the zero-sum $a_1 v_1 + \cdots + a_k v_k$ gives $0 = a_1 Sv_1$ with $Sv_1 \neq 0$, which forces $a_1$ to be zero. Clearly, the same technique can be used to show that the remaining coefficients are also zero.

(b) $V_T(\lambda) = \ker (T - \lambda)^n$, where $n = \dim V$.

Proof. We assume that $\lambda$ is an eigenvalue of $T$, for otherwise there is nothing to prove (both sides are zero). Since $V$ is finite-dimensional, there is certainly a positive integer $k$ such that $V_T(\lambda) = \ker (T - \lambda)^k$; let $k$ be the smallest such integer. [To see that such a $k$ exists, observe that we have an increasing chain of subspaces of $V_T(\lambda)$,
$$\ker(T - \lambda) \subseteq \ker(T - \lambda)^2 \subseteq \ker(T - \lambda)^3 \subseteq \cdots,$$
whose union is $V_T(\lambda)$ and which must eventually stabilize, since $\dim V_T(\lambda) < \infty$.] Clearly, it will suffice to show that $k \leq n$. To this end, we pick a nonzero $x \in V_T(\lambda)$ with the property $(T - \lambda)^{k-1} x \neq 0$ (possible by the minimality of $k$). We claim that the $k$ vectors $x, (T - \lambda)x, \dots, (T - \lambda)^{k-1}x$ are linearly independent, which would imply that $k \leq \dim V_T(\lambda) \leq n$. The strategy is, of course, the one of part (a): we suppose
$$a_0 x + a_1 (T - \lambda)x + \cdots + a_{k-1} (T - \lambda)^{k-1} x = 0$$
and apply the operator $S = (T - \lambda)^{k-1}$ to this zero-sum. Since $(T - \lambda)^k x = 0$, every term except the first is annihilated, leaving us with $a_0 (T - \lambda)^{k-1} x = 0$. Since $(T - \lambda)^{k-1} x \neq 0$, we get $a_0 = 0$. Again, the same technique (with $S = (T - \lambda)^{k-2}$, and so on) gives $a_1 = 0$, $a_2 = 0$, etc.

(c) For each $\lambda \in F$, the generalized eigenspace $V_T(\lambda)$ is $T$-invariant, i.e., if $x \in V_T(\lambda)$, then $Tx \in V_T(\lambda)$. Moreover, if $\lambda$ is an eigenvalue of $T$, then $\lambda$ is the only eigenvalue of the restriction of $T$ to $V_T(\lambda)$.

Proof. $T$-invariance is clear, because $T$ commutes with any power of $T - \lambda$.
To demonstrate the other statement, let $\mu$ be an eigenvalue of the restriction of $T$ to $V_T(\lambda)$, and let $x$ be a corresponding eigenvector. From part (b), we have $0 = (T - \lambda)^n x = (\mu - \lambda)^n x$. Since $x \neq 0$, we must therefore have $\mu = \lambda$.

9. Proof of the generalized eigenspace decomposition (§7). Let $\lambda$ be an eigenvalue of $T$, and let $n = \dim V$. We claim that §7(GED) will follow once we show that
$$V = V_T(\lambda) \oplus \operatorname{Im}(T - \lambda)^n. \tag{2}$$
Indeed, suppose this direct-sum decomposition holds. Then the image $W = \operatorname{Im}(T - \lambda)^n$ is $T$-invariant (since $T$ commutes with $(T - \lambda)^n$), and $\dim W < \dim V$ because $V_T(\lambda) \neq 0$. Therefore we can reason by induction on the dimension of the space that $W$ has a decomposition according to (GED) (observing, additionally, that (GED) is trivially true when the space has dimension at most $1$). Consequently, (GED) for the lower-dimensional subspace $W$ will yield (GED) for $V$.

First notice that, by the dimension formula, $V_T(\lambda) = \ker(T - \lambda)^n$ and $\operatorname{Im}(T - \lambda)^n$ will span $V$ as long as $V_T(\lambda) \cap \operatorname{Im}(T - \lambda)^n = 0$. So suppose $x \in V_T(\lambda) \cap \operatorname{Im}(T - \lambda)^n$. Then

$x = (T - \lambda)^n y$ for some $y \in V$. By §8(b), we have $0 = (T - \lambda)^n x = (T - \lambda)^{2n} y$, which implies that $y \in V_T(\lambda) = \ker(T - \lambda)^n$, and hence $x = (T - \lambda)^n y = 0$. This proves Eq. (2), and the decomposition §7(GED) now follows by applying the (appropriate) inductive hypothesis to $W$. [Exercise: Formulate this hypothesis explicitly!]

Now we show the existence of a basis with the asserted properties. Since $T$ maps each generalized eigenspace to itself (§8(c)), and $V$ is the direct sum of its generalized eigenspaces, any basis of $V$ obtained by concatenating bases of the individual generalized eigenspaces will express $T$ as a block-diagonal matrix. So we only need to show that each generalized eigenspace $V_T(\lambda_i)$ has a basis for which the matrix of $T$ (restricted to $V_T(\lambda_i)$) is upper-triangular with $\lambda_i$'s along the diagonal. To this end, we examine the operator $N = T - \lambda_i$, regarded as an operator on $V_T(\lambda_i)$. From §8(b) we know that $N$ is nilpotent (§15). Hence one of the subspaces in the chain
$$\ker N \subseteq \ker N^2 \subseteq \ker N^3 \subseteq \cdots$$
must eventually equal $V_T(\lambda_i)$. Thus, if we start by picking a basis of $\ker N$, extend that to a basis of $\ker N^2$, extend that to a basis of $\ker N^3$, etc., then this eventually terminating process yields a basis of $V_T(\lambda_i)$. Since $N$ maps $\ker N^j$ into $\ker N^{j-1}$, the matrix of $N$ with respect to this basis is strictly upper-triangular, and so the matrix of $T = \lambda_i I + N$ is upper-triangular with $\lambda_i$'s along the diagonal.

10. The minimal polynomial of an operator. Since the $F$-vector space $\operatorname{End}(V)$ of linear operators on $V$ is finite-dimensional, there is a smallest non-negative integer $d$ such that the operators $I, T, T^2, \dots, T^d$ ($I$ = identity operator) are linearly dependent (as elements of $\operatorname{End}(V)$). By the minimality of $d$, the coefficient of $T^d$ in a dependence relation is nonzero, so after normalizing there are elements $a_i \in F$ such that
$$0 = a_0 I + a_1 T + \cdots + a_{d-1} T^{d-1} + T^d.$$
The polynomial
$$m_T(X) = a_0 + a_1 X + \cdots + a_{d-1} X^{d-1} + X^d \in F[X]$$
is called the minimal polynomial of $T$.
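The construction in §10 is itself an algorithm: compute $I, T, T^2, \dots$ and stop at the first power that is a linear combination of the earlier ones. Below is a minimal sketch over $\mathbb{Q}$ with exact arithmetic, treating each matrix power as a vector of its $n^2$ entries; `rref`, `matmul`, and `minimal_polynomial` are illustrative helper names of our own, and no attempt is made at efficiency.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, list of pivot columns)."""
    R = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        piv = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if piv is None:
            continue
        R[r], R[piv] = R[piv], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def minimal_polynomial(T):
    """Coefficients [a_0, ..., a_(d-1), 1] of m_T, lowest degree first."""
    n = len(T)
    T = [[Fraction(v) for v in row] for row in T]
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]  # I
    for d in range(1, n * n + 1):
        powers.append(matmul(powers[-1], T))  # T^d
        # one row per matrix entry (r, c); one column per power I, T, ..., T^d
        rows = [[P[r][c] for P in powers] for r in range(n) for c in range(n)]
        R, pivots = rref(rows)
        if d not in pivots:  # T^d is a combination of I, ..., T^(d-1)
            return [-R[i][d] for i in range(d)] + [Fraction(1)]
    raise RuntimeError("unreachable: I, T, ..., T^(n^2) are dependent")
```

For instance, for $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ this returns coefficients equal to `[1, -2, 1]`, i.e. $(X - 1)^2$, while for $\operatorname{diag}(2, 2, 3)$ it returns `[6, -5, 1]`, i.e. $(X - 2)(X - 3)$, which has degree smaller than $\dim V$.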
It has the property that it is the monic polynomial $p(X)$ (i.e., polynomial with highest-power coefficient $1$) of smallest degree such that $p(T) = 0$.

11. Factors of the minimal polynomial. Suppose $F$ is algebraically closed. Let $\lambda_1, \dots, \lambda_k$ be the eigenvalues of $T$, and let $\nu_i$ be the smallest positive integer such that $V_T(\lambda_i) = \ker(T - \lambda_i)^{\nu_i}$. Then
$$m_T(X) = (X - \lambda_1)^{\nu_1} \cdots (X - \lambda_k)^{\nu_k}. \tag{3}$$
Consequently, by §7(GED) and §8(b), the degree of $m_T(X)$ is no more than $\dim V$. Moreover, $m_T(X)$ divides any polynomial $p(X) \in F[X]$ such that $p(T) = 0$.

Proof. We first show that $(X - \lambda_i)^{\nu_i}$ divides $m_T(X)$. Let $x \neq 0$ be a $\lambda_i$-eigenvector of $T$. Then $0 = m_T(T)x = m_T(\lambda_i)x$, which implies that $m_T(\lambda_i) = 0$. Hence there is a largest integer $\mu_i$ such that $(X - \lambda_i)^{\mu_i}$ divides $m_T(X)$. To show that $\mu_i \geq \nu_i$, we assume the contrary and derive a contradiction. If $\mu_i < \nu_i$, then there exists some $x \in V_T(\lambda_i)$ such that $(T - \lambda_i)^{\mu_i} x \neq 0$ (this follows from the very definition of $\nu_i$). Let $q(X)$ be the monic polynomial $m_T(X)/(X - \lambda_i)^{\mu_i}$. Because $F$ is algebraically closed, we can factor $q(X)$, say $q(X) = (X - \lambda'_1) \cdots (X - \lambda'_l)$. Then $0 = m_T(T)x = q(T)(T - \lambda_i)^{\mu_i} x$. Now we argue as in §2 (second proof): since $y = (T - \lambda_i)^{\mu_i} x \neq 0$ and $0 = q(T)y = (T - \lambda'_1) \cdots (T - \lambda'_l)y$, one of

the factors $T - \lambda'_j$ must annihilate a nonzero vector in $V_T(\lambda_i)$ (a vector in $V_T(\lambda_i)$ because $y \in V_T(\lambda_i)$ and $V_T(\lambda_i)$ is stable under $T$). But the restriction of $T$ to $V_T(\lambda_i)$ has only one eigenvalue, namely $\lambda_i$ (§8(c)). Hence $\lambda_i = \lambda'_j$. But this is absurd, for $\lambda'_j$ is a zero of $q(X)$, while by the maximality of $\mu_i$, $\lambda_i$ is not a zero of $q(X)$. We conclude that $\mu_i \geq \nu_i$.

Now we eliminate the possibility that $\mu_i > \nu_i$. Indeed, from §7(GED) and the defining property of the $\nu_i$'s, we know that the polynomial $m(X) = (X - \lambda_1)^{\nu_1} \cdots (X - \lambda_k)^{\nu_k}$ satisfies $m(T) = 0$. Since $\mu_i \geq \nu_i$ for every $i$, $m(X)$ divides $m_T(X)$. But $m_T(X)$, by definition, is the lowest-degree monic polynomial annihilating $T$. Hence not only must $m(X)$ divide $m_T(X)$, it must equal $m_T(X)$. We have now established the factorization (3).

As for the last assertion of the theorem, notice that the argument in the first paragraph of our proof works verbatim with $m_T(X)$ replaced by any nonzero $p(X)$ such that $p(T) = 0$ (the minimality of $m_T(X)$ was only used in the second paragraph). It shows that each $(X - \lambda_i)^{\nu_i}$ divides $p(X)$, and since these factors are pairwise coprime, $(X - \lambda_1)^{\nu_1} \cdots (X - \lambda_k)^{\nu_k} = m_T(X)$ divides $p(X)$.

12. The characteristic polynomial. The polynomial $c_T(X) = \det(XI - T)$ is called the characteristic polynomial of $T$. It is a monic polynomial of degree $\dim V$. As we showed in §1, $\lambda \in F$ is an eigenvalue of $T$ if and only if $c_T(\lambda) = 0$. Thus, if $F$ is algebraically closed, then we have a factorization
$$c_T(X) = (X - \lambda_1)^{\mu_1} \cdots (X - \lambda_k)^{\mu_k},$$
where the $\lambda_i$'s are the eigenvalues of $T$. The exponents $\mu_i$ are, in fact, the eigenvalue multiplicities (§6), i.e., $\mu_i = \dim V_T(\lambda_i)$. This is an easy consequence of the equality $c_T(X) = \det(XI - T) = \det(XI - A)$, where $A$ is the block-diagonal matrix appearing in the statement of the generalized eigenspace decomposition (§7): the matrix $XI - A$ is upper-triangular with each diagonal entry $X - \lambda_i$ occurring $\dim V_T(\lambda_i)$ times. The fact that $T$ is also a zero of its characteristic polynomial is the content of the well-known Cayley-Hamilton theorem.

13. The Cayley-Hamilton theorem. Let $c_T(X) = \det(XI - T)$ be the characteristic polynomial of $T$. Then $c_T(T) = 0$.

Proof.
Since $m_T(T) = 0$ (recall that $m_T(X)$ is the minimal polynomial of $T$; see §10), it suffices to show that $m_T(X)$ divides $c_T(X)$. The theorem is valid over an arbitrary field, but we will only prove it under the simplifying assumption that $F$ is algebraically closed. The minimal polynomial of $T$ then equals $m_T(X) = (X - \lambda_1)^{\nu_1} \cdots (X - \lambda_k)^{\nu_k}$, where $\nu_i$ is the smallest integer such that $V_T(\lambda_i) = \ker(T - \lambda_i)^{\nu_i}$ (see §11). From (the proof of) §8(b) it follows that $\nu_i \leq \dim V_T(\lambda_i) = \mu_i$. Hence $m_T(X)$ divides $c_T(X) = (X - \lambda_1)^{\mu_1} \cdots (X - \lambda_k)^{\mu_k}$.

14. The correct interpretation of the Cayley-Hamilton theorem. It is obvious that $\det(TI - T) = 0$, but be careful: this triviality is not the content of the Cayley-Hamilton theorem! Rather, one needs first to compute the polynomial $\det(XI - T)$, and then substitute $T$ for $X$. In terms of matrices, the characteristic polynomial $c_T(X) = \det(XI - T)$ is the determinant of a matrix with entries in the ring $F[X]$, not in the field $F$. For example, if
$$T = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix},$$
then $c_T(X) = (X - 1)(X - 2) = X^2 - 3X + 2$, and the Cayley-Hamilton theorem is the assertion that $T^2 - 3T + 2I = 0$.
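To see the point of §14 concretely, one can compute the coefficients of $c_T(X)$ as elements of $F$ and only then substitute the matrix $T$ for $X$. The sketch below does this in Python over $\mathbb{Q}$. To obtain the coefficients it uses the Faddeev-LeVerrier recurrence, a technique not discussed in these notes, which divides by $1, 2, \dots, n$ and so is only valid when those integers are invertible in $F$ (as in $\mathbb{Q}$). The function names are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(XI - A), via Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]      # c[n] = 1 (monic)
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    I = identity(n)
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + c[n - k + 1] * I[i][j] for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def poly_at_matrix(coeffs, A):
    """Evaluate sum_i coeffs[i] * A^i by Horner's rule (the matrix is substituted for X)."""
    n = len(A)
    I = identity(n)
    P = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(coeffs):
        PA = matmul(P, A)
        P = [[PA[i][j] + a * I[i][j] for j in range(n)] for i in range(n)]
    return P
```

For the example of §14, `char_poly([[1, 1], [0, 2]])` returns coefficients equal to `[2, -3, 1]`, and `poly_at_matrix` applied to these coefficients and the same matrix returns the zero matrix, as the Cayley-Hamilton theorem predicts.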

15. Nilpotent operators. A linear operator $N: V \to V$ is said to be nilpotent if $N^d = 0$ for some $d \geq 1$. The smallest power $d$ such that $N^d = 0$ is called the order of $N$. It is easy to check that $0$ is the only eigenvalue of $N$ (when $V \neq 0$). [Do it!] Therefore, by §11(3), the minimal polynomial of a nilpotent operator of order $d$ is $X^d$. Moreover, there is a basis of $V$ with respect to which $N$ is (strictly) upper-triangular. [Homework exercise! But see §9.]

16. Cyclic subspaces: Definition. A cyclic subspace of $V$, or more precisely, a $T$-cyclic subspace, is a subspace of $V$ of the form
$$F[T]x := \{ f(T)x : f(T) \text{ is a polynomial in } T \text{ with coefficients in } F \},$$
for some $x \in V$. Actually, this only defines $F[T]x$ as a set, but it is easy to check that we do indeed get a subspace of $V$. We shall be particularly interested in the case where $T$ is nilpotent.

17. Main theorem 2: Cyclic decomposition for a nilpotent operator. If $N$ is a nilpotent operator on $V$, then there are vectors $x_1, \dots, x_k$ of $V$ such that
$$V = F[N]x_1 \oplus \cdots \oplus F[N]x_k.$$

Proof. Here is a more verbose account of the sketch I gave in class. Let $d$ be the nilpotence degree of $N$, i.e., $N^d = 0$ but $N^{d-1} \neq 0$. Pick a vector $x_1 \in V$ such that $N^{d-1}x_1 \neq 0$, and consider the cyclic subspace $Z_1 = F[N]x_1$. Since $Z_1$ is $N$-stable (i.e., $N(Z_1) \subseteq Z_1$), $N$ descends to a nilpotent operator $\tilde N$ on the quotient $V/Z_1$. Precisely, this means that the operator $\tilde N: V/Z_1 \to V/Z_1$ defined by $\tilde N(x + Z_1) = N(x) + Z_1$ is a well-defined linear operator (i.e., if $x - x' \in Z_1$, then $\tilde N(x + Z_1) = \tilde N(x' + Z_1)$). Since $\dim(V/Z_1) = \dim V - \dim Z_1 < \dim V$, we may argue by induction on the dimension to conclude that $V/Z_1$ has a cyclic decomposition, say
$$V/Z_1 = F[\tilde N]\bar x_2 \oplus \cdots \oplus F[\tilde N]\bar x_k$$
(noting also that the theorem is true, trivially, in the 1-dimensional case). The only tricky part of the proof (the main part, really) is to lift this decomposition to $V$ and show that it complements $Z_1$.
To this end, it suffices to establish the following claim: if $d_i$ is the period of the $\tilde N$-cycle $\{\bar x_i, \tilde N \bar x_i, \tilde N^2 \bar x_i, \dots\}$ (which is to say, $d_i$ is the smallest integer such that $\tilde N^{d_i} \bar x_i = 0$), then $\bar x_i$ has a representative $x_i$ (i.e., an element $x_i \in V$ such that $\bar x_i = x_i + Z_1$) whose $N$-cycle also has period $d_i$.

Assuming this, the proof now basically wraps itself up, for we have
$$V = Z_1 + F[N]x_2 + \cdots + F[N]x_k$$
(because the images of the $F[N]x_i$ span $V/Z_1$) and
$$\dim V = \dim Z_1 + \dim(V/Z_1) = \dim Z_1 + \dim(F[N]x_2) + \cdots + \dim(F[N]x_k)$$
(since $\dim(F[N]x_i) = \dim(F[\tilde N]\bar x_i)$, as the $N$-period of $x_i$ equals the $\tilde N$-period of $\bar x_i$; cf. the proof of §8(b)), which implies that
$$V = Z_1 \oplus F[N]x_2 \oplus \cdots \oplus F[N]x_k.$$

To establish the claim for $\bar x_2$, say, start by picking any representative $x$ of $\bar x_2$. Since $\tilde N^{d_2} \bar x_2 = 0$, we have $N^{d_2}x \in Z_1$, so $N^{d_2}x = p(N)x_1$ for some polynomial $p(X)$. Since $\tilde N^d = 0$, we have $d_2 \leq d$, and thus $0 = N^d x = N^{d - d_2} p(N) x_1$. Since the $N$-cycle of $x_1$ has length $d$ (i.e., $x_1, Nx_1, \dots, N^{d-1}x_1$ are linearly independent and $N^d x_1 = 0$), $X^d$ must divide $X^{d-d_2}p(X)$, so $X^{d_2}$ must divide $p(X)$, say $p(X) = X^{d_2} q(X)$. Let $x_2 = x - q(N)x_1$. Then $x_2$ represents $\bar x_2$ and $N^{d_2}x_2 = p(N)x_1 - N^{d_2}q(N)x_1 = 0$, and the claim follows (obviously, $N^s x_2 \neq 0$ whenever $s < d_2$, since its image $\tilde N^s \bar x_2$ in $V/Z_1$ is nonzero). Clearly, the same argument establishes the analogous claim for the remaining cyclic generators $\bar x_3, \dots, \bar x_k$. The theorem is thereby proved.

18. Jordan matrices: Terminology. It is important to be clear here about what we mean by a Jordan matrix, because there are slight variations of usage in the literature, and by having names for various kinds of Jordan matrices we can also avoid having to write big matrices everywhere. Let $\lambda \in F$. A simple Jordan (or simple $\lambda$-Jordan) matrix of size $k$ is the $k \times k$ matrix
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}$$
(blanks are zero). Observe that a simple Jordan matrix $J_k(\lambda)$ can be expressed as the sum of a scalar matrix (namely, $\lambda I$) and a nilpotent matrix of order $k$ (namely, $J_k(\lambda) - \lambda I$, which is the matrix with $1$'s just above the main diagonal and zeros elsewhere). Observe also that $\lambda$ is the unique eigenvalue of $J_k(\lambda)$, and that its eigenspace is $1$-dimensional. A Jordan (or $\lambda$-Jordan) matrix is any block-diagonal matrix of the form
$$\begin{pmatrix} J_{k_1}(\lambda) & & \\ & \ddots & \\ & & J_{k_r}(\lambda) \end{pmatrix}.$$

19. Jordan canonical form of a nilpotent operator. If $N$ is a nilpotent operator on $V$, then there is a basis of $V$ with respect to which the matrix of $N$ is a $0$-Jordan matrix.

Proof. By the cyclic decomposition for a nilpotent operator (§17), there are vectors $x_1, \dots, x_k$ of $V$ such that $V = F[N]x_1 \oplus \cdots \oplus F[N]x_k$. For each $x_i$, let $\alpha_i$ be the smallest positive integer such that $N^{\alpha_i}x_i = 0$. From the proof of §8(b), it follows that the vectors
$$N^{\alpha_1 - 1}x_1, \dots, Nx_1, x_1, \;\dots,\; N^{\alpha_k - 1}x_k, \dots, Nx_k, x_k$$
form an ordered basis for $V$.
With respect to this basis, the matrix of $N$ is a $0$-Jordan matrix with $k$ blocks, of sizes $\alpha_1, \dots, \alpha_k$.

20. Jordan canonical form, in general. Suppose $T: V \to V$ is a linear transformation all of whose (distinct) eigenvalues, say $\lambda_1, \dots, \lambda_m$, lie in $F$ (i.e., the characteristic/minimal polynomial of $T$ is a product of linear factors with coefficients in $F$). Then there is a basis of $V$ with respect to which the matrix of $T$ takes the block-diagonal form
$$J = \begin{pmatrix} J_{\lambda_1} & & & \\ & J_{\lambda_2} & & \\ & & \ddots & \\ & & & J_{\lambda_m} \end{pmatrix},$$

where each J λi is an λ i -Jordan matrix. Moreover, for each eigenvalue λ of T, if β k (λ) denotes the number of elementary λ-jordan matrices of size k (within J λ ), then β k (λ) = rank{(t λ) k 1 } 2 rank{(t λ) k } + rank{(t λ) k+1 }. (4) In particular, the matrix J is uniquely determined by T up to permutation of its Jordan blocks, because formula (4) only depends on T. You should check that formula (4) gives zero when λ is not an eigenvalue of T (as it should). We call J the Jordan canonical form of T. 21. Computing the Jordan canonical form. Because we have an efficient means of computing the rank of a transformation (method of row reduction), the rank formula 20(4) gives an efficient way to compute the Jordan canonical form (when the eigenvalues of T are known). You will get homework exercise working out the Jordan canonical form concretely. Additionally, consult Hoffman-Kunze (or any other linear algebra textbook) for other worked examples. 22. Proof of the existence of the Jordan canonical form. Since T maps each generalized eigenspace to itself ( 8(c)), the generalized eigenspace decomposition of V ( 7) shows that it suffices to prove that each generalized eigenspace V T (λ) (for λ an eigenvalue of T ) has a basis with respect to which the matrix of T (or rather, of the restriction of T to V T (λ)) is a λ-jordan matrix. By 8(b), the operator (T λ) is nilpotent on V T (λ). By 19, we know that V T (λ) then has a basis with respect to which the matrix of (T λ) is a 0-Jordan matrix; equivalently, the matrix of T = (T λ) + λ in this basis is a λ-jordan matrix. Verification of the rank formula (4) is left as a homework exercise (with copious hints)! Page 9 of 9