CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES

D. KATZ

The purpose of this note is to present the rational canonical form and Jordan canonical form theorems for my M790 class. Throughout, we fix an n-dimensional vector space V over the field F. Write L(V, V) for the n^2-dimensional vector space of linear operators from V to itself, and recall that L(V, V) is isomorphic as a vector space (and as a ring) to M_n(F), the set of n × n matrices with entries in F. The impetus for our work is provided by the following question: given T ∈ L(V, V), can we represent T by a matrix having a canonical (i.e., natural) form?

It turns out that the form of the matrix we obtain depends almost entirely on the factorization of the minimal polynomial of T over F. For example, writing q(X) for the minimal polynomial of T, if q(X) has all of its roots in F, it turns out that T is triangularizable. In other words, there exists a basis 𝒱 of V such that the matrix of T with respect to 𝒱 is lower triangular. In fact, we will be able to do much better, in that we will be able to put T into its so-called Jordan canonical form whenever q(X) splits over F. The Jordan form is still lower triangular, but most of its off-diagonal entries are zero. When q(X) does not necessarily have its roots in F, the situation is more complicated. Nevertheless, the so-called rational canonical form is still a desirable form in that, while not triangular, it is zero-heavy.

Before embarking on our journey, we recall a few definitions from class. Let T be a linear operator on V. A subspace W ⊆ V is T-invariant if T(w) ∈ W for all w ∈ W. T-invariant subspaces play an important role in what follows: they provide the proper vehicle for proofs by induction and are an integral part of the decompositions of V leading to the canonical forms. Here is an easy way to get a T-invariant subspace. Fix v ∈ V and let W denote the subspace spanned by the vectors {v, T(v), T^2(v), ...}.
Then W is T-invariant, since T takes a typical vector α_0 v + ⋯ + α_r T^r(v) to α_0 T(v) + ⋯ + α_r T^{r+1}(v), which is again in W. Note that W = {f(T)(v) : f(X) ∈ F[X]}. We call W the cyclic subspace generated by v. Note that if the minimal polynomial q(X) of T has degree r, then W is spanned by the vectors {v, T(v), ..., T^{r−1}(v)}. Indeed, for any j ≥ r, write X^j = f(X)q(X) + s(X), with the degree of s(X) less than r. Substituting T gives T^j = s(T), so T^j(v) = s(T)(v) ∈ W. Moreover, given f(X) ∈ F[X], if f(T)(v) = 0, then f(T)(w) = 0 for all w ∈ W. We will rely heavily upon these observations.

Now suppose that W is an arbitrary T-invariant subspace. Then T induces a linear transformation T_W : W → W. The basic strategy for decomposing T is roughly the following. Proceed by induction on the dimension of V. Find T-invariant subspaces W, U ⊆ V satisfying V = W ⊕ U. By induction there is a basis for W bringing T_W to the desired form and a basis for U bringing T_U to the desired form. Putting these bases together brings T to the desired form.

We begin by recalling a definition.

Definition A. Let W_1, ..., W_k be subspaces of V. For each 1 ≤ i ≤ k, let {w_{i1}, ..., w_{in_i}} be a basis for W_i. Assume n_1 + ⋯ + n_k = n. If 𝒱 := {w_{11}, ..., w_{1n_1}, ..., w_{k1}, ..., w_{kn_k}} is a basis for V, then V is said to be the (internal) direct sum of W_1, ..., W_k. Note that this condition is independent of the bases chosen from the W_i. When V is the direct sum of W_1, ..., W_k, we denote this by writing V = W_1 ⊕ ⋯ ⊕ W_k. For example, if V = R^3, W_1 is any line through the origin and W_2 is any plane through the origin not containing W_1, then V = W_1 ⊕ W_2.

The following proposition elucidates the nature of direct sums. We will simply sketch its proof.

Proposition B. The following are equivalent for subspaces W_1, ..., W_k ⊆ V:

(i) V = W_1 ⊕ ⋯ ⊕ W_k.
(ii) V = W_1 + ⋯ + W_k and, for all 1 ≤ i ≤ k, W_i ∩ (W_1 + ⋯ + W_{i−1} + W_{i+1} + ⋯ + W_k) = 0.
(iii) Given v ∈ V, there exist unique w_i ∈ W_i with v = w_1 + ⋯ + w_k.

Sketch of Proof. Keep the notation established in Definition A.
If (i) holds, then expressing any v ∈ V in terms of the basis 𝒱 shows that V = W_1 + ⋯ + W_k. Now suppose that w_i = w_1 + ⋯ + w_{i−1} + w_{i+1} + ⋯ + w_k (with w_j ∈ W_j for 1 ≤ j ≤ k) represents an element of W_i ∩ (W_1 + ⋯ + W_{i−1} + W_{i+1} + ⋯ + W_k). Bringing w_i to the right-hand side yields a sum of elements from W_1, ..., W_k equal to zero. Expressing each term of the sum in terms of the given basis for the corresponding W_j shows that each term must be zero (by linear independence of the collection 𝒱). Thus W_i ∩ (W_1 + ⋯ + W_{i−1} + W_{i+1} + ⋯ + W_k) = 0, so (i) implies (ii). For (ii) implies (iii), use the second condition in (ii) to get the uniqueness of the required sum in (iii). (i) follows from (iii) in similar fashion.

The next proposition shows that if V is the direct sum of T-invariant subspaces, then there exists a basis for V for which the corresponding matrix of T takes a block-diagonal form. This is the first significant step along our lengthy path.

Proposition C. Let W_1, ..., W_k be T-invariant subspaces such that V = W_1 ⊕ ⋯ ⊕ W_k. For each 1 ≤ i ≤ k, let 𝒲_i := {w_{i1}, ..., w_{in_i}} be a basis for W_i and set 𝒱 := 𝒲_1 ∪ ⋯ ∪ 𝒲_k. Let A denote the matrix of T with respect to 𝒱 and A_i the matrix of T_{W_i} with respect to 𝒲_i. Furthermore, write q(X) for the minimal polynomial of T and, for 1 ≤ i ≤ k, let q_i(X) be the minimal polynomial of T_{W_i}. Then:

(i) The matrix of T with respect to 𝒱 has the (block-diagonal) form

    A = [ A_1  0   ⋯  0
          0    A_2 ⋯  0
          ⋮    ⋮   ⋱  ⋮
          0    0   ⋯  A_k ].

(ii) q(X) = LCM(q_1(X), ..., q_k(X)).

Proof. Part (i) is basically clear. When we apply T to the basis vectors in 𝒲_i, the resulting vectors belong to W_i and are expressible in terms of the vectors in 𝒲_i. It follows that the column of A corresponding to T(w_{ij}) is zero in all entries except possibly those belonging to rows n_1 + ⋯ + n_{i−1} + 1 through n_1 + ⋯ + n_i (for all 1 ≤ j ≤ n_i). Thus A takes the required form.

For part (ii), set h(X) := LCM(q_1(X), ..., q_k(X)). Let 1 ≤ i ≤ k, take w_i ∈ W_i and write h(X) = a_i(X)q_i(X). Then h(T)(w_i) = a_i(T)(q_i(T)(w_i)) = 0. Since V = W_1 + ⋯ + W_k, it follows that h(T)(v) = 0 for all v ∈ V. Thus h(T) = 0, so q(X) divides h(X). On the other hand, q(T) = 0, so q(T_{W_i}) = 0 for each i. Thus q_i(X) divides q(X) for all i, so h(X) divides q(X). Since h(X) and q(X) are monic polynomials, h(X) = q(X).
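For readers who want to experiment, Proposition C(ii) can be checked computationally. The following sketch assumes SymPy is available; the two blocks chosen here (with minimal polynomials X^2 + 1 and X − 1) are our own illustrative examples, not part of the notes.

```python
# Sketch of Proposition C(ii): the LCM of the blocks' minimal polynomials
# annihilates the block-diagonal matrix built from those blocks.
from sympy import Matrix, Poly, diag, eye, lcm, symbols, zeros

X = symbols('X')

A1 = Matrix([[0, -1], [1, 0]])   # minimal polynomial X^2 + 1
A2 = Matrix([[1]])               # minimal polynomial X - 1
A = diag(A1, A2)                 # block-diagonal, as in Proposition C(i)

# h(X) = LCM(q_1, q_2) = (X^2 + 1)(X - 1)
q = Poly(lcm(X**2 + 1, X - 1), X)

def eval_poly_at_matrix(p, M):
    """Evaluate a SymPy Poly at a square matrix via Horner's rule."""
    result = zeros(M.rows, M.cols)
    for c in p.all_coeffs():
        result = result * M + c * eye(M.rows)
    return result

# q(A) = 0, i.e. q is an annihilating (in fact, the minimal) polynomial of A.
print(eval_poly_at_matrix(q, A) == zeros(3, 3))
```

Neither q_1 nor q_2 alone annihilates A; the LCM is exactly what is needed, which is the content of Proposition C(ii).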
Now that we know that a matrix decomposition follows from a direct sum decomposition, we need to know when we can achieve such a decomposition. The next proposition shows that we can decompose V according to how we factor q(X), the minimal polynomial of T.

Proposition D. Let q(X) denote the minimal polynomial of T and suppose we factor q(X) = q_1(X)^{e_1} ⋯ q_k(X)^{e_k}, with each q_i(X) irreducible over F. Then there exist subspaces W_1, ..., W_k ⊆ V satisfying the following conditions.

(i) W_1, ..., W_k are T-invariant.
(ii) V = W_1 ⊕ ⋯ ⊕ W_k.
(iii) q_i(X)^{e_i} is the minimal polynomial of T_{W_i}, for 1 ≤ i ≤ k.

Proof. For each 1 ≤ i ≤ k, set W_i := ker(q_i(T)^{e_i}). Then each W_i is T-invariant. Indeed, w_i ∈ W_i implies q_i(T)^{e_i}(w_i) = 0, so 0 = T(q_i(T)^{e_i}(w_i)) = q_i(T)^{e_i}(T(w_i)), so T(w_i) ∈ W_i. This gives (i).

For part (ii), set f_i(X) := ∏_{j ≠ i} q_j(X)^{e_j}. Then f_1(X), ..., f_k(X) have no common divisor, so there exist a_1(X), ..., a_k(X) ∈ F[X] such that a_1(X)f_1(X) + ⋯ + a_k(X)f_k(X) = 1. Thus, for all v ∈ V, v = a_1(T)f_1(T)(v) + ⋯ + a_k(T)f_k(T)(v). However, for each 1 ≤ i ≤ k, q_i(T)^{e_i}(a_i(T)f_i(T)(v)) = a_i(T)(q(T)(v)) = 0. Thus each a_i(T)f_i(T)(v) ∈ W_i and V = W_1 + ⋯ + W_k. To show that the sum is direct, fix an i between 1 and k and take c_i(X), d_i(X) ∈ F[X] satisfying c_i(X)q_i(X)^{e_i} + d_i(X)f_i(X) = 1. Suppose that w_i = w_1 + ⋯ + w_{i−1} + w_{i+1} + ⋯ + w_k, with w_j ∈ W_j for all j. Then

    w_i = (1 − c_i(T)q_i(T)^{e_i})(w_i)
        = d_i(T)f_i(T)(w_1) + ⋯ + d_i(T)f_i(T)(w_{i−1}) + d_i(T)f_i(T)(w_{i+1}) + ⋯ + d_i(T)f_i(T)(w_k)
        = 0,

since f_i(T)(w_j) = 0 for all j ≠ i. Thus W_i ∩ (W_1 + ⋯ + W_{i−1} + W_{i+1} + ⋯ + W_k) = 0, and the sum is direct. This gives part (ii).

To prove part (iii), let g_i(X) denote the minimal polynomial of T_{W_i}, for 1 ≤ i ≤ k. Then g_i(X) divides q_i(X)^{e_i} for all i, since q_i(T_{W_i})^{e_i} = 0 by definition of W_i. Thus g_i(X) = q_i(X)^{e'_i} with e'_i ≤ e_i, since q_i(X) is irreducible. On the other hand, by Proposition C,

    q_1(X)^{e_1} ⋯ q_k(X)^{e_k} = q(X) = LCM(g_1(X), ..., g_k(X)) = q_1(X)^{e'_1} ⋯ q_k(X)^{e'_k}.

It follows at once that g_i(X) = q_i(X)^{e_i} for all i, which is what we want.

Now let us summarize what we have done by putting together Propositions C and D. Write q(X) for the minimal polynomial of T and factor q(X) = q_1(X)^{e_1} ⋯ q_k(X)^{e_k}, with each q_i(X) irreducible over F. By Proposition D, we can decompose V as a direct sum of the T-invariant subspaces W_i := ker(q_i(T)^{e_i}). We will refer to the W_i as the primary components of V. By Proposition C, if we take a basis 𝒱 for V comprised of bases from W_1, ..., W_k, the matrix of T with respect to 𝒱 is block diagonal, where the ith diagonal block is the matrix of T_{W_i} with respect to the selected basis from W_i. We will refer to these blocks as the primary blocks associated to T.

As things now stand, the primary blocks have no particular form. The next and most difficult phase of our journey requires an analysis at the primary block level. In other words, we assume that q(X) is a power of an irreducible polynomial and push on from there. Before doing so, we will look at a special case.

Remark E. Let W be a vector space and T a linear operator on W. Suppose there is an element w ∈ W such that 𝒲 := {w, T(w), ..., T^{r−1}(w)} is a basis for W. Then we can write

    T^r(w) = −λ_0 w − λ_1 T(w) − ⋯ − λ_{r−1} T^{r−1}(w),

for some λ_j ∈ F. A straightforward calculation now shows that the matrix of T with respect to 𝒲 is

    [ 0  0  ⋯  0  −λ_0
      1  0  ⋯  0  −λ_1
      0  1  ⋯  0  −λ_2
      ⋮  ⋮  ⋱  ⋮   ⋮
      0  0  ⋯  1  −λ_{r−1} ].

Writing f(X) := X^r + λ_{r−1}X^{r−1} + ⋯ + λ_0 for the corresponding polynomial, we say that the matrix above is the companion matrix for f(X), and denote it C(f(X)).
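The companion-matrix convention of Remark E is easy to mechanize. The following sketch, assuming SymPy, builds C(f) from the coefficients λ_0, ..., λ_{r−1} and checks that C(f) satisfies its own polynomial; the test polynomial (X − 2)^3 is our own illustrative choice.

```python
# Sketch of Remark E's companion matrix: 1's on the subdiagonal and
# -λ_0, ..., -λ_{r-1} down the last column (the lower-triangular convention
# used in these notes).
from sympy import Matrix, eye, zeros

def companion(lams):
    """Companion matrix of f(X) = X^r + lams[r-1] X^(r-1) + ... + lams[0]."""
    r = len(lams)
    C = zeros(r, r)
    for i in range(1, r):
        C[i, i - 1] = 1            # subdiagonal 1's: C sends e_i to e_{i+1}
    for i in range(r):
        C[i, r - 1] = -lams[i]     # last column: -λ_0, ..., -λ_{r-1}
    return C

# f(X) = X^3 - 6X^2 + 12X - 8 = (X - 2)^3, so λ_0 = -8, λ_1 = 12, λ_2 = -6.
C = companion([-8, 12, -6])
f_of_C = C**3 - 6*C**2 + 12*C - 8*eye(3)
print(f_of_C == zeros(3, 3))
```

That f(C(f)) = 0 reflects the fact that f is both the characteristic and the minimal polynomial of its companion matrix.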
In the best of worlds, each primary component W_i in our general discussion would admit a basis of the type above, so that the matrix of T_{W_i} with respect to that basis would be a companion matrix.
Alas, this is not so; but we will see that each primary component can be decomposed into a direct sum of such spaces, so that the matrix of T_{W_i} will be block diagonal, with companion matrices as blocks. The following crucial proposition is our next step towards this end.

Proposition F. Let W be a finite dimensional vector space and T a linear operator on W. Suppose that the minimal polynomial of T is q(X)^e, with q(X) an irreducible polynomial. Let r denote the degree of q(X)^e. Then there exists w ∈ W such that w, T(w), ..., T^{r−1}(w) are linearly independent.

Proof. Before starting the proof, we note that for any u ∈ W, the set {u, T(u), ..., T^r(u)} is linearly dependent, since q(X)^e has degree r. Thus the w we seek gives rise to the largest possible linearly independent set of the form {w, T(w), ...}.

Now suppose the proposition is false. Take any w ∈ W and let W' be the cyclic subspace generated by w. Let f(X) denote the minimal polynomial of T_{W'}. Since, by assumption, w, T(w), ..., T^{r−1}(w) are linearly dependent, any dependence relation gives rise to a nonzero polynomial h(X) of degree at most r − 1 such that h(T)(w) = 0. As mentioned above, this means h(T_{W'}) = 0, so f(X) divides h(X). In particular, f(X) has degree less than or equal to r − 1. On the other hand, since q(T)^e = 0, q(T_{W'})^e = 0, so f(X) divides q(X)^e. Since q(X) is irreducible, this forces f(X) to be a power of q(X), and since f(X) has degree less than r, f(X) = q(X)^{f'} for some f' ≤ e − 1. Thus f(X) divides q(X)^{e−1}. Writing q(X)^{e−1} = p(X)f(X), it follows that q(T)^{e−1}(w) = p(T)f(T)(w) = 0. This now holds for all w ∈ W, so q(T)^{e−1} = 0. But this contradicts the fact that q(X)^e is the minimal polynomial of T. Thus there must be a w ∈ W such that w, T(w), ..., T^{r−1}(w) are linearly independent.

Remark G. Let w ∈ W satisfy the conclusion of Proposition F and write W_1 for the cyclic subspace generated by w. Then:

(i) {w, T(w), ..., T^{r−1}(w)} is a basis for W_1. Indeed, this set of vectors is linearly independent by design and, by our comments before Definition A, it must also span W_1.

(ii) q(X)^e is the minimal polynomial of T_{W_1}. After all, w, T(w), ..., T^{r−1}(w) are independent, so the minimal polynomial of T_{W_1} must have degree at least r, and since q(T_{W_1})^e = 0, q(X)^e must be the minimal polynomial of T_{W_1}.

(iii) The matrix of T_{W_1} with respect to the basis {w, T(w), ..., T^{r−1}(w)} is C(q(X)^e).
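The iterates w, T(w), T^2(w), ... that appear in Proposition F and Remark G can be computed directly. A sketch, assuming SymPy; the matrix T below (with minimal polynomial (X − 2)^3) and the starting vector v are our own illustrative choices.

```python
# Sketch: compute a basis for the cyclic subspace generated by v by taking
# iterates v, T v, T^2 v, ... until the next iterate becomes dependent.
from sympy import Matrix

def cyclic_basis(T, v):
    """Return the independent iterates v, T v, T^2 v, ... as matrix columns."""
    vectors = [v]
    while True:
        nxt = T * vectors[-1]
        trial = Matrix.hstack(*vectors, nxt)
        if trial.rank() == len(vectors):   # nxt is dependent on earlier iterates
            return Matrix.hstack(*vectors)
        vectors.append(nxt)

T = Matrix([[2, 0, 0],
            [1, 2, 0],
            [0, 1, 2]])      # minimal polynomial (X - 2)^3, so r = 3
v = Matrix([1, 0, 0])
W = cyclic_basis(T, v)
print(W.rank())              # 3: v generates the whole space, as Prop. F promises
```

The number of columns returned equals the degree of the minimal polynomial of T restricted to the cyclic subspace, in line with Remark G(i) and (ii).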
Now, if we think of this W as one of the primary components of V, then we have the start of the decomposition alluded to after Remark E. Suppose we could find a T-invariant subspace U such that W = W_1 ⊕ U. Then we could either repeat the construction of Proposition F on U, or simply apply induction to write U as a direct sum of T-invariant cyclic subspaces. Thus W would be decomposed into a direct sum of spaces admitting bases with respect to which the matrix of the restriction of T is a companion matrix, and T restricted to W could then be represented as a block-diagonal matrix with companion matrices for blocks. Therefore, we must find U. This is the most difficult part of the proof of our canonical form theorems.

The conductor. Let B be a finite dimensional vector space, T a linear operator on B and A ⊆ B a T-invariant subspace. Fix a vector v ∈ B \ A. Then there is a unique monic polynomial g(X) ∈ F[X], called the conductor of v into A, such that: (i) g(T)(v) ∈ A and (ii) whenever f(X) is a polynomial with f(T)(v) ∈ A, then g(X) divides f(X). The existence of g(X) should be standard by now. Take g(X) to be a monic polynomial of least degree such that g(T)(v) ∈ A. If f(X) is a polynomial such that f(T)(v) ∈ A, then write f(X) = h(X)g(X) + r(X) with the degree of r(X) less than the degree of g(X). Then r(T)(v) = f(T)(v) − h(T)(g(T)(v)) ∈ A, so by the definition of g(X), r(X) = 0, i.e., g(X) divides f(X). Uniqueness of g(X) follows along similar lines.

An easy comment: if the minimal polynomial of T is q(X)^e, with q(X) irreducible, then the conductor of v into A is q(X)^c for some c ≤ e, and in fact c is the least positive integer such that q(T)^c(v) ∈ A.

We will need the following general fact, which is interesting in its own right.

General Fact. Let T be a linear operator on the finite dimensional vector space L and write f(X) for the minimal polynomial of T.
If h(X) is any polynomial relatively prime to f(X), then h(T) : L → L is an isomorphism. To see this, take a(X), b(X) ∈ F[X] such that a(X)h(X) + b(X)f(X) = 1. Substituting T gives a(T)h(T) = I_L = h(T)a(T), so h(T) is an isomorphism.

Lemma H. Let W be a finite dimensional vector space and T a linear operator on W. Suppose that the minimal polynomial of T is q(X)^e, with q(X) an irreducible polynomial. For w_1 ∈ W, let W_1 be the cyclic subspace generated by w_1 and suppose that q(X)^e is also the minimal polynomial of T_{W_1}. Let U ⊆ W be a T-invariant subspace satisfying W_1 ∩ U = 0 and W_1 + U ≠ W. Then there exists v ∈ W \ (W_1 + U) such that the conductor of v into U equals the conductor of v into W_1 + U.

Proof. Let v ∈ W be any vector not in W_1 + U. Let t(X) be the conductor of v into W_1 + U and g(X) the conductor of v into U. Then, by our comments above, t(X) = q(X)^d, g(X) = q(X)^c and d ≤ c ≤ e. We would like to have d = c. This may not hold for the v we have chosen, but we can adjust v so that it does hold. To elaborate, write q(T)^d(v) = u + f(T)(w_1), with u ∈ U and f(T)(w_1) ∈ W_1, for some f(X) ∈ F[X]. Write f(X) = h(X)q(X)^s, with h(X) not divisible by q(X). We want to see that s ≥ d. Now q(T)^d(v) = u + h(T)q(T)^s(w_1). Applying q(T)^{c−d} to both sides of this equation, we obtain q(T)^c(v) = q(T)^{c−d}(u) + h(T)q(T)^{c−d+s}(w_1). Subtracting, we obtain q(T)^c(v) − q(T)^{c−d}(u) = h(T)q(T)^{c−d+s}(w_1) = 0, since the left-hand side belongs to U, the right-hand side belongs to W_1, and W_1 ∩ U = 0. By the General Fact above, h(T) is an isomorphism. Then q(T)^{c−d+s}(w_1) = 0 and hence q(T_{W_1})^{c−d+s} = 0. Therefore c − d + s ≥ e, so s − d ≥ e − c ≥ 0.

Now set ṽ := v − h(T)q(T)^{s−d}(w_1). First, note that ṽ ∉ W_1 + U. Second, note that for any i ≥ 1, q(T)^i(h(T)q(T)^{s−d}(w_1)) ∈ W_1, from which it follows that q(T)^i(ṽ) ∈ W_1 + U if and only if q(T)^i(v) ∈ W_1 + U. Thus q(X)^d is the conductor of ṽ into W_1 + U (since this is the conductor of v into W_1 + U). But now q(T)^d(ṽ) = q(T)^d(v) − h(T)q(T)^s(w_1) = u, so q(T)^d(ṽ) ∈ U. Thus q(X)^d is divisible by the conductor of ṽ into U. But this latter conductor is always divisible by the conductor of ṽ into W_1 + U, since any expression h(T)(ṽ) landing in U automatically lands in W_1 + U. Thus q(X)^d must also be the conductor of ṽ into U, which gives what we want.

The previous lemma is technically the most difficult part of the proof. However, it provides the crucial tool we need for finding the complementary T-invariant subspace U referred to in Remark G.

Proposition I.
Let T be a linear operator on a finite dimensional vector space W. Assume the minimal polynomial of T is q(X)^e, with q(X) irreducible. Let W_1 ⊆ W be the cyclic subspace generated by w ∈ W and assume that the minimal polynomial of T_{W_1} is also q(X)^e. Then there exists a T-invariant subspace U ⊆ W such that W = W_1 ⊕ U.

Proof. If W = W_1, there is nothing to prove. Assume W_1 ≠ W. Suppose we could prove the following statement: if U ⊆ W is a T-invariant subspace satisfying W_1 ∩ U = 0 and W_1 + U ≠ W, then there exists a T-invariant subspace U' ⊇ U satisfying W_1 ∩ U' = 0 and W_1 + U' strictly larger than W_1 + U. Then, starting with U = 0, we may repeatedly apply the process, building up U until U satisfies W = W_1 + U. By design, W_1 ∩ U = 0, so W = W_1 ⊕ U, as desired.

To finish the proof, suppose that U is T-invariant, W_1 ∩ U = 0 and W_1 + U ≠ W. By Lemma H, there exists v ∉ W_1 + U such that the conductor of v into U equals the conductor of v into W_1 + U. Let q(X)^c denote this conductor and let a equal the degree of q(X)^c. Now set U' := U + ⟨v, T(v), ..., T^{a−1}(v)⟩. Of course, W_1 + U is strictly contained in W_1 + U'. We need to see that U' is T-invariant and W_1 ∩ U' = 0.

To see that U' is T-invariant, it suffices to show T(T^{a−1}(v)) = T^a(v) ∈ U'. Write X^a = q(X)^c + r(X), where r(X) has degree less than a. Then T^a(v) = q(T)^c(v) + r(T)(v). But q(T)^c(v) ∈ U and r(T)(v) ∈ ⟨v, T(v), ..., T^{a−1}(v)⟩, so T^a(v) ∈ U', as required.

To see W_1 ∩ U' = 0, suppose there is u ∈ U such that u' := u + β_0 v + ⋯ + β_{a−1}T^{a−1}(v) ∈ W_1. Set g(X) := β_{a−1}X^{a−1} + ⋯ + β_0. Then g(T)(v) = u' − u ∈ U + W_1. Thus g(X) is divisible by the conductor of v into U + W_1. By the choice of v, this means g(X) is divisible by the conductor of v into U. Thus g(T)(v) ∈ U. Therefore u' = u + g(T)(v) ∈ U ∩ W_1 = 0, so W_1 ∩ U' = 0. This completes the proof.

Our first theorem provides the crucial step in the canonical form theorems that follow.

Theorem J. Let W be a finite dimensional vector space and T a linear operator on W. Suppose that the minimal polynomial of T is q(X)^e, with q(X) an irreducible polynomial. Let r denote the degree of q(X)^e. Then there exist T-invariant subspaces W_1, ..., W_t of W satisfying the following conditions.

(i) W = W_1 ⊕ ⋯ ⊕ W_t.
(ii) For each 1 ≤ i ≤ t, W_i has a basis of the form {w_i, T(w_i), ..., T^{r_i−1}(w_i)}.
(iii) For each 1 ≤ i ≤ t, the minimal polynomial of T_{W_i} is q(X)^{e_i}, for some e_i ≤ e, and deg(q(X)^{e_i}) = r_i. Moreover, we can arrange the ordering so that e = e_1 ≥ e_2 ≥ ⋯ ≥ e_t.

Remark K. Retain the notation of Theorem J. It follows from conditions (ii) and (iii) that each W_i is the T-invariant subspace generated by w_i and that the dimension of W_i equals the degree of the minimal polynomial of T_{W_i}. Thus, just as in Remark E, the matrix of T_{W_i} with respect to the given basis is the companion matrix C(q(X)^{e_i}). It therefore follows that our primary blocks can be decomposed into block-diagonal matrices with companion matrices down the diagonal. This is the rational canonical form. The polynomials q(X)^{e_i} in Theorem J are called the elementary divisors of T and are uniquely determined by conditions (i)-(iii). The definition of elementary divisor extends to arbitrary T by applying Theorem J to its primary blocks. We do this below.

Proof of Theorem J. Proceed by induction on the dimension s of W. If s = 1, there is nothing to prove. Suppose that s > 1. By Proposition F, there exists w ∈ W such that w, T(w), ..., T^{r−1}(w) are linearly independent. By Remark G, these vectors form a basis for the cyclic subspace W_1 generated by w, and the minimal polynomial of T_{W_1} is q(X)^e. Now we set e_1 := e. By Proposition I, there exists a T-invariant subspace U ⊆ W such that W = W_1 ⊕ U. Then q(T_U)^e = 0, so the minimal polynomial of T_U has the form q(X)^{e_2}, with e_2 ≤ e_1 = e. By induction on s, there exist T-invariant subspaces W_2, ..., W_t ⊆ U satisfying the conclusions of the theorem. Clearly then, W_1, ..., W_t (and the minimal polynomials of T_{W_1}, ..., T_{W_t}) satisfy the conclusions of the theorem.

We are now ready for the main result of these notes.

Theorem L (Rational Canonical Form). Let V be a finite dimensional vector space and T a linear operator on V. Write q(X) for the minimal polynomial of T and factor q(X) = q_1(X)^{e_1} ⋯ q_k(X)^{e_k}, with each q_i(X) irreducible over F. Then there exists a basis 𝒱 for V such that A, the matrix of T with respect to 𝒱, has the block-diagonal form

    A = [ A_1  0   ⋯  0
          0    A_2 ⋯  0
          ⋮    ⋮   ⋱  ⋮
          0    0   ⋯  A_k ],

where A_1, ..., A_k are primary blocks corresponding to q_1(X)^{e_1}, ..., q_k(X)^{e_k}, and for each
1 ≤ i ≤ k,

    A_i = [ C(q_i(X)^{e_{i,1}})  0                   ⋯  0
            0                   C(q_i(X)^{e_{i,2}})  ⋯  0
            ⋮                   ⋮                    ⋱  ⋮
            0                   0                    ⋯  C(q_i(X)^{e_{i,t_i}}) ],

for integers e_i = e_{i,1} ≥ e_{i,2} ≥ ⋯ ≥ e_{i,t_i}.

Proof. By Proposition D, we may decompose V = W_1 ⊕ ⋯ ⊕ W_k, where for each 1 ≤ i ≤ k, W_i is T-invariant and the minimal polynomial of T_{W_i} is q_i(X)^{e_i}. By Theorem J, we may decompose each W_i = W_{i,1} ⊕ ⋯ ⊕ W_{i,t_i} so that for each 1 ≤ j ≤ t_i, W_{i,j} has a basis of the form 𝒲_{i,j} := {w_{i,j}, T(w_{i,j}), ..., T^{r_{i,j}−1}(w_{i,j})} and the minimal polynomial of T_{W_{i,j}} is q_i(X)^{e_{i,j}}, with e_i = e_{i,1} ≥ ⋯ ≥ e_{i,t_i} and deg(q_i(X)^{e_{i,j}}) = r_{i,j}. Thus, as in Remark E, the matrix of T_{W_{i,j}} with respect to the given basis is C(q_i(X)^{e_{i,j}}). Since V = W_{1,1} ⊕ ⋯ ⊕ W_{1,t_1} ⊕ ⋯ ⊕ W_{k,1} ⊕ ⋯ ⊕ W_{k,t_k}, if we take 𝒱 := 𝒲_{1,1} ∪ ⋯ ∪ 𝒲_{1,t_1} ∪ ⋯ ∪ 𝒲_{k,1} ∪ ⋯ ∪ 𝒲_{k,t_k}, then the matrix of T with respect to 𝒱 has the required form (by Proposition C).

Remarks M. (i) The process leading to the rational canonical form is algorithmic, assuming we can factor the minimal polynomial of T. In other words, starting with T, we can follow the path just traversed to construct in finite time the basis 𝒱 in Theorem L. Of course, the process need not be the most efficient, or even doable in any reasonable amount of time, with or without the aid of a computer. To summarize: first find the minimal polynomial q(X). This can be calculated, since it divides the characteristic polynomial of T and we can calculate the characteristic polynomial using determinants. Then factor q(X) = q_1(X)^{e_1} ⋯ q_k(X)^{e_k}, with each q_i(X) irreducible over F. (In general, this factorization may not be readily computable.) Next calculate ker(q_i(T)^{e_i}) for each 1 ≤ i ≤ k. This is equivalent to solving a homogeneous system of equations, so Gaussian elimination applies. As in the proof of Theorem L, restricting T to each ker(q_i(T)^{e_i}) then reduces the problem to the case where the minimal polynomial of T is q(X)^e, with q(X) irreducible.
Now, as in the proof of Theorem J, set r := deg(q(X)^e) and consider the set {v, T(v), ..., T^{r−1}(v)} as v ranges through the standard basis (say) for V. One of these collections must be linearly independent, a condition that can be checked with Gaussian elimination. Calling the v that works w_1 and using W_1 to denote the corresponding T-invariant subspace generated by w_1, use the proof of Proposition I to construct a T-invariant subspace U satisfying V = W_1 ⊕ U. (It is important to note that U can be explicitly constructed, thanks to Lemma H, which is also constructive.) Finally, repeat the process leading to W_1 and U, this time on T_U. This leads to U = W_2 ⊕ U_2, with W_2 meeting the required conditions. The process will terminate in finitely many steps.

(ii) Let the collection of integers {e_{i,j}} be given as in Theorem L. The polynomials {q_i(X)^{e_{i,j}}} are called the elementary divisors of T and uniquely determine the rational canonical form of T. We will see below that the elementary divisors of T are unique. In other words, up to a possible permutation of primary blocks along the diagonal, the rational canonical form of T is unique. If we just want the rational canonical form of T and not the basis giving it, there is a constructive procedure leading to the calculation of the elementary divisors of T. The procedure is easy to describe, but the proof that the procedure works is a bit beyond the scope of this course.

(iii) Of course, the rational canonical form theorem has a version for matrices. Take B in M_n(F). Let T be the linear transformation taking the jth standard basis vector of V to the jth column of B. If 𝒱 is the basis for which the matrix of T with respect to 𝒱 has rational canonical form A, for A as in Theorem L, then P^{−1}BP = A, for P the change of basis matrix corresponding to 𝒱. Likewise, we call A the rational canonical form of B.

(iv) While the rational canonical form for a transformation or matrix is unique, different rational canonical forms could have the same minimal polynomial. Thus, the minimal polynomial alone does not determine the rational canonical form.
However, for a fixed n and a fixed q(X), there are only finitely many n × n matrices in rational canonical form having minimal polynomial q(X). We illustrate this in the Examples below.

We now consider the case where our linear transformation T has the property that its minimal polynomial has all of its roots in F. We will see below that the Jordan canonical form for T is lower triangular. In fact, aside from its diagonal entries and possibly some 1's below the main diagonal, all of its entries will be zero.

The first step towards the Jordan canonical form is a consideration of the case where all of the roots of q(X) are zero. In this case q(X) = X^e for some e, so T^e = 0. Such a transformation is said to be nilpotent. Similarly, a matrix B is said to be nilpotent if B^e = 0 for some e. The archetypal nilpotent matrix is a lower (or upper) triangular matrix with zeros down the diagonal. More importantly, for each s ≥ 1 let M_s denote the following s × s matrix:

    M_s = [ 0  0  ⋯  0  0
            1  0  ⋯  0  0
            0  1  ⋯  0  0
            ⋮  ⋮  ⋱  ⋮  ⋮
            0  0  ⋯  1  0 ].

Then M_s satisfies M_s^s = 0 and M_s^{s−1} ≠ 0. In other words, the minimal polynomial of M_s is X^s. We now show that any nilpotent transformation (or nilpotent matrix) can be brought to block-diagonal form with matrices M_s down the diagonal.

Theorem N (Nilpotent Form). Assume that T is a nilpotent linear transformation with minimal polynomial X^e. Then there exist a basis 𝒱 for V and integers e = e_1 ≥ ⋯ ≥ e_t such that A, the matrix of T with respect to 𝒱, has the block-diagonal form

    A = [ M_{e_1}  0        ⋯  0
          0        M_{e_2}  ⋯  0
          ⋮        ⋮        ⋱  ⋮
          0        0        ⋯  M_{e_t} ].

Proof. We just use Theorem L to find a basis 𝒱 such that A, the matrix of T with respect to 𝒱, is in rational canonical form. Since the minimal polynomial of T is X^e, A has just one primary block, namely A itself. The primary block in turn has the block-diagonal form

    A = [ C(X^{e_1})  0           ⋯  0
          0           C(X^{e_2})  ⋯  0
          ⋮           ⋮           ⋱  ⋮
          0           0           ⋯  C(X^{e_t}) ],

with e = e_1 ≥ ⋯ ≥ e_t. But a quick check reveals that C(X^{e_i}) = M_{e_i} for all i, which gives what we want.
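The defining property of M_s is easy to verify directly. A sketch assuming SymPy, with s = 4 as an illustrative choice:

```python
# Sketch of the nilpotent block M_s: 1's on the subdiagonal, zeros elsewhere,
# so M_s^s = 0 while M_s^{s-1} != 0 (minimal polynomial X^s).
from sympy import zeros

def nilpotent_block(s):
    M = zeros(s, s)
    for i in range(1, s):
        M[i, i - 1] = 1     # subdiagonal 1's: M shifts e_i to e_{i+1}
    return M

M4 = nilpotent_block(4)
print(M4**4 == zeros(4, 4), M4**3 != zeros(4, 4))   # True True
```

Since M_s shifts each basis vector one step down the chain, s applications kill everything but s − 1 do not, which is exactly the statement that the minimal polynomial is X^s.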
We now use the nilpotent form theorem to give us the Jordan canonical form theorem. The proof is based upon the following observation: if the minimal polynomial of T has the form (X − λ)^e, then the transformation S := T − λI is nilpotent and can therefore be brought to nilpotent form. Using the same basis for T yields a matrix obtained from the one for S by placing λ's on the diagonal above the 1's appearing in the nilpotent form. In fact, for s ≥ 1 and λ ∈ F, we define the s × s Jordan block associated to λ to be the s × s matrix

    J(s, λ) := [ λ  0  ⋯  0  0
                 1  λ  ⋯  0  0
                 0  1  ⋯  0  0
                 ⋮  ⋮  ⋱  ⋮  ⋮
                 0  0  ⋯  1  λ ].

Theorem O (Jordan Canonical Form). Let q(X) be the minimal polynomial of T and suppose that q(X) has all of its roots in F. Write q(X) = (X − λ_1)^{e_1} ⋯ (X − λ_k)^{e_k}. Then there exists a basis 𝒱 for V such that A, the matrix of T with respect to 𝒱, has the block-diagonal form

    A = [ A_1  0   ⋯  0
          0    A_2 ⋯  0
          ⋮    ⋮   ⋱  ⋮
          0    0   ⋯  A_k ],

where A_1, ..., A_k are primary blocks corresponding to (X − λ_1)^{e_1}, ..., (X − λ_k)^{e_k}, and for each 1 ≤ i ≤ k,

    A_i = [ J(e_{i,1}, λ_i)  0               ⋯  0
            0                J(e_{i,2}, λ_i) ⋯  0
            ⋮                ⋮               ⋱  ⋮
            0                0               ⋯  J(e_{i,t_i}, λ_i) ],

for integers e_i = e_{i,1} ≥ e_{i,2} ≥ ⋯ ≥ e_{i,t_i}. We call {(X − λ_i)^{e_{i,j}}} the Jordan elementary divisors of T.

Proof. By Proposition D, we may find T-invariant subspaces W_1, ..., W_k ⊆ V such that for each 1 ≤ i ≤ k, the minimal polynomial of T_{W_i} is (X − λ_i)^{e_i}. Set S_i := (T − λ_i I)_{W_i}. S_i is nilpotent, so there exists a basis 𝒲_i of W_i such that the matrix of S_i with respect to 𝒲_i equals

    [ M_{e_{i,1}}  0           ⋯  0
      0            M_{e_{i,2}} ⋯  0
      ⋮            ⋮           ⋱  ⋮
      0            0           ⋯  M_{e_{i,t_i}} ],

for e_i = e_{i,1} ≥ ⋯ ≥ e_{i,t_i}. Therefore the matrix A_i of T_{W_i} with respect to 𝒲_i satisfies

    A_i = [ J(e_{i,1}, λ_i)  0               ⋯  0
            0                J(e_{i,2}, λ_i) ⋯  0
            ⋮                ⋮               ⋱  ⋮
            0                0               ⋯  J(e_{i,t_i}, λ_i) ],

as required. Putting together the bases 𝒲_i yields the required basis 𝒱.

Examples P. (i) Let F = Q and set q(X) = (X − 2)^3 (X^2 + 1)^2. We want to find rational canonical forms for all 11 × 11 matrices in M_11(Q) having minimal polynomial q(X). Since q(X) has two irreducible factors, the matrices we seek have two primary blocks, each of which is to be decomposed into blocks of companion matrices associated either to a power of X − 2 less than or equal to 3 or a power of X^2 + 1 less than or equal to 2. Each primary block is strictly determined by its elementary divisors, so in fact it suffices to list all possible sets of elementary divisors for 11 × 11 matrices having minimal polynomial q(X). It turns out that there are eight possible rational canonical forms. Call these B_1, ..., B_8. We assign each of these the following elementary divisors:

(a) B_1: (X − 2)^3, (X − 2)^3, (X − 2), (X^2 + 1)^2.
(b) B_2: (X − 2)^3, (X − 2)^2, (X − 2)^2, (X^2 + 1)^2.
(c) B_3: (X − 2)^3, (X − 2)^2, (X − 2), (X − 2), (X^2 + 1)^2.
(d) B_4: (X − 2)^3, (X − 2), (X − 2), (X − 2), (X − 2), (X^2 + 1)^2.
(e) B_5: (X − 2)^3, (X − 2)^2, (X^2 + 1)^2, (X^2 + 1).
(f) B_6: (X − 2)^3, (X − 2), (X − 2), (X^2 + 1)^2, (X^2 + 1).
(g) B_7: (X − 2)^3, (X^2 + 1)^2, (X^2 + 1)^2.
(h) B_8: (X − 2)^3, (X^2 + 1)^2, (X^2 + 1), (X^2 + 1).
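As a quick sanity check on the list above: the elementary divisors of each B_i must have degrees summing to 11. In plain Python (the encoding of each divisor as a (degree of irreducible factor, exponent) pair is our own; e.g. (X^2 + 1)^2 becomes (2, 2)):

```python
# Each B_i's elementary divisors q_i(X)^{e_{i,j}} contribute
# deg(q_i) * e_{i,j} to the matrix size, and the sizes must total 11.
lists = {
    'B1': [(1, 3), (1, 3), (1, 1), (2, 2)],
    'B2': [(1, 3), (1, 2), (1, 2), (2, 2)],
    'B3': [(1, 3), (1, 2), (1, 1), (1, 1), (2, 2)],
    'B4': [(1, 3), (1, 1), (1, 1), (1, 1), (1, 1), (2, 2)],
    'B5': [(1, 3), (1, 2), (2, 2), (2, 1)],
    'B6': [(1, 3), (1, 1), (1, 1), (2, 2), (2, 1)],
    'B7': [(1, 3), (2, 2), (2, 2)],
    'B8': [(1, 3), (2, 2), (2, 1), (2, 1)],
}
totals = {name: sum(d * e for d, e in divs) for name, divs in lists.items()}
print(all(t == 11 for t in totals.values()))   # True
```

Note that each list contains (X − 2)^3 and (X^2 + 1)^2 exactly as required for the minimal polynomial (the LCM of the elementary divisors) to be q(X).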
Each B_i is therefore block diagonal, with the companion matrices corresponding to the listed elementary divisors down the diagonal. For example,

    B_1 = [ 0 0   8  0 0   0  0  0 0 0  0
            1 0 −12  0 0   0  0  0 0 0  0
            0 1   6  0 0   0  0  0 0 0  0
            0 0   0  0 0   8  0  0 0 0  0
            0 0   0  1 0 −12  0  0 0 0  0
            0 0   0  0 1   6  0  0 0 0  0
            0 0   0  0 0   0  2  0 0 0  0
            0 0   0  0 0   0  0  0 0 0 −1
            0 0   0  0 0   0  0  1 0 0  0
            0 0   0  0 0   0  0  0 1 0 −2
            0 0   0  0 0   0  0  0 0 1  0 ],

where the diagonal blocks are C((X − 2)^3), C((X − 2)^3), C(X − 2) and C((X^2 + 1)^2).

(ii) Set F = Q(i) and q(X) = (X − 2)^3 (X^2 + 1)^2 = (X − 2)^3 (X + i)^2 (X − i)^2. We seek Jordan canonical forms for all 9 × 9 matrices having minimal polynomial q(X). As before, the primary blocks are determined by their elementary divisors, though the Jordan block associated to a Jordan elementary divisor is not the companion matrix of the elementary divisor. For example, if the elementary divisors are (X − 2)^3, (X − 2)^2, (X + i)^2, (X − i)^2, the corresponding Jordan form has diagonal blocks J(3, 2), J(2, 2), J(2, −i), J(2, i). Here the first two Jordan blocks make up the primary block corresponding to (X − 2)^3, and the third and fourth Jordan blocks are the primary blocks associated to (X + i)^2 and (X − i)^2. The complete list of possible sets of Jordan elementary divisors for this example is:

(a) C_1: (X − 2)^3, (X − 2)^2, (X + i)^2, (X − i)^2.
(b) C_2: (X − 2)^3, (X − 2), (X − 2), (X + i)^2, (X − i)^2.
(c) C_3: (X − 2)^3, (X − 2), (X + i)^2, (X + i), (X − i)^2.
(d) C_4: (X − 2)^3, (X − 2), (X + i)^2, (X − i)^2, (X − i).
(e) C_5: (X − 2)^3, (X + i)^2, (X + i)^2, (X − i)^2.
(f) C_6: (X − 2)^3, (X + i)^2, (X + i), (X + i), (X − i)^2.
(g) C_7: (X - 2)^3, (X + i)^2, (X + i), (X - i)^2, (X - i).
(h) C_8: (X - 2)^3, (X + i)^2, (X - i)^2, (X - i)^2.
(i) C_9: (X - 2)^3, (X + i)^2, (X - i)^2, (X - i), (X - i).

Thus, for example,

          [ 2  0  0   0   0   0   0  0  0 ]
          [ 1  2  0   0   0   0   0  0  0 ]
          [ 0  1  2   0   0   0   0  0  0 ]
          [ 0  0  0  -i   0   0   0  0  0 ]
    C_5 = [ 0  0  0   1  -i   0   0  0  0 ]
          [ 0  0  0   0   0  -i   0  0  0 ]
          [ 0  0  0   0   0   1  -i  0  0 ]
          [ 0  0  0   0   0   0   0  i  0 ]
          [ 0  0  0   0   0   0   0  1  i ]

The last issue I want to discuss concerns uniqueness of the canonical forms. As before, let q(x) = q_1(X)^{e_1} ⋯ q_k(X)^{e_k}, with each q_i(X) irreducible, be the minimal polynomial of T. Suppose that V and V′ are bases such that A and A′, the matrices of T with respect to V and V′, are in the rational canonical form given by Theorem L. We wish to show that, as long as we fix the order of the irreducible factors of q(x) (equivalently, the order of the primary blocks associated to T), A = A′. Otherwise, A′ can be obtained from A by permuting its primary blocks. After all, the primary blocks are defined to be the matrices of T restricted to ker(q_i(T)^{e_i}), for 1 ≤ i ≤ k, and are therefore uniquely determined by the factors q_i(X)^{e_i}. So, we fix the given order of the factors of q(x). Clearly, within any primary block, the structure of A and A′ is uniquely determined by the corresponding elementary divisors. Thus, A = A′ if we prove the following theorem.

Theorem Q. Let W be a finite dimensional vector space with linear operator T. Let q(x)^e be the minimal polynomial of T. Assume that q(x) is irreducible and deg(q(x)^e) = r. Suppose that W_1, ..., W_t are T-invariant subspaces satisfying the conclusions of Theorem J. For each 1 ≤ i ≤ t, let W_i denote the corresponding basis and q(x)^{e_i} the corresponding elementary divisor (so e = e_1 ≥ ⋯ ≥ e_t). Suppose, in addition, that U_1, ..., U_h are T-invariant subspaces satisfying the conclusions of Theorem G and that for each 1 ≤ j ≤ h,
U_j := {u_j, T(u_j), ..., T^{s_j - 1}(u_j)} is the corresponding basis and q(x)^{f_j} is the corresponding elementary divisor (so e = f_1 ≥ ⋯ ≥ f_h). Then t = h and e_1 = f_1, ..., e_t = f_t.

Proof. Suppose the conclusion fails, and let m be the smallest positive integer such that e_m ≠ f_m. Without loss of generality, suppose e_m > f_m. We now make the following claim: for all i ≤ m, dim(q(T)^{f_m}(U_i)) = s_i - s_m and dim(q(T)^{f_m}(W_i)) = r_i - s_m.

Suppose the claim holds. Since q(T)^{f_m}(U_i) = 0 for all i ≥ m (for such i we have f_i ≤ f_m), it follows that q(T)^{f_m}(W) = q(T)^{f_m}(U_1) ⊕ ⋯ ⊕ q(T)^{f_m}(U_{m-1}), so

    dim(q(T)^{f_m}(W)) = (s_1 - s_m) + ⋯ + (s_{m-1} - s_m).

On the other hand, q(T)^{f_m}(W) contains q(T)^{f_m}(W_1) ⊕ ⋯ ⊕ q(T)^{f_m}(W_{m-1}) ⊕ q(T)^{f_m}(W_m), so

    dim(q(T)^{f_m}(W)) ≥ (r_1 - s_m) + ⋯ + (r_{m-1} - s_m) + (r_m - s_m).

Therefore,

    (s_1 - s_m) + ⋯ + (s_{m-1} - s_m) ≥ (r_1 - s_m) + ⋯ + (r_{m-1} - s_m) + (r_m - s_m).

But s_j = f_j · deg(q(x)) and r_i = e_i · deg(q(x)) for all i and j, so by our choice of m we have s_j = r_j for all j < m. The inequality above then forces r_m - s_m ≤ 0, which contradicts e_m > f_m. Thus, e_i = f_i for all i and t = h.

To verify the claim, it clearly suffices to prove the following statement. Let Y be a T-invariant subspace with basis {y, T(y), ..., T^{r-1}(y)}. Suppose that q(x)^a (q(x) irreducible) is the minimal polynomial of T|_Y and deg(q(x)^a) = r. Suppose b ≤ a and deg(q(x)^b) = d. Then dim(q(T)^b(Y)) = r - d.

To see this, set Y′ := q(T)^b(Y) and y′ := q(T)^b(y). We will show that B′ := {y′, T(y′), ..., T^{r-d-1}(y′)} is a basis for Y′. Now, q(T)^{a-b}(y′) = q(T)^a(y) = 0. Since deg(q(x)^{a-b}) = r - d, T^j(y′) can be expressed in terms of the vectors in B′ for all j ≥ r - d, so B′ spans Y′. This also shows that Y′ is T-invariant, so the minimal polynomial of T|_{Y′} must divide q(x)^{a-b}. Let f(x) denote the minimal polynomial of T|_{Y′}. Then 0 = f(T)(y′) = f(T)q(T)^b(y), so f(T|_Y)q(T|_Y)^b = 0, since this operator kills the generator y and hence all of Y. Thus, q(x)^a divides f(x)q(x)^b, so q(x)^{a-b} divides f(x). Therefore, f(x) = q(x)^{a-b}. In particular, the minimal polynomial of y′ has degree r - d, so no nontrivial linear combination of the r - d vectors in B′ can vanish; that is, B′ is linearly independent. Hence the claim has been verified and the proof of Theorem Q is now complete.
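The dimension count at the heart of the claim, that a cyclic subspace with minimal polynomial q(x)^a maps under q(T)^b to a subspace of dimension r - d, can be checked numerically. The sketch below (my own helper functions, using exact rational arithmetic) lets T act by the companion matrix of (X - 2)^3 from Examples P, takes q(x) = X - 2 (so r = 3 and d = b), and confirms that the image of q(T)^b has dimension 3 - b for b = 1, 2, 3:

```python
from fractions import Fraction

def companion(coeffs):
    """Companion matrix of the monic polynomial
    x^n + c_{n-1} x^{n-1} + ... + c_0, where coeffs = [c_0, ..., c_{n-1}]:
    ones on the subdiagonal, negated coefficients down the last column."""
    n = len(coeffs)
    return [[Fraction(1) if r == c + 1 else
             (-Fraction(coeffs[r]) if c == n - 1 else Fraction(0))
             for c in range(n)] for r in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank(M):
    """Rank over Q, computed by Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# T acts on a cyclic 3-dimensional space with minimal polynomial
# q(x)^a = (X - 2)^3 = X^3 - 6X^2 + 12X - 8, i.e. by C((X - 2)^3).
C = companion([-8, 12, -6])
N = mat_sub(C, [[2 * x for x in row] for row in identity(3)])  # q(T) = T - 2I
P = identity(3)
for b in (1, 2, 3):
    P = mat_mul(P, N)           # the matrix of q(T)^b
    d = b                       # deg q(x)^b, since deg q(x) = 1
    print(b, rank(P), 3 - d)    # rank of q(T)^b equals r - d; prints b, 3-b, 3-b
```

The rank drops by deg q(x) = 1 with each extra power of q(T), exactly as the claim predicts, and q(T)^3 = 0 recovers the fact that (X - 2)^3 is the minimal polynomial.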
It follows from the previous theorem that if the minimal polynomial of T has its roots in F, then the Jordan canonical form for T is unique. As before, it suffices to show that the form taken by any primary block is unique. But the ith primary block is determined by its Jordan elementary divisors, which in turn are the elementary divisors in the rational canonical form of the nilpotent transformation T - λ_i · I restricted to the ith primary component. These latter elementary divisors are unique, by Theorem Q.

Department of Mathematics, The University of Kansas, Lawrence, KS, 66046