(VI.C) Rational Canonical Form

Let's agree to call a transformation $T : F^n \to F^n$ *semisimple* if there is a basis $B = \{\vec v_1, \ldots, \vec v_n\}$ such that $T\vec v_1 = \lambda_1 \vec v_1$, $T\vec v_2 = \lambda_2 \vec v_2$, ..., $T\vec v_n = \lambda_n \vec v_n$ for some scalars $\lambda_i \in F$. $T$ is completely transparent when looked at relative to this basis. In terms of matrices, if $A = [T]_{\hat e}$ and $D = [T]_B = \mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$, then $A = SDS^{-1}$, and $A$ is similar to a diagonal matrix.

What if $A$ is not diagonalizable ($T$ is not semisimple)? Is there some basis in terms of which the action of $T$ is still transparent? An equivalent question: is there a canonical [= standard] sort of matrix to which any $A$ is similar? In the next three sections we present two distinct solutions to this problem: the rational and Jordan canonical forms of $A$. That is, any $A$ is similar to (essentially) unique matrices $R$ and $J$ obeying certain rules. One warning: in case $A$ is diagonalizable, i.e. $A \sim D$, the rational form $R$ does not reduce to that (or any) diagonal matrix. Later we will find that, by contrast, $J$ does equal $D$ in this case.

Another basic question: when is $A$ diagonalizable? Momentarily assuming $F = \mathbb{C}$:

Answer 1: when the geometric multiplicities (of the eigenvalues of $A$) equal the algebraic multiplicities.

Answer 2: when $m_A$ (not $f_A$) has no repeated roots (see Exercise VI.B.3).

Although this avoids the computation involved in Answer 1 (with which you are by now familiar), finding $nf(\lambda I - A)$ ain't easy
either. Answer 3: the Spectral Theorem, to be discussed later in these notes.

Now we go over a couple of basic building blocks we'll need for our discussion of rational canonical form.

Direct-Sum Decomposition. The notation $V = V_1 \oplus \cdots \oplus V_m$ means that

(i) $V = V_1 + \cdots + V_m$ and

(ii) $\sum_i \dim V_i = \dim V$; or equivalently,

(ii$'$) there is only one way of writing a given $\vec v \in V$ as $\vec v_1 + \cdots + \vec v_m$ (where $\vec v_i \in V_i$).

We say that $V$ is the direct sum of the $V_i$. If this holds, then there exists a basis $B = \{B_1, \ldots, B_m\}$ for $V$ such that the $B_i$ are (respectively) bases for the $V_i$.

A linear transformation $T : V \to V$ respects the direct-sum decomposition if $T(V_i) \subseteq V_i$ (for each $i$); that is, if each $V_i$ is closed under the action of $T$. In that case,
$$[T]_B = \begin{pmatrix} [T|_{V_1}]_{B_1} & & 0 \\ & \ddots & \\ 0 & & [T|_{V_m}]_{B_m} \end{pmatrix};$$
that is, the matrix of $T$ (with respect to $B$) consists of smaller matrices, blocks along the diagonal, representing its behavior on each subspace $V_i$; evidently the dimensions of the $i$th block are $\dim V_i \times \dim V_i$.

Cyclic Subspaces and Vectors. Let $T : V \to V$ be a transformation, and look at the successive images (under $T$) of $\vec v \in V$: $\vec v, T\vec v, T^2\vec v, \ldots$. Now $V$ is finite-dimensional, so the collections $\{\vec v, T\vec v, \ldots, T^i\vec v\}$ have to stop being independent at some point: i.e., there is some $d$ such that $\vec v, T\vec v, \ldots, T^{d-1}\vec v$ are independent,
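The block-diagonal picture above is easy to realize concretely. Here is a minimal Python sketch (the function name `block_diag` and the sample blocks are my choices, not from the notes): given the matrices of $T$ restricted to each invariant subspace, it assembles the matrix of $T$ in the adapted basis.

```python
def block_diag(*blocks):
    """Assemble a block-diagonal matrix from square blocks (lists of lists)."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        d = len(b)
        for i in range(d):
            for j in range(d):
                M[off + i][off + j] = b[i][j]
        off += d              # next block starts where this one ended
    return M

B1 = [[2]]                    # T acting on a 1-dimensional invariant subspace
B2 = [[0, -1], [1, 0]]        # T acting on a 2-dimensional invariant subspace
print(block_diag(B1, B2))     # [[2, 0, 0], [0, 0, -1], [0, 1, 0]]
```

Off-diagonal blocks stay zero precisely because $T(V_i) \subseteq V_i$: applying $T$ to a basis vector of $V_i$ never produces components along the other subspaces.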
but $\vec v, T\vec v, \ldots, T^{d-1}\vec v, T^d\vec v$ are dependent, and so
$$0 = \alpha_d T^d\vec v + \alpha_{d-1}T^{d-1}\vec v + \cdots + \alpha_1 T\vec v + \alpha_0\vec v,$$
where not all $\alpha_i = 0$. In fact $\alpha_d \ne 0$, because otherwise we would contradict independence of $\vec v, \ldots, T^{d-1}\vec v$. Dividing by $\alpha_d$, we see that $0 = p(T)\vec v$, where $p$ is the monic polynomial
$$p(\lambda) = \lambda^d + \frac{\alpha_{d-1}}{\alpha_d}\lambda^{d-1} + \cdots + \frac{\alpha_0}{\alpha_d} =: \lambda^d + a_{d-1}\lambda^{d-1} + \cdots + a_0.$$
Let $W = \mathrm{span}\{\vec v, T\vec v, \ldots, T^{d-1}\vec v\}$; it is called a ($T$-)cyclic subspace of $V$ with cyclic vector $\vec v$, and we say that $\vec v$ generates $W$ (under the action of $T$).

REMARK 1. Clearly $T^d\vec v \in W$, since
$$p(T)\vec v = 0 \implies T^d\vec v = -a_{d-1}T^{d-1}\vec v - \cdots - a_1 T\vec v - a_0\vec v.$$
In fact $W$ contains $T^{d+1}\vec v, T^{d+2}\vec v, \ldots$, simply by applying $T$ to both sides of this equation repeatedly. So if we cut off at the degree ($= d$) of the first polynomial relation amongst the $T^k\vec v$ and consider the span ($= W$) of the first $d$ of these, we already have the space spanned by all the $T^k\vec v$.

Let's look at the matrix of $T|_W$ with respect to the basis $B = \{\vec v, T\vec v, \ldots, T^{d-1}\vec v\} =: \{\vec v_1, \ldots, \vec v_d\}$. Clearly $T\vec v_1 = \vec v_2$, $T\vec v_2 = \vec v_3$, ..., $T\vec v_{d-1} = \vec v_d$, while
$$T\vec v_d = T(T^{d-1}\vec v) = T^d\vec v = -a_{d-1}T^{d-1}\vec v - a_{d-2}T^{d-2}\vec v - \cdots - a_0\vec v = -a_{d-1}\vec v_d - a_{d-2}\vec v_{d-1} - \cdots - a_0\vec v_1.$$
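The "iterate until the first dependence" procedure above can be carried out mechanically. Below is a Python sketch with exact rational arithmetic (the function names are mine, and the sample matrix is just a convenient small choice): it returns the degree $d$ and the coefficients $a_0, \ldots, a_{d-1}$ of the monic relation $T^d\vec v + a_{d-1}T^{d-1}\vec v + \cdots + a_0\vec v = 0$.

```python
from fractions import Fraction

def matvec(A, v):
    # apply T (given by the matrix A) to a vector
    return [sum(Fraction(A[i][j]) * v[j] for j in range(len(v))) for i in range(len(A))]

def solve_exact(M, d):
    # exact Gauss-Jordan on the n x (d+1) augmented system M;
    # returns the solution vector if consistent, else None
    n = len(M)
    row, pivots = 0, []
    for col in range(d):
        piv = next((r for r in range(row, n) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[row], M[piv] = M[piv], M[row]
        for r in range(n):
            if r != row and M[r][col] != 0:
                f = M[r][col] / M[row][col]
                M[r] = [M[r][c] - f * M[row][c] for c in range(d + 1)]
        pivots.append((row, col))
        row += 1
    if any(M[r][d] != 0 for r in range(row, n)):
        return None                       # inconsistent: no dependence yet
    x = [Fraction(0)] * d
    for r, col in pivots:
        x[col] = M[r][d] / M[r][col]
    return x

def cyclic_relation(A, v):
    """Iterate v, Tv, T^2 v, ... until the first dependence; return
    (d, [a0, ..., a_{d-1}]) with T^d v + a_{d-1} T^{d-1} v + ... + a0 v = 0."""
    its = [[Fraction(x) for x in v]]
    while True:
        w = matvec(A, its[-1])            # next iterate T^d v
        aug = [[its[k][i] for k in range(len(its))] + [w[i]] for i in range(len(w))]
        x = solve_exact(aug, len(its))
        if x is not None:                 # T^d v = sum_k x_k T^k v
            return len(its), [-xk for xk in x]
        its.append(w)

# sample: the 3x3 all-ones matrix with v = e_1 gives T^2 v - 3 T v = 0
A = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(cyclic_relation(A, [1, 0, 0]))      # (2, [Fraction(0, 1), Fraction(-3, 1)])
```

So for this choice the cyclic subspace generated by $\vec v = \hat e_1$ is 2-dimensional, with $p(\lambda) = \lambda^2 - 3\lambda$.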
Therefore
$$[T|_W]_B = \begin{pmatrix} 0 & & & 0 & -a_0 \\ 1 & 0 & & & -a_1 \\ & \ddots & \ddots & & \vdots \\ & & 1 & 0 & -a_{d-2} \\ & & & 1 & -a_{d-1} \end{pmatrix} =: M(p);$$
this is called the companion matrix of the (monic) polynomial $p(\lambda) = \lambda^d + a_{d-1}\lambda^{d-1} + \cdots + a_0$.

CLAIM 2. $p(\lambda)$ is both the minimal and the characteristic polynomial of its companion matrix.

We shall prove this by computing $nf(\lambda I - M(p))$: starting with
$$\lambda I - M(p) = \begin{pmatrix} \lambda & & & & a_0 \\ -1 & \lambda & & & a_1 \\ & \ddots & \ddots & & \vdots \\ & & -1 & \lambda & a_{d-2} \\ & & & -1 & \lambda + a_{d-1} \end{pmatrix},$$
multiply the bottom row by $\lambda$ and add it to the 2nd row from the bottom, to obtain
$$\begin{pmatrix} \lambda & & & & a_0 \\ -1 & \lambda & & & a_1 \\ & \ddots & \ddots & & \vdots \\ & & -1 & 0 & \lambda^2 + a_{d-1}\lambda + a_{d-2} \\ & & & -1 & \lambda + a_{d-1} \end{pmatrix};$$
repeat all the way up (multiply the $(i+1)$st row by $\lambda$, add it to the $i$th) to get
$$\begin{pmatrix} 0 & & & & \lambda^d + a_{d-1}\lambda^{d-1} + \cdots + a_0 \\ -1 & 0 & & & \lambda^{d-1} + a_{d-1}\lambda^{d-2} + \cdots + a_1 \\ & \ddots & \ddots & & \vdots \\ & & -1 & 0 & \lambda^2 + a_{d-1}\lambda + a_{d-2} \\ & & & -1 & \lambda + a_{d-1} \end{pmatrix}.$$
Noting that the top-right entry is $p(\lambda)$, add multiples of the 1st through $(d-1)$st columns to the right-hand column to kill all the other entries in that column, obtaining
$$\begin{pmatrix} 0 & & & p(\lambda) \\ -1 & \ddots & & \\ & \ddots & 0 & \\ & & -1 & 0 \end{pmatrix}.$$
Finally, multiplying the first $(d-1)$ columns by $(-1)$ and performing row swaps, we arrive at
$$\begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & p(\lambda) \end{pmatrix} = nf(\lambda I - M(p)).$$
Thus $m_{M(p)}$ [= lower-right-hand entry] $= p(\lambda)$ and $f_{M(p)}(\lambda)$ [= product of diagonal entries] $= p(\lambda)$, proving the above Claim.

REMARK 3. Since $p(\lambda)$ is the minimal polynomial of $T|_W$, $p(T|_W) = 0$; in other words, $p(T)$ annihilates $W$, which is also clear from the fact that it annihilated $W$'s cyclic vector $\vec v$ above.

Rational Canonical Form. For simplicity let $V = F^n$, let $A \in M_n(F)$ be any matrix, and let $T : V \to V$ be the corresponding transformation (with $A = [T]_{\hat e}$ as usual). Recall that the diagonal entries of $nf(\lambda I - A)$ give the invariant factors $\Delta_k(\lambda)$ of $\lambda I - A$, and let $\{\Delta_r, \ldots, \Delta_n\}$ be the nontrivial ($\ne 1$) ones among them. We now
show that to each (nontrivial) $\Delta_k$ there is associated a $T$-cyclic subspace $W_k \subseteq V$ having $\Delta_k(\lambda)$ as both characteristic and minimal polynomial, and moreover that $V = W_r \oplus \cdots \oplus W_n$. The rational canonical form is then just the reflection of this decomposition of $V$ (into cyclic subspaces) in matrix form.

In VI.B we showed how to transform [for any $A \in M_n(F)$] $(\lambda I - A) \in M_n(F[\lambda])$ by row/column operations into a matrix in normal form:
$$(1) \qquad P(\lambda I - A)Q = nf(\lambda I - A) = \mathrm{diag}\{1, \ldots, 1, \Delta_r(\lambda), \ldots, \Delta_n(\lambda)\}$$
where $P, Q \in M_n(F[\lambda])$ are invertible, and $\sum_k \deg\Delta_k(\lambda) = n$. Set $d_k := \deg\Delta_k(\lambda)$. By swapping rows and columns one may change the right-hand side of (1) to
$$\mathrm{diag}\{\underbrace{1, \ldots, 1, \Delta_r(\lambda)}_{d_r};\ \ldots\ ;\underbrace{1, \ldots, 1, \Delta_n(\lambda)}_{d_n}\}, \qquad d_r + \cdots + d_n = n.$$
Reversing the row/column operations we did on the companion matrices, we may change this to a matrix
$$\mathrm{diag}\{\lambda I_{d_r} - M(\Delta_r), \ldots, \lambda I_{d_n} - M(\Delta_n)\}$$
consisting of blocks down the diagonal (of dimensions $d_r \times d_r, \ldots, d_n \times d_n$). So we have
$$P_1(\lambda I - A)Q_1 = \lambda I_n - \mathrm{diag}\{M(\Delta_r), \ldots, M(\Delta_n)\}$$
where $P_1, Q_1 \in M_n(F[\lambda])$ are still invertible.

Now suppose we could replace the left-hand side by $S^{-1}(\lambda I - A)S$ for some $S \in M_n(F)$ with scalar entries. Here is what that would tell us. Let $B = \{\vec v_1, \ldots, \vec v_n\}$ be the basis given by the columns of $S$. Then since $S^{-1}(\lambda I - A)S = \lambda I - S^{-1}AS$,
$$\lambda I_n - S^{-1}AS = \lambda I_n - \mathrm{diag}\{M(\Delta_r), \ldots, M(\Delta_n)\} \implies S^{-1}AS = \mathrm{diag}\{M(\Delta_r), \ldots, M(\Delta_n)\},$$
or
$$[T]_B = \begin{pmatrix} M(\Delta_r) & & \\ & \ddots & \\ & & M(\Delta_n) \end{pmatrix}.$$
Combined with what we have seen about cyclic subspaces and direct sums, this tells us that $V$ decomposes into a direct sum of $T$-cyclic subspaces as described. Namely, if the columns of our proposed $S$ yield
$$B = \{\vec v_1, \ldots, \vec v_{d_r};\ \vec v_{d_r+1}, \ldots, \vec v_{d_r+d_{r+1}};\ \ldots;\ \vec v_{n-d_n+1}, \ldots, \vec v_n\} =: \{B_r, B_{r+1}, \ldots, B_n\},$$
then define $W_k = \mathrm{span}\{B_k\}$, and $V = W_r \oplus \cdots \oplus W_n$. From the supposed form of $[T]_B$, it is clear that: $\Delta_k(T)$ annihilates the subspace $W_k$; $\vec v_1, \vec v_{d_r+1}, \ldots, \vec v_{n-d_n+1}$ are cyclic vectors for $W_r, W_{r+1}, \ldots, W_n$; and $B_r = \{\vec v_1, \ldots, \vec v_{d_r}\} = \{\vec v_1, T\vec v_1, \ldots, T^{d_r-1}\vec v_1\}$ (etc. for the other $B_k$). This is all not very important, since we shall soon present the basis differently. After all, we still have to prove it exists.

Finding the Basis. We now embark on the quest to find the basis $B$ so that $[T]_B$ looks as above. We'll need to work with a space of vectors with polynomial coefficients, namely $(F[\lambda])^n$. (This is really not a vector space, but a module over the ring $F[\lambda]$.) There is a map $(F[\lambda])^n \xrightarrow{\ \eta\ } F^n$ given by writing out a vector in components and formally replacing $\lambda$ by $T$:²
$$\vec c = g_1(\lambda)\hat e_1 + \cdots + g_n(\lambda)\hat e_n \ \longmapsto\ \eta(\vec c) := g_1(T)\hat e_1 + \cdots + g_n(T)\hat e_n.$$

² Here $g_i(T)\hat e_i$ is just a polynomial in the transformation $T$ [or matrix $A$] acting on the $i$th standard basis vector $\hat e_i \in F^n$.
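The map $\eta$ is easy to implement once and for all. Here is a minimal Python sketch (the function name `eta` and the sample matrix are my choices, not from the notes): each component $g_i(\lambda)$ is stored as a coefficient list, lowest degree first, and $\eta(\vec c) = \sum_i g_i(T)\hat e_i$ is accumulated using successive powers of $A$ applied to $\hat e_i$.

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def eta(A, c):
    """eta(c) for c = sum_i g_i(lambda) e_i in (F[lambda])^n.
    c[i] is the coefficient list of g_i, lowest degree first."""
    n = len(A)
    out = [0] * n
    for i, g in enumerate(c):
        w = [int(j == i) for j in range(n)]      # w runs through A^k e_i
        for coeff in g:
            out = [out[j] + coeff * w[j] for j in range(n)]
            w = matvec(A, w)                     # next power of A on e_i
    return out

# sample: A the 3x3 all-ones matrix, and
# c = (lambda - 1) e_1 + (-1) e_2 + (-1) e_3
A = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
c = [[-1, 1], [-1], [-1]]
print(eta(A, c))  # [0, 0, 0] -- this particular c is killed by eta
```

Note that if $\vec c$ has only scalar entries (each $g_i$ constant), then $\eta(\vec c) = \vec c$, which is used without comment in the algorithm below.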
Recall that the row operations³ $P = P_M \cdots P_1$ used to put $(\lambda I - A)$ in normal form were invertible, so we may consider⁴ $P^{-1} = P_1^{-1}\cdots P_M^{-1}$. Let $C$, $\Gamma$ be the bases $\{\vec c_1, \ldots, \vec c_n\}$, $\{\vec\gamma_1, \ldots, \vec\gamma_n\}$ for $(F[\lambda])^n$ consisting of the column vectors of $P^{-1}$ and $Q$, respectively. We write
$$S_C = {}_{\hat e}[I]_C = P^{-1}, \qquad S_\Gamma = {}_{\hat e}[I]_\Gamma = Q,$$
and also $[\lambda I - T]_{\hat e}$ for $\lambda I_n - A$; now (1) becomes
$$(2) \qquad \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \Delta_r(\lambda) & & \\ & & & \ddots & \\ & & & & \Delta_n(\lambda) \end{pmatrix} = S_C^{-1}\left([\lambda I - T]_{\hat e}\right)S_\Gamma = {}_C[\lambda I - T]_\Gamma.$$
We are going to use (2) to show that the vectors $\eta(\vec c_r), \ldots, \eta(\vec c_n)$ generate $V (= F^n)$ under the action of $T$. To this end let $W_k$ = the space generated by $\eta(\vec c_k)$ under the action of $T$, and $d_k = \deg\Delta_k(\lambda)$ (do not jump to the conclusion that $d_k = \dim W_k$). Noting that the $\vec c_k$ (resp. $\vec\gamma_k$) are the columns of $S_C$ (resp. $S_\Gamma$), denote the entries of $S_C$, $S_\Gamma$ by $C_{ij}, \Gamma_{ij} \in F[\lambda]$, so that
$$\vec\gamma_j = \Gamma_{1j}\hat e_1 + \cdots + \Gamma_{nj}\hat e_n, \qquad \vec c_j = C_{1j}\hat e_1 + \cdots + C_{nj}\hat e_n.$$
Now observe that according to (2), $\lambda I - T$ takes
$$\vec\gamma_1 \mapsto \vec c_1,\ \ldots,\ \vec\gamma_{r-1} \mapsto \vec c_{r-1}; \qquad \vec\gamma_r \mapsto \Delta_r(\lambda)\vec c_r,\ \ldots,\ \vec\gamma_n \mapsto \Delta_n(\lambda)\vec c_n.$$
We expand this as follows, for all $j = 1, \ldots, r-1$ and $k = r, \ldots, n$:
$$(\lambda I - T)\left(\Gamma_{1j}(\lambda)\hat e_1 + \cdots + \Gamma_{nj}(\lambda)\hat e_n\right) = C_{1j}(\lambda)\hat e_1 + \cdots + C_{nj}(\lambda)\hat e_n,$$

³ We momentarily ignore the column operations $Q$.
⁴ Finding $P^{-1}$ means taking the product of the elementary matrices corresponding to the reverse of the operations you did; only the row operations!
$$(\lambda I - T)\left(\Gamma_{1k}(\lambda)\hat e_1 + \cdots + \Gamma_{nk}(\lambda)\hat e_n\right) = \Delta_k(\lambda)C_{1k}(\lambda)\hat e_1 + \cdots + \Delta_k(\lambda)C_{nk}(\lambda)\hat e_n.$$
One can employ a formal argument⁵ like that in our proof of Cayley-Hamilton, to show it is valid to substitute in $T$ for $\lambda$ to get (since $TI - T = 0$)
$$0 = C_{1j}(T)\hat e_1 + \cdots + C_{nj}(T)\hat e_n = \eta(\vec c_j), \qquad j = 1, \ldots, r-1,$$
$$0 = \Delta_k(T)\left(C_{1k}(T)\hat e_1 + \cdots + C_{nk}(T)\hat e_n\right) = \Delta_k(T)\,\eta(\vec c_k), \qquad k = r, \ldots, n.$$
(Basically, this amounts to taking $\eta$ of both sides of both equations.) So we may discard the first $r-1$ columns of $S_C = P^{-1}$, since their images under $\eta$ are zero. For the remaining $\vec c_k$, $\Delta_k(T)$ annihilates $\eta(\vec c_k)$,

⁵ Unfortunately fairly nasty, unless you use the theory of modules (which makes such things automatic). The formal argument (without module theory) for $\eta(\vec c_j) = 0$, $j \le r-1$, is: since $T\hat e_l = \sum_i A_{il}\hat e_i$,
$$(\lambda I - T)\Big(\sum_i \Gamma_{ij}(\lambda)\hat e_i\Big) = \sum_i C_{ij}(\lambda)\hat e_i \implies \sum_i\left(\lambda\Gamma_{ij}(\lambda) - C_{ij}(\lambda)\right)\hat e_i = \sum_l \Gamma_{lj}(\lambda)T\hat e_l = \sum_{i,l}\Gamma_{lj}(\lambda)A_{il}\hat e_i,$$
so that for each $i, j$ (with $j \le r-1$) the polynomial equation
$$\lambda\Gamma_{ij}(\lambda) - C_{ij}(\lambda) = \sum_l A_{il}\Gamma_{lj}(\lambda)$$
holds. That is, the two sides are the same polynomial. It now makes perfect sense to substitute in $T$ for $\lambda$ to obtain
$$T\Gamma_{ij}(T) - C_{ij}(T) = \sum_l A_{il}\Gamma_{lj}(T) \implies C_{ij}(T) = \sum_l\left(T\delta_{il} - A_{il}\right)\Gamma_{lj}(T).$$
Therefore
$$\eta(\vec c_j) := \sum_i C_{ij}(T)\hat e_i = \sum_{i,l}\left(T\delta_{il} - A_{il}\right)\Gamma_{lj}(T)\hat e_i = \sum_l \Gamma_{lj}(T)\Big[\sum_i\left(T\delta_{il} - A_{il}\right)\hat e_i\Big] = 0,$$
because $\sum_i\left(T\delta_{il} - A_{il}\right)\hat e_i = T\hat e_l - \sum_i A_{il}\hat e_i = 0$.
so that $\left\{\eta(\vec c_k), T\eta(\vec c_k), \ldots, T^{d_k}\eta(\vec c_k)\right\}$ are dependent. We do not yet know that $\left\{\eta(\vec c_k), T\eta(\vec c_k), \ldots, T^{d_k-1}\eta(\vec c_k)\right\}$ are independent, but we do know that $W_k$ = their span, and so
$$\dim W_k \le d_k = \deg\Delta_k(\lambda).$$
Since $\sum_k \deg\Delta_k = n$,
$$\sum_k \dim W_k \le n.$$
Moreover, $\Delta_k(T)$ annihilates $W_k$ (since it annihilates $\eta(\vec c_k)$).

Now let $R_{ij}$ be the (polynomial) entries of $P = S_C^{-1} = {}_C[I]_{\hat e}$, so that for all $i = 1, \ldots, n$
$$\hat e_i = R_{1i}(\lambda)\vec c_1 + \cdots + R_{ni}(\lambda)\vec c_n.$$
Applying $\eta$ to both sides, we have
$$\hat e_i = R_{1i}(T)\,\eta(\vec c_1) + \cdots + R_{ni}(T)\,\eta(\vec c_n) = R_{ri}(T)\,\eta(\vec c_r) + \cdots + R_{ni}(T)\,\eta(\vec c_n),$$
since the first $(r-1)$ of the $\{\eta(\vec c_j)\}$ are zero. Therefore (sums of) powers of $T$ acting on the $\{\eta(\vec c_r), \ldots, \eta(\vec c_n)\}$ give all the $\{\hat e_1, \ldots, \hat e_n\}$, and thereby span all of $V = F^n$! In other words,
$$W_r + \cdots + W_n = V,$$
and so
$$\sum_k \dim W_k \ge \dim V = n.$$
Combining our inequalities,
$$(3) \qquad \sum_k \dim W_k = n$$
and so the sum is direct: $V = W_r \oplus \cdots \oplus W_n$.
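Before summing up, here is a small sanity check on CLAIM 2 (each $\Delta_k$ is the minimal polynomial of its companion matrix, so in particular $p(M(p)) = 0$). This Python sketch is mine, not from the notes; it builds $M(p)$ for $p(\lambda) = \lambda^2 - 3\lambda$ and evaluates $p$ at it by Horner's rule.

```python
def companion(a):
    """Companion matrix M(p) of the monic p(x) = x^d + a[d-1] x^{d-1} + ... + a[0]:
    1's on the subdiagonal, -a_0, ..., -a_{d-1} down the last column."""
    d = len(a)
    M = [[0] * d for _ in range(d)]
    for i in range(1, d):
        M[i][i - 1] = 1
    for i in range(d):
        M[i][d - 1] = -a[i]
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_of_matrix(a, M):
    """p(M) for the monic p with coefficient list a, via Horner's rule."""
    d = len(M)
    R = [[int(i == j) for j in range(d)] for i in range(d)]  # start from I
    for k in reversed(range(len(a))):
        R = matmul(R, M)          # multiply running result by M ...
        for i in range(d):
            R[i][i] += a[k]       # ... then add the next coefficient times I
    return R

# p(lambda) = lambda^2 - 3*lambda, i.e. a = [0, -3]
C = companion([0, -3])
print(C)                          # [[0, 0], [1, 3]]
print(poly_of_matrix([0, -3], C))  # [[0, 0], [0, 0]] -- p(M(p)) = 0
```

That $p$ is also the characteristic polynomial can be seen here directly: for a $2 \times 2$ matrix, $f(\lambda) = \lambda^2 - (\mathrm{tr})\lambda + \det = \lambda^2 - 3\lambda$.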
Also (3) forces $\dim W_k = \deg\Delta_k(\lambda)$, so that
$$B_k = \left\{\eta(\vec c_k), T\eta(\vec c_k), \ldots, T^{d_k-1}\eta(\vec c_k)\right\}$$
is a basis for $W_k$ (it was an independent set after all). Thus $[T|_{W_k}]_{B_k} = M(\Delta_k)$ as promised, and
$$B = \{B_r, \ldots, B_n\}$$
is the basis for $V = F^n$.

Let's sum up, in terms of a theorem about transformations, and an algorithm for the corresponding matrices:

THEOREM 4. For $T : V \to V$ ($\dim V = n$) there is a decomposition $V = W_r \oplus \cdots \oplus W_n$ into $T$-cyclic subspaces, such that the minimal and characteristic polynomials of $T|_{W_k}$ are both just the $k$th invariant factor $\Delta_k$ of $\lambda I - T$.

Algorithm. Given $A \in M_n(F)$, apply the normal form procedure to $\lambda I - A$ to obtain
$$P(\lambda I - A)Q = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \Delta_r(\lambda) & & \\ & & & \ddots & \\ & & & & \Delta_n(\lambda) \end{pmatrix}.$$
Find the inverse $P^{-1}$ of the row operations, call its entries $C_{ij}$ (some will be polynomials in $\lambda$) and its columns $\vec c_k$. Throw out $\vec c_1, \ldots, \vec c_{r-1}$; write each of the remaining people component-by-component and compute
$$\vec c_k = C_{1k}(\lambda)\hat e_1 + \cdots + C_{nk}(\lambda)\hat e_n, \qquad \eta(\vec c_k) = C_{1k}(T)\hat e_1 + \cdots + C_{nk}(T)\hat e_n:$$
after applying the $T$'s to the $\hat e_i$'s, rewrite the result as a column vector. (This sounds hard, but in actual practice is not: e.g., if $\vec c_k$ happens only to have scalar entries, then $\eta(\vec c_k) = \vec c_k$.) Let $d_k = \deg\Delta_k$, and write (for $k = r, \ldots, n$)
$$B_k = \left\{\eta(\vec c_k), T\eta(\vec c_k), \ldots, T^{d_k-1}\eta(\vec c_k)\right\};$$
put these together to form a basis
$$B = \{B_r, \ldots, B_n\}$$
for $F^n$. Set
$$S := S_B = \left\{\text{matrix whose columns are the vectors of } B\right\},$$
and write $A = SRS^{-1}$, where
$$R = \mathrm{diag}\{M(\Delta_r), \ldots, M(\Delta_n)\}$$
consists of companion-matrix blocks along the diagonal. (See the boxed formula earlier in the lecture.)

REMARK 5. You should of course think of this as a change of basis: $(A =)\ [T]_{\hat e} = {}_{\hat e}[I]_B\,[T]_B\,{}_B[I]_{\hat e}$, where $T$ is transparent relative to $B$.

EXAMPLE 6. We now put our favorite example
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
into rational canonical form. At the end of the last lecture we put (for this $A$) $\lambda I - A$ into normal form,
$$nf(\lambda I - A) = \begin{pmatrix} 1 & & \\ & \lambda & \\ & & \lambda^2 - 3\lambda \end{pmatrix},$$
in 4 (somewhat condensed) steps. We list the row operations involved in each of these steps, numbering them in the order in which they occur:

Step I: $R_1$ = swapping rows 1 and 2.

Step II (involves three row operations): $R_2$ = multiplying row 1 by $(-1)$; $R_3$ = adding $(1-\lambda)\cdot\{\text{row }1\}$ to row 2; $R_4$ = adding row 1 to row 3.

Step III: $R_5$ = adding row 2 to row 3.

Step IV: $R_6$ = multiplying row 2 by $(-1)$.

What are the inverses of these row operations? This is not a hard question! For example, $R_5^{-1}$ = subtracting row 2 from row 3, while $R_3^{-1}$ = adding $(\lambda - 1)\cdot\{\text{row }1\}$ to row 2. Things like $R_1$ and $R_6$ are the same in reverse. To find $C = P^{-1}$ one simply writes out the corresponding product of matrices:
$$C = R_1^{-1}R_2^{-1}R_3^{-1}R_4^{-1}R_5^{-1}R_6^{-1} = \begin{pmatrix} \lambda - 1 & -1 & 0 \\ -1 & 0 & 0 \\ -1 & 1 & 1 \end{pmatrix} \implies \vec c_1 = \begin{pmatrix} \lambda - 1 \\ -1 \\ -1 \end{pmatrix},\ \vec c_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},\ \vec c_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
Now according to the entries of $nf(\lambda I - A)$, the invariant factors of $A$ are $\Delta_1 = 1$, $\Delta_2 = \lambda$, $\Delta_3 = \lambda^2 - 3\lambda$. This means that we keep $\vec c_2 = \eta(\vec c_2)$ and $\vec c_3 = \eta(\vec c_3)$, which will generate $\mathbb{R}^3$ under the action of $A$ (recall that $\eta(\vec c) = \vec c$ if $\vec c$ has only scalar entries); and we throw out $\vec c_1$ because $\eta(\vec c_1) = 0$.

Let's check this, in order to see a computation of $\eta(\vec c)$ where $\vec c$ has actual polynomial entries! First break $\vec c_1$ into components with respect to the standard basis:
$$\vec c_1 = (\lambda - 1)\hat e_1 + (-1)\hat e_2 + (-1)\hat e_3.$$
Now you substitute $T$ for $\lambda$:
$$\eta(\vec c_1) = (T - 1)\hat e_1 + (-1)\hat e_2 + (-1)\hat e_3$$
$$= T\hat e_1 - (\hat e_1 + \hat e_2 + \hat e_3).$$
Finally, rewrite everything in matrix-vector notation (with respect to $\hat e$) to compute the result:
$$\eta(\vec c_1) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
Written in this way it looks kind of like magic.

Now the {dimension of $W_k$} = {number of vectors in $B_k$} should be the degree $d_k$ of $\Delta_k$, in the remaining cases $k = 2, 3$. If $d_2$ were 5 (which by the way would never happen) then we would write down $B_2 = \{\eta(\vec c_2), T\eta(\vec c_2), \ldots, T^4\eta(\vec c_2)\}$. In the case at hand we have $d_2 = 1$, $d_3 = 2$. So
$$B_2 = \{\eta(\vec c_2)\} = \left\{\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\right\}, \qquad B_3 = \{\eta(\vec c_3), T\eta(\vec c_3)\} = \left\{\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right\}.$$
These three vectors together give our basis, and
$$S = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \xrightarrow{\ \text{rref}\ } S^{-1} = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
Now for the companion matrices. For arbitrary monic polynomials $\lambda + a_0$ and $\lambda^2 + a_1\lambda + a_0$ of degrees 1 and 2, these are
$$\left(-a_0\right) \qquad \text{and} \qquad \begin{pmatrix} 0 & -a_0 \\ 1 & -a_1 \end{pmatrix}.$$
Therefore in our case $M(\Delta_2) = (0)$ and $M(\Delta_3) = \begin{pmatrix} 0 & 0 \\ 1 & 3 \end{pmatrix}$. The decomposition is therefore (in the form $A = SRS^{-1}$)
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 3 \end{pmatrix}\begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
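Assuming the example matrix is the $3 \times 3$ all-ones matrix with $S$, $R$, and $S^{-1}$ as printed above, here is a quick Python check of the decomposition (the matrix $S^{-1}$ is hard-coded rather than recomputed, and verified against $S$ first):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S    = [[-1, 0, 1], [0, 0, 1], [1, 1, 1]]
R    = [[0, 0, 0], [0, 0, 0], [0, 1, 3]]   # diag{ M(Delta_2), M(Delta_3) }
Sinv = [[-1, 1, 0], [1, -2, 1], [0, 1, 0]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(S, Sinv) == I3               # Sinv really is S^{-1}
print(matmul(matmul(S, R), Sinv))          # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

The product recovers the all-ones matrix, confirming $A = SRS^{-1}$ exactly (integer arithmetic, no rounding).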
Exercises

(1) Compute the rational canonical form for A = 2 0 . 6 3 4 . That is, present it as $SRS^{-1}$, where $R$ consists of companion-matrix blocks along the diagonal and $S$ is a change-of-basis matrix. Interpret the result in terms of a direct-sum decomposition of $\mathbb{R}^3$ under the action of the transformation $T$ with matrix $[T]_{\hat e} = A$.

(2) Same problem with A = 0 0 0 0 0 0 2 2 0 2 0 2 , and $\mathbb{R}^4$ instead of $\mathbb{R}^3$. [Note: from Exercise VI.B.2, you already know the normal form for $\lambda I_4 - A$.]

(3) Once more, same problem with A = 0 0 0 c 0 0 0 c 0 0 0 c . How does it depend on $c$?

(4) Let $T : V \to V$ (with $V = \mathbb{R}^n$) be semisimple. (a) If $T$ has a cyclic vector, show that $T$ has $n$ distinct eigenvalues. (b) In this case, if $\{\vec v_1, \ldots, \vec v_n\}$ is an eigenbasis, show that $\vec v = \sum_{i=1}^n \vec v_i$ is a cyclic vector for $T$.

(5) Let $A \in M_n(\mathbb{R})$ be such that $A^2 = -I_n$. (a) Prove that $n$ is even. (b) Writing $n = 2k$, show that $A$ is similar (over $\mathbb{R}$) to
$$J_n := \begin{pmatrix} 0 & -I_k \\ I_k & 0 \end{pmatrix}.$$
(6) Let $T : P_3 \to P_3$ be $\frac{d}{dx}$. Write its matrix $A$ in the standard basis $\{x^3, x^2, x, 1\} = B$, and put it in rational canonical form (i.e. $A = SRS^{-1}$), without computation! [Hint: what is $T^4$?]

(7) Suppose I tell you the characteristic polynomial of a $5 \times 5$ matrix $A$, without giving you the matrix: $f_A(\lambda) = (\lambda - 3)^3(\lambda - 2)^2$.

(a) What are the possible rational canonical forms $R$? (Start by listing the possibilities for $nf(\lambda I - A)$.) There should be six.

(b) This says that there are six distinct (= nonsimilar) transformations of $\mathbb{R}^5$ with eigenvalue list 3, 3, 3, 2, 2 (repeated according to multiplicity). Which ones are diagonalizable? [Hint: look at Answer 2 at the beginning of the section!]