Mathematics 369 - The Jordan Canonical Form - A. Hulpke

While SCHUR's theorem permits us to transform a matrix into upper triangular form, we can in fact do even better if we don't insist on an orthogonal transformation matrix. Our setup is that we have $V = \mathbb{C}^n$ and a matrix $A \in \mathbb{C}^{n\times n}$ that represents a linear transformation $L\colon V \to V$ via the standard basis $S$ of $V$: $A = {}_S[L]_S$.

We first want to show that it is possible to modify the basis such that we get a block diagonal matrix with one block for each eigenvalue:

Theorem 1 (special version of MASCHKE's theorem) Given $L\colon V \to V$, there is a basis $B$ of $V$ such that
$$ {}_B[L]_B = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_l \end{pmatrix} $$
is in block form with each $A_i$ in upper triangular form with the same value on the diagonal.

Proof: Let us assume that by SCHUR's theorem we have a basis $F = \{f_1,\ldots,f_n\}$ for $L$ which is chosen in a way such that the diagonal entries have the eigenvalues clustered together. Suppose that $f_a,\ldots,f_b$ are the basis vectors that correspond to the diagonal entry $\lambda$ and that $f_c$ ($c > b$) is a basis vector corresponding to a different diagonal eigenvalue $\mu$. Suppose that $L(f_c) = \sum_{i=1}^{c} d_i f_i$ with $d_c = \mu$. We are interested in the coefficients $d_i$ for $a \le i \le b$. Our aim is to change the basis such that $d_i = 0$ for every $a \le i \le b$. If this is not already the case, there is a largest such $j$ with $d_j \ne 0$. We set (note that $\lambda \ne \mu$)
$$ v = f_c - \frac{d_j}{\lambda-\mu} f_j $$
and note that $L(f_j) = \lambda f_j + \sum_{i=1}^{j-1} e_i f_i$ because ${}_F[L]_F$ is upper triangular. Then
$$ L(v) = L(f_c) - \frac{d_j}{\lambda-\mu} L(f_j) = \sum_{i=1}^{c} d_i f_i - \frac{d_j}{\lambda-\mu}\Bigl(\lambda f_j + \sum_{i=1}^{j-1} e_i f_i\Bigr) $$
$$ = \sum_{i=1}^{j-1}\Bigl(d_i - \frac{d_j}{\lambda-\mu}\, e_i\Bigr) f_i + \underbrace{\Bigl(d_j - \frac{d_j\lambda}{\lambda-\mu}\Bigr)}_{=\, -\frac{d_j\mu}{\lambda-\mu}} f_j + \sum_{i=j+1}^{c-1} d_i f_i + \mu f_c $$
$$ = \sum_{i=1}^{j-1}\Bigl(d_i - \frac{d_j}{\lambda-\mu}\, e_i\Bigr) f_i + \sum_{i=j+1}^{c-1} d_i f_i + \mu\, \underbrace{\Bigl(f_c - \frac{d_j}{\lambda-\mu} f_j\Bigr)}_{=\, v}. $$
Thus we can replace $f_c$ by $v$ and obtain a new basis $\hat F$ such that ${}_{\hat F}[L]_{\hat F}$ is upper triangular, but in this basis $L(\hat f_c)$ does not involve $f_j$.
If we compare ${}_F[L]_F$ and ${}_{\hat F}[L]_{\hat F}$, the only difference in the $c$-th column is that the entry in the $j$-th row has been set to $0$. Furthermore (as the images of the basis vectors at positions $< c$ do not involve $f_c$), all prior columns of the matrix are not affected. We can repeat this process for all entries at positions $i \le b$ and thus clean out the top of the $c$-th column. We then iterate to the next ($(c+1)$-st) column. We can clean out the top entries of this column without disturbing any previous columns. By processing all columns in this way, we can build a basis $B$ that achieves the claimed block structure for ${}_B[L]_B$.

The Jordan Canonical Form
(CAMILLE JORDAN, 1838-1922, http://turnbull.dcs.st-and.ac.uk/history/Mathematicians/Jordan.html)

Let us assume now that we have found (for example by following Theorem 1) a basis $B$ such that ${}_B[L]_B$ is block diagonal such that every block is upper triangular with one eigenvalue $\lambda_i$ on the diagonal. If the block starts at position $k$ and goes to position $l$, we say that the basis vectors $b_k$ up to $b_l$ belong to this block. Our next aim is to describe the subspace spanned by these basis vectors in terms of the transformation $L$. Because the characteristic polynomial of $L$ is $\prod_i (x-\lambda_i)^{a_{\lambda_i}}$, the block for an eigenvalue $\lambda$ has exactly size $l - k + 1 = a_\lambda$.

We now consider only one such eigenvalue, which we denote simply by $\lambda$. Then note that $b_k$ must be an eigenvector of $L$ for eigenvalue $\lambda$, thus $b_k \in \ker(L - \lambda\,\mathrm{id})$. Similarly $L(b_{k+1}) = \lambda b_{k+1} + a_{k,k+1} b_k$ and thus $b_{k+1} \in \ker(L - \lambda\,\mathrm{id})^2$. By the same argument $b_{k+j} \in \ker(L - \lambda\,\mathrm{id})^{j+1}$. In particular $\mathrm{Span}(b_k,\ldots,b_l) = \ker(L - \lambda\,\mathrm{id})^{a_\lambda}$ (on the blocks for the other eigenvalues $L - \lambda\,\mathrm{id}$ is upper triangular with nonzero diagonal, hence invertible, so the kernel is no larger than this span).

Definition: The generalized eigenspace of $L$ for eigenvalue $\lambda$ is $\ker(L - \lambda\,\mathrm{id})^{a_\lambda}$.

We thus could recover the space (and thus the block structure given by Theorem 1) by calculating the kernel of $(L - \lambda\,\mathrm{id})^{a_\lambda}$. Furthermore this indicates that the sequence of subspaces
$$ \ker(L - \lambda\,\mathrm{id}) \subseteq \ker(L - \lambda\,\mathrm{id})^2 \subseteq \cdots \subseteq \ker(L - \lambda\,\mathrm{id})^{a_\lambda} $$
is of interest.
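The dimensions in this sequence can be computed by ranks alone. The following sketch is an illustration added here, not part of the original notes; it assumes sympy for exact arithmetic and takes as input a matrix representing $L - \lambda\,\mathrm{id}$:

```python
# Illustrative sketch: compute dim ker M, dim ker M^2, ... for a matrix M
# (representing L - lambda*id) until the sequence becomes stationary.
from sympy import Matrix, eye

def kernel_dims(M):
    """Return [dim ker M, dim ker M^2, ...] up to the first stationary value."""
    n = M.shape[0]
    dims, P = [], eye(n)
    while True:
        P = P * M                    # P = M^i
        dims.append(n - P.rank())    # rank-nullity: dim ker M^i = n - rank M^i
        if len(dims) > 1 and dims[-1] == dims[-2]:
            return dims[:-1]         # from here on K_i = K_{i+1} = ...

# One nilpotent Jordan block of size 3: the dimensions grow 1, 2, 3.
M = Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(kernel_dims(M))                # -> [1, 2, 3]
```

If $\lambda$ is not an eigenvalue at all, the matrix is invertible and the sketch returns `[0]`.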
To simplify notation we set $M = L - \lambda\,\mathrm{id}$ and $K_i = \ker(M^i)$. We note that $K_i$ is $M$-invariant and thus $L$-invariant. Our aim is to use this kernel sequence to find a nice basis for the generalized eigenspace $K_{a_\lambda}$. By concatenating the nice bases obtained for all generalized eigenspaces we will obtain a nice basis for $V$.

As $K_i \subseteq K_{i+1}$ we see that the dimensions of the $K_i$ cannot decrease. Because all dimensions are finite, this sequence of dimensions has to become stationary (at $a_\lambda$) for some $i$. This situation is reached once $\dim K_i = \dim K_{i+1}$, because then $K_i = K_{i+1}$ and
$$ K_{i+2} = \bigl\{v \mid M^{i+2}(v) = 0\bigr\} = \bigl\{v \mid M(v) \in \ker M^{i+1} = K_{i+1} = K_i\bigr\} = \bigl\{v \mid M^{i+1}(v) = 0\bigr\} = K_{i+1} = K_i, $$
and so forth.

The following lemma, which is the core of the argument, shows that we can choose bases for the $K_i$ in a particularly nice way.
Lemma: Suppose that for $j \ge 1$ a basis for $K_{j-1}$ is extended by the vectors $v_1,\ldots,v_k \in K_j$ to form a basis of $K_j$. Then:
a) $M^{j-1}(v_1),\ldots,M^{j-1}(v_k) \in K_1$ are linearly independent.
b) The vectors
$$ M^{j-1}(v_1),\ldots,M^{j-1}(v_k),\quad M^{j-2}(v_1),\ldots,M^{j-2}(v_k),\quad \ldots,\quad M(v_1),\ldots,M(v_k),\quad v_1,\ldots,v_k $$
are linearly independent.

Proof: a) Suppose that $0 = \sum_{i=1}^k c_i M^{j-1}(v_i) = M^{j-1}\bigl(\sum_{i=1}^k c_i v_i\bigr)$. Then $\sum_{i=1}^k c_i v_i \in \ker M^{j-1} = K_{j-1}$. This would contradict that the $v_i$ extend a basis of $K_{j-1}$ to a basis of $K_j$, unless $c_i = 0$ for all $i$.

b) We assume the result of a) and use induction over $j$. If $j = 1$ we have that $M^{j-1} = \mathrm{id}$ and the statement is trivial. Thus assume now that $j > 1$. If
$$ (*)\qquad 0 = c_{j-1,1} M^{j-1}(v_1) + \cdots + c_{j-1,k} M^{j-1}(v_k) + c_{j-2,1} M^{j-2}(v_1) + \cdots + c_{j-2,k} M^{j-2}(v_k) + \cdots + c_{1,1} M(v_1) + \cdots + c_{1,k} M(v_k) + c_{0,1} v_1 + \cdots + c_{0,k} v_k $$
with coefficients $c_{m,i} \in F$, we apply $M$ and obtain (using that $M^j(v_i) = 0$ as $v_i \in K_j$) that
$$ 0 = c_{j-2,1} M^{j-2}(M(v_1)) + \cdots + c_{j-2,k} M^{j-2}(M(v_k)) + \cdots + c_{1,1} M(M(v_1)) + \cdots + c_{1,k} M(M(v_k)) + c_{0,1} M(v_1) + \cdots + c_{0,k} M(v_k). $$
We now apply induction to the vectors $M(v_1),\ldots,M(v_k) \in K_{j-1}$ and deduce that $c_{m,i} = 0$ for $m \le j-2$. Thus $(*)$ simplifies to $0 = c_{j-1,1} M^{j-1}(v_1) + \cdots + c_{j-1,k} M^{j-1}(v_k)$, and by the assumption of linear independence of these vectors (part a)) we get that $c_{m,i} = 0$ for all $m, i$, which proves the claimed linear independence.

Let $v$ be one of the vectors $v_i$ from this lemma. Then the set $C = \{\ldots, M^2(v), M(v), v\}$ (this is finite because $M^i(v) = 0$ for large enough $i$) must be linearly independent. The subspace
$W = \mathrm{Span}(C)$ is clearly $M$-invariant (it is $M$-cyclic) and thus is $L$-invariant. We calculate that
$$ {}_C\bigl[L|_W\bigr]_C = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda & 1 \\ 0 & \cdots & 0 & 0 & \lambda \end{pmatrix}. $$
We call a matrix of this form a Jordan block.

We got $W$ from one single vector $v$. If we consider all the vectors $v_i$ given in the preceding lemma we get a series of subspaces, whose bases we can concatenate to obtain a larger linearly independent set. If we consider them in the sequence
$$ \{\ldots, M^2(v_1), M(v_1), v_1,\ \ldots, M^2(v_2), M(v_2), v_2,\ \ldots,\ \ldots, M^2(v_k), M(v_k), v_k\}, $$
they span an $L$-invariant subspace $U$ such that $\bigl[L|_U\bigr]$ is block diagonal with Jordan blocks along the diagonal.

The last remaining question is whether this subspace $U$ is equal to $K_{a_\lambda}$. This is not necessarily the case; in fact we might have $K_i \not\subseteq U$ if $\dim K_i - \dim K_{i-1} > k$. In such a case we can extend the basis by picking further vectors in $K_i$ which are not in $K_{i-1}$. These vectors (and their images) again generate $L$-invariant subspaces, and yield further, smaller, Jordan blocks. If we continue this process to obtain a basis for $K_{a_\lambda}$ for every eigenvalue $\lambda$, we thus get the following statement:

Theorem: There exists a basis $B$ such that ${}_B[L]_B$ is block diagonal with Jordan blocks along the diagonal. The total number of occurrences of $\lambda$ on the diagonal is equal to $a_\lambda$. The number of Jordan blocks of size at least $i$ is equal to $\dim K_i - \dim K_{i-1}$.

If $L$ is given by a matrix $A$, the matrix (similar to $A$) described in this theorem is called the Jordan Canonical Form of $A$. (See below for uniqueness.)

Computing the Jordan Canonical Form

Let us first look at the case of a concrete matrix $A$. We want to find such a basis $B$ and the Jordan Canonical Form of $A$. This process essentially involves finding suitable vectors in the kernels $K_i$ and arranging them in the right way. The only issue is bookkeeping of the vectors, to decide whether new basis vectors are just obtained as images, or have to be chosen from a suitable kernel to start a new Jordan block.
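The counting statement of the theorem can be checked on a small example. The sketch below is an illustration added here, not part of the notes; it assumes sympy, and `jordan_block` is a helper defined only for this purpose. It builds a matrix with Jordan blocks of sizes 3 and 2 for $\lambda = 1$ and recovers the block counts from the kernel dimensions:

```python
# Sketch: for lambda = 1 with one Jordan block of size 3 and one of size 2,
# dim K_i - dim K_{i-1} counts the Jordan blocks of size at least i.
from sympy import Matrix, diag, eye

def jordan_block(lam, size):
    """Upper triangular Jordan block: lam on the diagonal, 1 just above it."""
    J = lam * eye(size)
    for r in range(size - 1):
        J[r, r + 1] = 1
    return J

J = diag(jordan_block(1, 3), jordan_block(1, 2))   # block sizes 3 and 2
n = J.shape[0]
M = J - eye(n)                                     # M = L - lambda*id
e = [n - (M**i).rank() for i in range(4)]          # e_i = dim K_i, with e_0 = 0
f = [e[i] - e[i - 1] for i in range(1, 4)]         # number of blocks of size >= i
print(e, f)                                        # -> [0, 2, 4, 5] [2, 2, 1]
```

The differences $2, 2, 1$ say: two blocks of size at least 1, two of size at least 2, one of size at least 3, exactly the sizes 3 and 2 we built in.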
(1) Determine the eigenvalues $\lambda_k$ of the matrix $A$ (for example as roots of the characteristic polynomial). For each eigenvalue $\lambda$ perform the following calculation (which gives a basis of the generalized eigenspace of $\lambda$; the whole basis will be obtained by concatenating the bases obtained for the different $\lambda_k$). Again we write $K_i = \ker\bigl((A - \lambda I)^i\bigr)$.
(2) Calculate e i = dimk i until the sequence becomes stationary. (The largest e i is the dimension of the generalized eigenspace.) (3) Let f i = e i e i 1. Then e i gives the number of Jordan blocks that have size at least i. (As long as we only want to know the Jordan form, we thus could stop here.) We now build a basis in sequence of descending i. Let B = [ ] and i = max{i f i > 0}. (4) (Continue growing the existing Jordan blocks) For each vector list (s 1,...,s m ) in B, append the image (A λ I) s m of its last element to the list. (5) (Start new Jordan block of size i) If f i f i 1 = m > 0 (then the images of the vectors obtained so far do not span K i ) let W = Span(K i 1,s 1,...,s k ) where the s j run through the elements in all the lists obtained so far. Extend a basis of W to a basis of K i by adding m linearly independent vectors b 1,...b m in K i K i 1 to it. The probability is high (why?) that any m linear independent basis vectors of K i fulfill this property. To verify it, choose a basis for K i 1, append the s j and then append the b i. Then show that the resulting list is linearly independent. (The generic method would be to extend a basis of W to a basis of K i and take the vectors by which the basis got extended.) (6) For each such vector b i add a list [b i ] to B. (7) If the number of vectors in the lists in B is smaller than the maximal e i, then decrement i and go to step (4). (8) Concatenate the reverses of the lists in B. This is the part of the basis corresponding to eigenvalue λ. For example, let A := 59 224 511 214 4 16 61 139 58 1 6 24 51 20 0 13 52 110 43 0 4 12 38 20 1 Its characteristic polynomial is (x 1) 4. We get the following nullspace dimensions and their differences: i 0 1 2 3 4 e i = dimk i 0 2 4 5 5 f i = e i+1 e i 2 2 1 0 At this point we know already the shape of the Jordan Canonical form of A (2 blocks of size 1 or larger, 2 blocks of size 2 or larger, 1 block of size 3 or larger. I.e. 
one block of size 2 and one block of size 3), but let us compute the explicit base change. We start at $i = 3$ and set $B = [\,]$. We have that $K_3 = \mathbb{R}^5$ and
$$ K_2 = \mathrm{Span}\bigl((4,1,0,0,0)^T,\ (-15/2,0,1,0,0)^T,\ (-3,0,0,1,0)^T,\ (0,0,0,0,1)^T\bigr). $$
As $f_3 = 1$ we need to find only one basis vector, and pick $b_1 := (1,0,0,0,0)^T$ as first basis vector (an almost random choice; we only have to make sure it is not contained in $K_2$, which is easy to verify) and add the list $[b_1]$ to $B$.

In step $i = 2$ we first compute the image $b_2 := (A - 1)b_1 = (58,16,6,-13,-4)^T$ and add it to the list. Furthermore, as $f_2 > f_3$, we have to get another basis vector in $K_2$, but not in the span of $K_1$ and $b_2$. We pick $b_3 = (4,1,0,0,0)^T$ from the spanning set of $K_2$ and verify that it indeed fulfills the conditions. We thus have $B = [[b_1,b_2],[b_3]]$.

In step $i = 1$ we again compute images: $b_4 := (A - 1)b_2 = (48,12,4,-10,0)^T$ and (from the second list) $b_5 := (A - 1)b_3 = (8,2,0,0,-4)^T$. As $f_1 = f_2$, no new vectors are added. As a result we get $B = [[b_1,b_2,b_4],[b_3,b_5]]$. Finally we concatenate the reversed basis vector lists and get the new basis $(b_4, b_2, b_1, b_5, b_3)$. We thus have the base change matrix
$$ S := \begin{pmatrix} 48 & 58 & 1 & 8 & 4 \\ 12 & 16 & 0 & 2 & 1 \\ 4 & 6 & 0 & 0 & 0 \\ -10 & -13 & 0 & 0 & 0 \\ 0 & -4 & 0 & -4 & 0 \end{pmatrix}. $$

Uniqueness and Similarity

It is easily verified that
$$ S^{-1} A S = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. $$
If we arrange the bases of the different $M$-cyclic subspaces in a different way, we rearrange the order of the Jordan blocks. We therefore make the following convention for the sake of uniqueness:
- The eigenvalues are arranged in ascending order.
- Within each eigenvalue, Jordan blocks are arranged in descending size.
The resulting matrix is called the Jordan Canonical Form of $A$ and is denoted by $J(A)$.

Lemma: Let $A, B \in F^{n\times n}$. Then $A$ and $B$ are similar if and only if $J(A) = J(B)$.

Proof: Since the Jordan Canonical Form is obtained from a base change operation, we know that $A$ and $J(A)$ are similar. We thus see that if $A$ and $B$ have the same Jordan Canonical Form, then $A$ and $B$ must be similar. Vice versa, suppose that $A$ and $B$ represent the same linear transformation $L$ with respect to different bases.
The Jordan Canonical Form is determined uniquely by the dimensions $\dim \ker(L - \lambda\,\mathrm{id})^i$ (these numbers determine the number and sizes of the Jordan blocks). These dimensions are independent of the choice of basis, so $J(A) = J(B)$.

The (a priori very hard) question of whether two matrices are similar can therefore be answered by computing the Jordan Canonical Form.
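Both the worked example and this similarity criterion can be checked by machine. The following sketch is added for illustration and is not part of the notes; it assumes sympy, uses the example matrices $A$ and $S$ from above, and `jordan_data` is a helper name introduced here. It reproduces the kernel-dimension table, confirms the base change, and compares matrices by the basis-independent data that determines $J(A)$:

```python
from sympy import Matrix, eye

# The example matrix and base change matrix from the worked example above.
A = Matrix([
    [ 59, -224,  511,  214,   4],
    [ 16,  -61,  139,   58,   1],
    [  6,  -24,   51,   20,   0],
    [-13,   52, -110,  -43,   0],
    [ -4,   12,  -38,  -20,  -1]])
S = Matrix([
    [ 48,  58, 1,  8, 4],
    [ 12,  16, 0,  2, 1],
    [  4,   6, 0,  0, 0],
    [-10, -13, 0,  0, 0],
    [  0,  -4, 0, -4, 0]])
J = Matrix([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1]])

M = A - eye(5)
e = [5 - (M**i).rank() for i in range(5)]   # the table: 0, 2, 4, 5, 5
assert e == [0, 2, 4, 5, 5]
assert S.inv() * A * S == J                 # the base change works

def jordan_data(X):
    """For each eigenvalue lambda, the sequence dim K_1, ..., dim K_{a_lambda}.
    By the lemma, two matrices are similar iff these data agree."""
    n = X.shape[0]
    return {lam: tuple(n - ((X - lam * eye(n))**i).rank()
                       for i in range(1, mult + 1))
            for lam, mult in X.eigenvals().items()}

P = Matrix([[1, 2], [3, 7]])                           # some invertible matrix
B2 = Matrix([[1, 1], [0, 1]])
assert jordan_data(P * B2 * P.inv()) == jordan_data(B2)  # similar
assert jordan_data(B2) != jordan_data(eye(2))            # not similar
```

Comparing the dictionaries avoids committing to any particular block ordering; it compares exactly the data that the convention above turns into the unique matrix $J(A)$.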