
Chapter 8: Jordan Normal Form

8.1 Minimal Polynomials

Recall that $p_A(x) = \det(xI - A)$ is called the characteristic polynomial of the matrix $A$.

Theorem 8.1.1. Let $A \in M_n$. Then there exists a unique monic polynomial $q_A(x)$ of minimum degree for which $q_A(A) = 0$. If $p(x)$ is any polynomial such that $p(A) = 0$, then $q_A(x)$ divides $p(x)$.

Proof. Since there is a polynomial $p_A(x)$ for which $p_A(A) = 0$, there is an annihilating polynomial of minimal degree, which we can assume is monic; call it $q_A(x)$. By the division algorithm,
$$p_A(x) = q_A(x)h(x) + r(x), \qquad \deg r(x) < \deg q_A(x).$$
Then $p_A(A) = q_A(A)h(A) + r(A)$, hence $r(A) = 0$, and by the minimality assumption $r(x) \equiv 0$. Thus $q_A(x)$ divides $p_A(x)$, and by the same argument it divides any polynomial $p(x)$ for which $p(A) = 0$. To establish that $q_A$ is unique, suppose $q(x)$ is another monic polynomial of the same degree for which $q(A) = 0$. Then $r(x) = q(x) - q_A(x)$ is a polynomial of degree less than that of $q_A(x)$ for which $r(A) = q(A) - q_A(A) = 0$. This cannot be unless $r(x) \equiv 0$, that is, $q = q_A$. $\square$

Definition 8.1.1. The polynomial $q_A(x)$ in the theorem above is called the minimal polynomial of $A$.
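The proof suggests a direct computation: search for the first power of $A$ that depends linearly on the lower powers $I, A, A^2, \ldots$. Below is a minimal sketch in Python using sympy (exact arithmetic); the helper name and the test matrix are my own illustrations, not from the text.

```python
import sympy as sp

def minimal_polynomial(A, x=sp.Symbol('x')):
    """Monic minimal polynomial of a square sympy Matrix A, found by
    locating the first linear dependence among vec(I), vec(A), vec(A^2), ...
    (Cayley-Hamilton guarantees a dependence by degree n at the latest.)"""
    n = A.shape[0]
    vecs, P = [sp.eye(n).vec()], sp.eye(n)
    for m in range(1, n + 1):
        P = P * A
        vecs.append(P.vec())
        null = sp.Matrix.hstack(*vecs).nullspace()
        if null:                               # first dependence: degree m
            c = null[0] / null[0][m]           # normalize to a monic polynomial
            return sp.Poly(sum(c[j] * x**j for j in range(m + 1)), x)

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])
print(minimal_polynomial(A))   # x**2 - 4*x + 4, i.e. (x - 2)**2
```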

Corollary 8.1.1. If $A, B \in M_n$ are similar, then they have the same minimal polynomial.

Proof. Write $B = S^{-1}AS$. Then
$$q_A(B) = q_A(S^{-1}AS) = S^{-1}q_A(A)S = 0.$$
If there were a minimal polynomial for $B$ of smaller degree, say $q_B(x)$, then $q_B(A) = 0$ by the same argument applied to $A = SBS^{-1}$. This contradicts the minimality of $q_A(x)$. $\square$

Now that every matrix has a minimal polynomial, can we find a matrix with a given polynomial as its minimal polynomial? Can the degree of the polynomial and the size of the matrix match? The answers to both questions are affirmative and are presented below in one theorem.

Theorem 8.1.2. For any monic $n$th-degree polynomial
$$p(x) = x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0$$
there is a matrix $A \in M_n(\mathbb{C})$ for which it is the minimal polynomial.

Proof. Consider the matrix
$$A = \begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \ddots & \vdots & -a_2 \\
\vdots & & \ddots & 0 & \vdots \\
0 & 0 & \cdots & 1 & -a_{n-1}
\end{pmatrix}.$$
Observe that
$$e_1 = A^0 e_1, \quad Ae_1 = e_2, \quad Ae_2 = e_3 = A^2 e_1, \quad \ldots, \quad Ae_{n-1} = e_n = A^{n-1}e_1,$$

and
$$Ae_n = -a_{n-1}e_n - a_{n-2}e_{n-1} - \cdots - a_1 e_2 - a_0 e_1.$$
Since $Ae_n = A^n e_1$, it follows that
$$p(A)e_1 = A^n e_1 + a_{n-1}A^{n-1}e_1 + a_{n-2}A^{n-2}e_1 + \cdots + a_1 Ae_1 + a_0 Ie_1 = 0.$$
Also
$$p(A)e_k = p(A)A^{k-1}e_1 = A^{k-1}p(A)e_1 = A^{k-1}(0) = 0, \qquad k = 2, \ldots, n.$$
Hence $p(A)e_j = 0$ for $j = 1, \ldots, n$, and thus $p(A) = 0$. We know also that $p(x)$ is monic. Suppose now that $q(A) = 0$, where
$$q(x) = x^m + b_{m-1}x^{m-1} + \cdots + b_1 x + b_0$$
and $m < n$. Then
$$q(A)e_1 = A^m e_1 + b_{m-1}A^{m-1}e_1 + \cdots + b_1 Ae_1 + b_0 e_1 = e_{m+1} + b_{m-1}e_m + \cdots + b_1 e_2 + b_0 e_1 = 0.$$
But the vectors $e_{m+1}, \ldots, e_1$ are linearly independent, from which we conclude that $q(A) = 0$ is impossible. Thus $p(x)$ is minimal. $\square$

Definition 8.1.2. For a given monic polynomial $p(x)$, the matrix $A$ constructed above is called the companion matrix of $p$.

The transpose of the companion matrix can also be used to generate a linear differential system with the same characteristic polynomial as a given $n$th-order differential equation. Consider the linear differential equation
$$y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1 y' + a_0 y = 0.$$
This $n$th-order ODE can be converted to a first-order system by setting
$$u_1 = y, \quad u_2 = u_1' = y', \quad u_3 = u_2' = y'', \quad \ldots, \quad u_n = u_{n-1}' = y^{(n-1)}.$$
Then we have
$$\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}' =
\begin{pmatrix}
0 & 1 & & \\
& 0 & 1 & \\
& & \ddots & 1 \\
-a_0 & -a_1 & \cdots & -a_{n-1}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$
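As a sanity check on Theorem 8.1.2, the following sketch builds the companion matrix of an illustrative cubic (my own choice of coefficients), verifies $p(A) = 0$ by Horner evaluation, and confirms that the eigenvalues of the companion matrix are the roots of $p$.

```python
import numpy as np

def companion(a):
    """Companion matrix of p(x) = x^n + a[n-1] x^(n-1) + ... + a[0], following
    Theorem 8.1.2: ones on the subdiagonal, -a_0, ..., -a_{n-1} in the last column."""
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)
    A[:, -1] = -np.asarray(a, dtype=float)
    return A

# Illustrative cubic: p(x) = (x - 1)(x + 2)(x - 3) = x^3 - 2x^2 - 5x + 6.
a = [6.0, -5.0, -2.0]                 # [a_0, a_1, a_2]
A = companion(a)

# Horner evaluation of the monic p at A confirms p(A) = 0 ...
pA = np.zeros_like(A)
for c in [1.0] + a[::-1]:             # coefficients, highest power first
    pA = pA @ A + c * np.eye(len(a))
print(np.allclose(pA, 0))             # True
# ... and the eigenvalues of A are exactly the roots of p.
print(np.sort(np.linalg.eigvals(A).real))   # [-2.  1.  3.]
```

Note that the coefficient matrix of the first-order ODE system displayed above is exactly the transpose of this companion matrix, as the text remarks.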

8.2 Invariant Subspaces

There seems to be no truly simple route to the Jordan normal form. The approach taken here is intended to reveal a number of features of a matrix that are interesting in their own right. In particular, we will construct generalized eigenspaces that capture the entire connection of a matrix with its eigenvalues.

We have in various ways considered subspaces $V$ of $\mathbb{C}^n$ that are invariant under a matrix $A \in M_n(\mathbb{C})$. Recall that this means $AV \subset V$. For example, eigenvectors can be used to create invariant subspaces. Null spaces, the eigenspaces of the zero eigenvalue, are invariant as well. Triangular matrices furnish an easily recognizable sequence of invariant subspaces: if $T \in M_n(\mathbb{C})$ is upper triangular, it is easy to see that the subspaces spanned by the coordinate vectors $\{e_1, \ldots, e_m\}$, $m = 1, \ldots, n$, are invariant under $T$.

We now consider a specific type of invariant subspace that will lead to the so-called Jordan normal form of a matrix, the matrix similar to $A$ that most closely resembles a diagonal matrix.

Definition 8.2.1 (Generalized eigenspace). Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$. Define the generalized eigenspace pertaining to $\lambda_i$ by
$$V_{\lambda_i} = \{x \in \mathbb{C}^n \mid (A - \lambda_i I)^n x = 0\}.$$

Observe that all the eigenvectors pertaining to $\lambda_i$ are contained in $V_{\lambda_i}$. If the span of the eigenvectors pertaining to $\lambda_i$ is not equal to $V_{\lambda_i}$, then there must be a positive power $p$ and a vector $x$ such that $(A - \lambda_i I)^p x = 0$ but $y = (A - \lambda_i I)^{p-1}x \ne 0$. Thus $y$ is an eigenvector pertaining to $\lambda_i$. For this reason we call $V_{\lambda_i}$ the space of generalized eigenvectors pertaining to $\lambda_i$.

Our first result, that $V_{\lambda_i}$ is invariant under $A$, is simple to prove, noting that only closure under vector addition and scalar multiplication need be established (invariance follows because $A$ commutes with $(A - \lambda_i I)^n$).

Theorem 8.2.1. Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$. Then for each $i = 1, \ldots, k$, $V_{\lambda_i}$ is an invariant subspace of $A$.
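In coordinates, $V_{\lambda_i}$ is simply the nullspace of $(A - \lambda_i I)^n$, so the definition and Theorem 8.2.1 can be checked mechanically. A small sketch on an illustrative matrix of my own choosing:

```python
import sympy as sp

def generalized_eigenspace(A, lam):
    """Basis of V_lam = ker (A - lam I)^n for a square sympy Matrix A."""
    n = A.shape[0]
    return ((A - lam * sp.eye(n)) ** n).nullspace()

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
V2 = generalized_eigenspace(A, 2)
print(len(V2))                          # 2, the algebraic multiplicity of 2
# Invariance (Theorem 8.2.1): A*v stays inside span(V2) for each basis v.
B = sp.Matrix.hstack(*V2)
print(all(sp.Matrix.hstack(B, A * v).rank() == B.rank() for v in V2))  # True
```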

One might ask whether $V_{\lambda_i}$ could be enlarged by allowing powers higher than $n$ in the definition. The negative answer is most simply expressed by invoking the Cayley-Hamilton theorem. Write the characteristic polynomial as $p_A(\lambda) = \prod (\lambda - \lambda_i)^{m_A(\lambda_i)}$, where $m_A(\lambda_i)$ is the algebraic multiplicity of $\lambda_i$. Since $p_A(A) = \prod (A - \lambda_i I)^{m_A(\lambda_i)} = 0$, it is an easy matter to see that the spaces $V_{\lambda_i}$ already exhaust all of $\mathbb{C}^n$; allowing higher powers in the definition will not increase the subspaces $V_{\lambda_i}$. Indeed, as we shall see, the power for $\lambda_i$ can be decreased to the index $m_i$ introduced below. For now the general power $n$ will suffice.

One very important result, and an essential first step in deriving the Jordan form, is that any square matrix $A$ is similar to a block diagonal matrix with each block carrying a single eigenvalue.

Theorem 8.2.2. Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$ and with invariant subspaces $V_{\lambda_i}$, $i = 1, 2, \ldots, k$. Then
(i) the spaces $V_{\lambda_i}$, $i = 1, \ldots, k$, are mutually linearly independent;
(ii) $\bigoplus_{i=1}^k V_{\lambda_i} = \mathbb{C}^n$ (alternatively, $\mathbb{C}^n = S(V_{\lambda_1}, \ldots, V_{\lambda_k})$);
(iii) $\dim V_{\lambda_i} = m_A(\lambda_i)$;
(iv) $A$ is similar to a block diagonal matrix with $k$ blocks $A_1, \ldots, A_k$; moreover, $\sigma(A_i) = \{\lambda_i\}$ and $\dim A_i = m_A(\lambda_i)$.

Proof. (i) Suppose there is a nonzero vector $x$ lying in both $V_{\lambda_i}$ and $V_{\lambda_j}$ with $i \ne j$. Then for some integer $q$ we have $(A - \lambda_j I)^{q-1}x \ne 0$ but $(A - \lambda_j I)^q x = 0$, so $y = (A - \lambda_j I)^{q-1}x$ is an eigenvector pertaining to $\lambda_j$. Since $(A - \lambda_i I)^n x = 0$, we must also have
$$(A - \lambda_j I)^{q-1}(A - \lambda_i I)^n x = (A - \lambda_i I)^n (A - \lambda_j I)^{q-1}x = (A - \lambda_i I)^n y = 0.$$
But, expanding and using $Ay = \lambda_j y$,
$$(A - \lambda_i I)^n y = \sum_{m=0}^{n} \binom{n}{m} (-\lambda_i)^{n-m} A^m y = \sum_{m=0}^{n} \binom{n}{m} (-\lambda_i)^{n-m} \lambda_j^m y = (\lambda_j - \lambda_i)^n y,$$
which is zero only if $\lambda_j = \lambda_i$, a contradiction.

(ii) The key part of the proof is to block diagonalize $A$ with respect to these invariant subspaces. To that end, let $S$

be the matrix whose columns are formed from bases of the individual $V_{\lambda_i}$, taken in the order of the indices. If there are linearly independent vectors in $\mathbb{C}^n$ beyond those already selected, fill out the matrix $S$ with vectors linearly independent of the subspaces $V_{\lambda_i}$, $i = 1, \ldots, k$. Now define $\tilde{A} = S^{-1}AS$. We conclude from the invariance of the subspaces and their mutual linear independence that $\tilde{A}$ has the block structure
$$\tilde{A} = S^{-1}AS = \begin{pmatrix}
A_1 & 0 & \cdots & 0 & * \\
0 & A_2 & & & * \\
\vdots & & \ddots & & \vdots \\
0 & & & A_k & * \\
0 & 0 & \cdots & 0 & B
\end{pmatrix}.$$
It follows that
$$p_A(\lambda) = p_{\tilde{A}}(\lambda) = \Bigl(\prod p_{A_i}(\lambda)\Bigr)\, p_B(\lambda).$$
Any root $r$ of $p_B(\lambda)$ must be an eigenvalue of $A$, say $\lambda_j$, and there must be an eigenvector $\tilde{x}$ of $\tilde{A}$ pertaining to $\lambda_j$. Moreover, due to the block structure, we can take $\tilde{x} = [0, \ldots, 0, x]^T$, where the $k$ blocked zeros have the sizes of the $A_i$, respectively. Then it is easy to see that $AS\tilde{x} = \lambda_j S\tilde{x}$, and this implies that $S\tilde{x} \in V_{\lambda_j}$. Thus there is a vector in $V_{\lambda_j}$ beyond those chosen to span it, which contradicts the construction. Therefore the block $B$ is empty, which is to say $\bigoplus_{i=1}^k V_{\lambda_i} = \mathbb{C}^n$.

(iii) Let $d_i = \dim V_{\lambda_i}$. From (ii) we know $\sum d_i = n$. Suppose that $\lambda_i \in \sigma(A_j)$ for some $j \ne i$. Then there is an eigenvector $\tilde{x}$ of $\tilde{A}$ pertaining to $\lambda_i$ of the form $\tilde{x} = [0, \ldots, 0, x, 0, \ldots, 0]^T$, analogous to the argument above. By construction $S\tilde{x} \notin V_{\lambda_i}$, but $AS\tilde{x} = \lambda_i S\tilde{x}$, and this contradicts the definition of $V_{\lambda_i}$. We thus have $p_{A_i}(\lambda) = (\lambda - \lambda_i)^{d_i}$. Since $p_A(\lambda) = \prod p_{A_i}(\lambda) = \prod (\lambda - \lambda_i)^{m_A(\lambda_i)}$, it follows that $d_i = m_A(\lambda_i)$.

(iv) Putting (ii) and (iii) together gives the required block diagonal structure. $\square$

On account of the mutual linear independence of the invariant subspaces $V_{\lambda_i}$ and the fact that they exhaust $\mathbb{C}^n$, the following corollary is immediate.

Corollary 8.2.1. Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$ and with generalized eigenspaces $V_{\lambda_i}$, $i = 1, 2, \ldots, k$. Then each $x \in \mathbb{C}^n$ has a unique representation $x = \sum_{i=1}^k x_i$ where $x_i \in V_{\lambda_i}$.
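Corollary 8.2.1 can be made concrete: stack bases of the $V_{\lambda_i}$ into an invertible matrix and solve for the coordinates of $x$. A sketch, continuing with the same illustrative matrix as before:

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
n = A.shape[0]

V2 = ((A - 2 * sp.eye(n)) ** n).nullspace()   # basis of V_2
V3 = ((A - 3 * sp.eye(n)) ** n).nullspace()   # basis of V_3
S = sp.Matrix.hstack(*V2, *V3)                # invertible by Theorem 8.2.2

x = sp.Matrix([1, 2, 3])
c = S.solve(x)                                # unique coordinates of x
x2 = sp.Matrix.hstack(*V2) * c[:len(V2), :]   # component in V_2
x3 = sp.Matrix.hstack(*V3) * c[len(V2):, :]   # component in V_3
print(x2.T, x3.T)                             # Matrix([[1, 2, 0]]) Matrix([[0, 0, 3]])
print((x2 + x3 - x).is_zero_matrix)           # True
```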

Another interesting result, which reveals how the matrix works as a linear transformation, is to decompose it into components with respect to the generalized eigenspaces. In particular, viewing the block diagonal form
$$\tilde{A} = S^{-1}AS = \begin{pmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & A_k
\end{pmatrix},$$
the space $\mathbb{C}^n$ can be split into a direct sum of subspaces $E_1, \ldots, E_k$ based on coordinate blocks. This is accomplished in such a way that any vector $y \in \mathbb{C}^n$ can be written uniquely as $y = \sum_{i=1}^k y_i$, where $y_i \in E_i$. (Keep in mind that each $y_i \in \mathbb{C}^n$; its coordinates are zero outside the coordinate block pertaining to $E_i$.) Then
$$\tilde{A}y = \tilde{A}\sum_{i=1}^k y_i = \sum_{i=1}^k \tilde{A}y_i = \sum_{i=1}^k A_i y_i.$$
This provides a computational tool when the block diagonal form is known. Note that the blocks correspond directly to the invariant subspaces via $SE_i = V_{\lambda_i}$.

We can use these invariant subspaces to get at the minimal polynomial. For each $i = 1, \ldots, k$ define
$$m_i = \min\{\, j \mid (A - \lambda_i I)^j x = 0 \text{ for all } x \in V_{\lambda_i} \,\}.$$

Theorem 8.2.3. Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$ and with invariant subspaces $V_{\lambda_i}$, $i = 1, 2, \ldots, k$. Then the minimal polynomial of $A$ is given by
$$q(\lambda) = \prod_{i=1}^k (\lambda - \lambda_i)^{m_i}.$$

Proof. Certainly for any vector $x \in V_{\lambda_j}$,
$$q(A)x = \Bigl(\prod_{i=1}^k (A - \lambda_i I)^{m_i}\Bigr)x = 0,$$
and hence the minimal polynomial $q_A(\lambda)$ divides $q(\lambda)$. To see that they are in fact equal, suppose that the minimal polynomial has the form
$$q_A(\lambda) = \prod_{i=1}^k (\lambda - \lambda_i)^{\hat{m}_i}$$
where $\hat{m}_i \le m_i$ for $i = 1, \ldots, k$ and, in particular, $\hat{m}_j < m_j$ for some $j$. By construction there must exist a vector $x \in V_{\lambda_j}$ such that $(A - \lambda_j I)^{m_j}x = 0$ but $(A - \lambda_j I)^{m_j - 1}x \ne 0$. Then
$$q_A(A)x = \prod_{\substack{i=1 \\ i \ne j}}^k (A - \lambda_i I)^{\hat{m}_i}\,(A - \lambda_j I)^{\hat{m}_j}x,$$
and $(A - \lambda_j I)^{\hat{m}_j}x \ne 0$ because $\hat{m}_j \le m_j - 1$. Each remaining factor is invertible on $V_{\lambda_j}$, since its only eigenvalue there is $\lambda_j - \lambda_i \ne 0$; otherwise there would be another eigenvalue attached to one of the invariant subspaces. Hence $q_A(A)x \ne 0$, a contradiction, and $\hat{m}_i = m_i$ for all $i$. $\square$
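The exponents $m_i$ (the indices of the eigenvalues) are computable: $m_i$ is the first power at which the nullspace of $(A - \lambda_i I)^j$ stops growing. A sketch; the block diagonal test matrix is my own illustration:

```python
import sympy as sp

def index_of(A, lam):
    """Smallest m with (A - lam I)^m x = 0 for all x in V_lam: the first
    power at which the nullspace of (A - lam I)^j stops growing."""
    n = A.shape[0]
    N = A - lam * sp.eye(n)
    m, P = 1, N
    while len(P.nullspace()) < len((P * N).nullspace()):
        m, P = m + 1, P * N
    return m

A = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[2]]), sp.Matrix([[5]]))
lam = sp.Symbol('lambda')
q = sp.Mul(*[(lam - mu) ** index_of(A, mu) for mu in A.eigenvals()])
print(q)                      # (lambda - 2)**2 * (lambda - 5)
```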

Just one more step is needed before the Jordan normal form can be derived. For a given $V_{\lambda_i}$ we can view the space hierarchically. We know that $V_{\lambda_i}$ contains all the eigenvectors pertaining to $\lambda_i$; call these eigenvectors the first-order generalized eigenvectors. If their span is not equal to $V_{\lambda_i}$, then there must be a vector $x \in V_{\lambda_i}$ for which $(A - \lambda_i I)^2 x = 0$ but $y = (A - \lambda_i I)x \ne 0$; that is to say, $y$ is an eigenvector of $A$ pertaining to $\lambda_i$. Call such vectors $x$ second-order generalized eigenvectors. In general, we call $x \in V_{\lambda_i}$ a generalized eigenvector of order $j$ if $(A - \lambda_i I)^j x = 0$ but $(A - \lambda_i I)^{j-1}x \ne 0$. In light of our previous discussion, $V_{\lambda_i}$ contains generalized eigenvectors of order up to but not greater than $m_i$.

Theorem 8.2.4. Let $A \in M_n(\mathbb{C})$ with spectrum $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$ and with invariant subspaces $V_{\lambda_i}$, $i = 1, 2, \ldots, k$.
(i) Let $x \in V_{\lambda_i}$ be a generalized eigenvector of order $p$. Then the vectors
$$x,\ (A - \lambda_i I)x,\ (A - \lambda_i I)^2 x,\ \ldots,\ (A - \lambda_i I)^{p-1}x \qquad (1)$$
are linearly independent.
(ii) The subspace of $\mathbb{C}^n$ generated by the vectors in (1) is an invariant subspace of $A$.

Proof. (i) To prove linear independence of a set of vectors, we suppose linear dependence. That is, there is a smallest integer $k$ (local to this proof) and constants $b_j$ such that
$$\sum_{j=0}^{k} b_j (A - \lambda_i I)^j x = 0,$$

where $b_k \ne 0$. Solving, we obtain
$$b_k (A - \lambda_i I)^k x = -\sum_{j=0}^{k-1} b_j (A - \lambda_i I)^j x.$$
Now apply $(A - \lambda_i I)^{p-k}$ to both sides to obtain a new linearly dependent set, as the following calculation shows:
$$0 = b_k (A - \lambda_i I)^{p-k+k}x = -\sum_{j=0}^{k-1} b_j (A - \lambda_i I)^{j+p-k}x = -\sum_{j=p-k}^{p-1} b_{j-p+k} (A - \lambda_i I)^{j}x.$$
The key point to note here is that the lower limit of the sum has increased. This new linearly dependent set, which we write as $\sum_{j=p-k}^{p-1} c_j (A - \lambda_i I)^j x = 0$, can be split in the same way as before, where we assume with no loss of generality that $c_{p-1} \ne 0$. Then
$$c_{p-1}(A - \lambda_i I)^{p-1}x = -\sum_{j=p-k}^{p-2} c_j (A - \lambda_i I)^j x.$$
Apply $(A - \lambda_i I)$ to both sides to get
$$0 = c_{p-1}(A - \lambda_i I)^{p}x = -\sum_{j=p-k}^{p-2} c_j (A - \lambda_i I)^{j+1}x = -\sum_{j=p-k+1}^{p-1} c_{j-1}(A - \lambda_i I)^{j}x.$$
Thus we have obtained another linearly dependent set with the lower limit of the powers increased by one. Continue this process until the linear dependence of $(A - \lambda_i I)^{p-1}x$ and $(A - \lambda_i I)^{p-2}x$ is achieved. Thus we have
$$c\,(A - \lambda_i I)^{p-1}x = d\,(A - \lambda_i I)^{p-2}x, \qquad \text{i.e.} \qquad (A - \lambda_i I)\,y = \frac{d}{c}\,y,$$
where $y = (A - \lambda_i I)^{p-2}x \ne 0$. This implies that $\lambda_i + \frac{d}{c}$ is a new eigenvalue with eigenvector $y \in V_{\lambda_i}$ (note $\frac{d}{c} \ne 0$ since $(A - \lambda_i I)^{p-1}x \ne 0$), and of course this is a contradiction.
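Theorem 8.2.4(i) is easy to observe in computation: for a generalized eigenvector $x$ of order $p$, the chain $x, Nx, \ldots, N^{p-1}x$ with $N = A - \lambda_i I$ has full rank. A tiny sketch on a single Jordan block, chosen purely for illustration:

```python
import sympy as sp

lam = 4
A = sp.Matrix([[lam, 1, 0],
               [0, lam, 1],
               [0, 0, lam]])        # a single Jordan block, for illustration
N = A - lam * sp.eye(3)

x = sp.Matrix([0, 0, 1])            # generalized eigenvector of order p = 3
chain = [x, N * x, N**2 * x]        # the vectors in (1)
print(sp.Matrix.hstack(*chain).rank())   # 3: linearly independent
print((N**3 * x).is_zero_matrix)         # True, so the order is exactly 3
```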

(ii) The invariance under $A$ is more straightforward. First note that with $x_1 = x$ and $x_2 = (A - \lambda_i I)x = Ax - \lambda_i x$, we have $Ax = x_2 + \lambda_i x_1$. Consider any vector $y$ defined by $y = \sum_{j=0}^{p-1} b_j (A - \lambda_i I)^j x$. It follows that
$$Ay = A\sum_{j=0}^{p-1} b_j (A - \lambda_i I)^j x
= \sum_{j=0}^{p-1} b_j (A - \lambda_i I)^j Ax
= \sum_{j=0}^{p-1} b_j (A - \lambda_i I)^j (x_2 + \lambda_i x_1)
= \sum_{j=0}^{p-1} b_j (A - \lambda_i I)^j [(A - \lambda_i I)x_1 + \lambda_i x_1]
= \sum_{j=0}^{p-1} c_j (A - \lambda_i I)^j x,$$
where $c_j = b_{j-1} + \lambda_i b_j$ for $j > 0$ and $c_0 = \lambda_i b_0$ (the term involving $(A - \lambda_i I)^p x$ vanishes). This proves the result. $\square$

8.3 The Jordan Normal Form

We need a lemma that points in the direction we are headed: the use of invariant subspaces as a basis for the (Jordan) block diagonalization of any matrix. These results were discussed in detail in Section 8.2; the restatement here emphasizes the invariant-subspace nature of the result, irrespective of generalized eigenspaces. Its proof is elementary and is left to the reader.

Lemma 8.3.1. Let $A \in M_n(\mathbb{C})$ with invariant subspace $V \subset \mathbb{C}^n$.
(i) Suppose $v_1, \ldots, v_k$ is a basis for $V$ and $S$ is an invertible matrix whose first $k$ columns are $v_1, \ldots, v_k$. Then
$$S^{-1}AS = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},$$
where $A_{11}$ is $k \times k$.

(ii) Suppose that $V_1, V_2 \subset \mathbb{C}^n$ are two invariant subspaces of $A$ and $\mathbb{C}^n = V_1 \oplus V_2$. Let the (invertible) matrix $S$ consist of bases of $V_1$ and $V_2$, respectively, as its columns. Then
$$S^{-1}AS = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}.$$

Definition 8.3.1. Let $\lambda \in \mathbb{C}$. A Jordan block $J_k(\lambda)$ is a $k \times k$ upper triangular matrix of the form
$$J_k(\lambda) = \begin{pmatrix}
\lambda & 1 & & 0 \\
& \lambda & \ddots & \\
& & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix}.$$
A Jordan matrix is any matrix of the form
$$J = \begin{pmatrix}
J_{n_1}(\lambda_1) & & 0 \\
& \ddots & \\
0 & & J_{n_k}(\lambda_k)
\end{pmatrix},$$
where the $J_{n_i}(\lambda_i)$ are Jordan blocks. If $J \in M_n(\mathbb{C})$, then $n_1 + n_2 + \cdots + n_k = n$.

Theorem 8.3.1 (Jordan normal form). Let $A \in M_n(\mathbb{C})$. Then there is a nonsingular matrix $S \in M_n$ such that
$$A = S \begin{pmatrix}
J_{n_1}(\lambda_1) & & 0 \\
& \ddots & \\
0 & & J_{n_k}(\lambda_k)
\end{pmatrix} S^{-1} = SJS^{-1},$$
where each $J_{n_i}(\lambda_i)$ is a Jordan block and $n_1 + n_2 + \cdots + n_k = n$. The Jordan matrix $J$ is unique up to permutations of the blocks. The eigenvalues $\lambda_1, \ldots, \lambda_k$ are not necessarily distinct. If $A$ is real with real eigenvalues, then $S$ can be taken to be real.
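Before walking through the proof, it may help to see the theorem in action. The sketch below manufactures a matrix with a known Jordan structure and recovers it with sympy's jordan_form (whose convention is $A = SJS^{-1}$); the matrices are my own illustrations.

```python
import sympy as sp

# Build A with a known Jordan structure, then ask sympy to recover it.
J_true = sp.diag(sp.Matrix([[4, 1], [0, 4]]), sp.Matrix([[2]]))  # J_2(4), J_1(2)
P = sp.Matrix([[1, 1, 0],
               [0, 1, 1],
               [1, 0, 1]])                                       # any invertible P
A = P * J_true * P.inv()

S, J = A.jordan_form()              # sympy convention: A = S * J * S**-1
print(J)                            # J_true, up to a permutation of the blocks
print(sp.simplify(S * J * S.inv() - A).is_zero_matrix)           # True
```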

Proof. This result is proved in four steps.

(1) Block diagonalize $A$ (by similarity) with respect to the invariant subspaces pertaining to $\sigma(A)$. This is accomplished as follows. First block diagonalize the matrix according to the generalized eigenspaces $V_{\lambda_i} = \{x \in \mathbb{C}^n \mid (A - \lambda_i I)^n x = 0\}$, as discussed in the previous section. Beginning with a generalized eigenvector $x \in V_{\lambda_i}$ of highest order, construct the invariant subspace as in (1) of Section 8.2. Repeat this process until all generalized eigenvectors have been included in an invariant subspace. This includes, of course, first-order eigenvectors that are not associated with higher-order generalized eigenvectors; these invariant subspaces have dimension one. Each of these invariant subspaces is linearly independent of the others. Continue this process for all the generalized eigenspaces; this exhausts $\mathbb{C}^n$. Each of the blocks contains exactly one eigenvector, and the dimensions of these invariant subspaces range from one to $m_i$, there being at least one subspace of dimension $m_i$.

(2) Triangularize each block by Schur's theorem, so that each block has the form
$$K(\lambda) = \begin{pmatrix} \lambda & & * \\ & \ddots & \\ 0 & & \lambda \end{pmatrix}.$$
Note that $K(\lambda) = \lambda I + N$ where $N$ is nilpotent, or else $K(\lambda)$ is $1 \times 1$.

(3) Jordanize each triangular block. Assume that $K_1(\lambda)$ is $m \times m$, where $m > 1$. By construction, $K_1(\lambda)$ pertains to an invariant subspace for which there is a vector $x$ with $N^{m-1}x \ne 0$ and $N^m x = 0$; thus $N^{m-1}x$ is an eigenvector of $K_1(\lambda)$, the unique eigenvector up to scalar multiples. Define
$$y_i = N^{i-1}x, \qquad i = 1, 2, \ldots, m.$$
The set $\{y_i\}_{i=1}^m$ is a basis of $\mathbb{C}^m$. Define
$$S_1 = [\, y_m \ \ y_{m-1} \ \cdots \ y_1 \,].$$
Then
$$NS_1 = [\, 0 \ \ y_m \ \ y_{m-1} \ \cdots \ y_2 \,],$$

so that
$$S_1^{-1}NS_1 = \begin{pmatrix}
0 & 1 & & 0 \\
& 0 & \ddots & \\
& & \ddots & 1 \\
0 & & & 0
\end{pmatrix}.$$
We conclude that
$$S_1^{-1}K_1(\lambda)S_1 = \begin{pmatrix}
\lambda & 1 & & 0 \\
& \lambda & \ddots & \\
& & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix}.$$

(4) Assemble all of the blocks to form the Jordan form. For example, the block $K_1(\lambda)$ and the corresponding similarity transformation $S_1$ studied above can be treated in the assembly process as follows. Define the $n \times n$ matrix
$$\hat{S}_1 = \begin{pmatrix} I & 0 & 0 \\ 0 & S_1 & 0 \\ 0 & 0 & I \end{pmatrix},$$
where $S_1$ is the block constructed above, placed in the $n \times n$ matrix in the position from which $K_1(\lambda)$ was extracted in the block triangular form of $A$. Repeat this for each of the blocks pertaining to the minimal invariant subspaces. This gives a sequence of block diagonal matrices $\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_k$. Define $T = \hat{S}_1\hat{S}_2\cdots\hat{S}_k$; it has the form
$$T = \begin{pmatrix}
S_1 & & & 0 \\
& S_2 & & \\
& & \ddots & \\
0 & & & S_k
\end{pmatrix}.$$
Together with the original matrix $P$ that transformed $A$ to the minimal-invariant-subspace block form and the unitary matrix $V$ used to triangularize $A$, it follows that
$$A = PVT \begin{pmatrix}
J_{n_1}(\lambda_1) & & 0 \\
& \ddots & \\
0 & & J_{n_k}(\lambda_k)
\end{pmatrix} (PVT)^{-1} = SJS^{-1}$$
with $S = PVT$. $\square$
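Step (3) above is entirely constructive, and a few lines of sympy make it concrete. The nilpotent part below is an arbitrary illustration of a Schur block; the chain basis $y_m, \ldots, y_1$ produces the Jordan block exactly as claimed.

```python
import sympy as sp

lam, m = 7, 3
N = sp.Matrix([[0, 2, 5],
               [0, 0, 3],
               [0, 0, 0]])               # nilpotent part of a Schur block
K = lam * sp.eye(m) + N                  # K = lam*I + N, as in step (2)

x = sp.Matrix([0, 0, 1])                 # N**2 * x != 0 and N**3 * x = 0
ys = [N**i * x for i in range(m)]        # y_1 = x, y_2 = N x, y_3 = N^2 x
S1 = sp.Matrix.hstack(*reversed(ys))     # columns y_m, ..., y_1
print(S1.inv() * K * S1)                 # Matrix([[7, 1, 0], [0, 7, 1], [0, 0, 7]])
```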

Example 8.3.1. Let
$$J = \begin{pmatrix}
2 & 1 & & & & & \\
0 & 2 & & & & & \\
& & 2 & & & & \\
& & & 3 & 1 & 0 & \\
& & & 0 & 3 & 1 & \\
& & & 0 & 0 & 3 & \\
& & & & & & -1
\end{pmatrix}.$$
In this example there are four blocks, two of which pertain to the single eigenvalue 2. For the first block there is the single eigenvector $e_1$, and the invariant subspace is $S(e_1, e_2)$. For the second block, the eigenvector $e_3$ generates a one-dimensional invariant subspace. The block pertaining to the eigenvalue 3 has the single eigenvector $e_4$, while the minimal invariant subspace is $S(e_4, e_5, e_6)$. Finally, the one-dimensional subspace pertaining to the eigenvalue $-1$ is spanned by $e_7$. The minimal polynomial is
$$q(\lambda) = (\lambda - 2)^2 (\lambda - 3)^3 (\lambda + 1).$$
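A quick check of Example 8.3.1: the product $(J - 2I)^2(J - 3I)^3(J + I)$ vanishes, while lowering either exponent leaves a nonzero matrix, confirming that $q(\lambda)$ is minimal. A sketch in sympy (the jordan_block helper is my own):

```python
import sympy as sp

def jordan_block(lam, k):
    """k x k Jordan block J_k(lam)."""
    return sp.Matrix(k, k, lambda i, j: lam if i == j else (1 if j == i + 1 else 0))

J = sp.diag(jordan_block(2, 2), jordan_block(2, 1),
            jordan_block(3, 3), jordan_block(-1, 1))
I = sp.eye(7)

print(((J - 2*I)**2 * (J - 3*I)**3 * (J + I)).is_zero_matrix)      # True
# Lowering an exponent leaves a nonzero product, so q is minimal:
print(((J - 2*I)    * (J - 3*I)**3 * (J + I)).is_zero_matrix)      # False
print(((J - 2*I)**2 * (J - 3*I)**2 * (J + I)).is_zero_matrix)      # False
```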

8.4 Convergent Matrices

Using the Jordan normal form, the study of convergent matrices becomes relatively straightforward.

Theorem 8.4.1. If $A \in M_n$ and $\rho(A) < 1$, then $\lim_{m \to \infty} A^m = 0$.

Proof. By similarity we may assume $A$ is a Jordan matrix. Each Jordan block $J_k(\lambda)$ can be written as $J_k(\lambda) = \lambda I_k + N_k$, where
$$N_k = \begin{pmatrix}
0 & 1 & & 0 \\
& 0 & \ddots & \\
& & \ddots & 1 \\
0 & & & 0
\end{pmatrix}$$
is nilpotent. Now
$$A = \begin{pmatrix}
J_{n_1}(\lambda_1) & & \\
& \ddots & \\
& & J_{n_k}(\lambda_k)
\end{pmatrix}.$$
We compute, for $m > n_k$,
$$(J_{n_k}(\lambda_k))^m = (\lambda I + N)^m = \lambda^m I + \sum_{j=1}^{n_k} \binom{m}{j} \lambda^{m-j} N^j,$$
because $N^j = 0$ for $j > n_k$. We have
$$\binom{m}{j} \lambda^{m-j} \to 0 \quad \text{as } m \to \infty,$$
since $|\lambda| < 1$. The result follows. $\square$
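The decay guaranteed by Theorem 8.4.1 is easy to watch numerically. A sketch with numpy; the random matrix and the rescaling to spectral radius $0.9$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))     # rescale so that rho(A) = 0.9

for m in (1, 10, 50, 200):
    print(m, np.linalg.norm(np.linalg.matrix_power(A, m)))
# The norms tend to 0 (after a possible transient), exactly as the
# binomial-coefficient-times-lambda^m estimate in the proof predicts.
```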

8.5 Exercises

1. Prove Theorem 8.2.1.

2. Find a $3 \times 3$ matrix whose eigenvalues are the squares of the roots of the equation $\lambda^3 - 3\lambda^2 + 4\lambda - 5 = 0$.

3. Suppose that $A$ is a square matrix with $\sigma(A) = \{3\}$, $m_a(3) = 6$, and $m_g(3) = 3$. Up to permutations of the blocks, show all possible Jordan normal forms for $A$.

4. Let $A \in M_n(\mathbb{C})$ and let $x_1 \in \mathbb{C}^n$. Define $x_{i+1} = Ax_i$ for $i = 1, \ldots, n-1$. Show that $V = S(\{x_1, \ldots, x_n\})$ is an invariant subspace of $A$. Show that $V$ contains an eigenvector of $A$.

5. Referring to the previous problem, let $A \in M_n(\mathbb{R})$ be a permutation matrix. (i) Find starting vectors so that $\dim V = n$. (ii) Find starting vectors so that $\dim V = 1$. (iii) Show that if $\lambda = 1$ is a simple eigenvalue of $A$, then $\dim V = 1$ or $\dim V = n$.

Chapter 9: Hermitian and Symmetric Matrices

Example 9.0.1. Let $f : D \to \mathbb{R}$, $D \subset \mathbb{R}^n$. The Hessian of $f$ is the matrix $H(x) = [h_{ij}(x)] \in M_n$ with entries
$$h_{ij}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}.$$
Since for functions $f \in C^2$ it is known that
$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i},$$
it follows that $H(x)$ is symmetric.

Definition 9.0.1. A function $f : D \to \mathbb{R}$, $D \subset \mathbb{R}$, is convex if
$$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y)$$
for all $x, y \in D$ (the domain) and $0 \le \lambda \le 1$.

Proposition 9.0.1. If $f \in C^2(D)$ and $f''(x) \ge 0$ on $D$, then $f$ is convex.

Proof. Because $f'' \ge 0$, the derivative $f'$ is increasing. Therefore, if $x < x_m < y$, we must have, by the mean value theorem,
$$f(x_m) \le f(x) + f'(x_m)(x_m - x)$$
and
$$f(x_m) \le f(y) + f'(x_m)(x_m - y).$$
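Returning to Example 9.0.1, the symmetry of the Hessian can be confirmed symbolically. A minimal sketch; the function $f$ is my own illustration:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.exp(x1 * x2) + x1**3 * x2            # any C^2 function serves here
H = sp.hessian(f, (x1, x2))
print(sp.simplify(H - H.T).is_zero_matrix)  # True: H(x) is symmetric
```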