
Jordan normal form

Sebastian Ørsted
December 16, 2017

Abstract. In these notes, we expand upon the coverage of linear algebra as presented in Thomsen (2016). Namely, we introduce some concepts and results of fundamental importance in the field, but which were not available to us without the greater level of abstraction developed in the current course. Specifically, we introduce concepts like the Jordan normal form and generalized eigenspaces.

Given a finite-dimensional vector space V over a field K, recall that a linear operator θ : V → V is called diagonalizable if V admits a basis consisting of eigenvectors of θ. Thus each element of this basis lies in some eigenspace

    E(λ) = { v ∈ V | θ(v) = λv }

corresponding to an eigenvalue λ of θ (but all basis elements need not belong to the same E(λ)!). We note the following alternative formulation of being diagonalizable. It is considered standard knowledge from linear algebra, and we omit the proof.

Proposition J.1. If λ_1, λ_2, ..., λ_r are the distinct eigenvalues of θ, then θ is diagonalizable if and only if V can be written as the direct sum

    V = E(λ_1) ⊕ E(λ_2) ⊕ ⋯ ⊕ E(λ_r)

of the eigenspaces of θ. Furthermore, we may choose a basis for each space E(λ_i) and combine them to a basis for V in which θ has the block diagonal form

    θ = [ D_{n_1}(λ_1)                             ]
        [               D_{n_2}(λ_2)               ]
        [                             ⋱            ]
        [                             D_{n_r}(λ_r) ]

where D_{n_i}(λ_i) = λ_i · id_{n_i} is the n_i × n_i diagonal matrix with all diagonal entries equal to λ_i.

Recall that V being the direct sum of subspaces V_1, V_2, ..., V_k means that any v ∈ V can be written uniquely as a sum v = v_1 + v_2 + ⋯ + v_k

of vectors with v_i ∈ V_i. In this case we write

    V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k = ⨁_{i=1}^{k} V_i.

In general, not all operators are diagonalizable; indeed, many matrices, like

    [ 0 −1 ]
    [ 1  0 ]

over K = R, do not have eigenvalues at all, since the characteristic polynomial x^2 + 1 ∈ R[x] has no roots in R. Nevertheless, one of the many consequences of the theory we develop here is that when the field K is, for instance, the complex numbers, all operators θ satisfy a slightly weaker form of Proposition J.1: In the definition of E(λ), we replace the condition that θ(v) = λv (or, equivalently, (θ − λ id)v = 0) by the requirement that (θ − λ id)^n v = 0 for some n ≥ 1 (which is allowed to depend on v). In other words, define

    Ẽ(λ) = { v ∈ V | (θ − λ id)^n v = 0 for some n ≥ 1 },

the generalized eigenspace of θ with respect to λ. We note that E(λ) is a vector subspace of Ẽ(λ) (see Exercise J.1), and that this inclusion can be strict. Note also that we have θ(Ẽ(λ)) ⊆ Ẽ(λ) (see Exercise J.2). It would seem natural to extend the well-known terminology further and refer to λ as a generalized eigenvalue if Ẽ(λ) ≠ 0; however, this terminology turns out to be redundant, because of the following:

Proposition J.2. A generalized eigenvalue is automatically an eigenvalue; hence we simply refer to λ as the corresponding eigenvalue.

Proof. If (θ − λ id)^n v = 0 for some v ≠ 0, then the operator (θ − λ id)^n is not injective, so 0 = det((θ − λ id)^n) = (det(θ − λ id))^n. Hence det(θ − λ id) = 0, and λ is an eigenvalue.

Example J.3. The matrix

    θ = [ 1 1 ]
        [ 0 1 ]

is not diagonalizable over C; the only root of its characteristic polynomial (1 − x)^2 is x = 1. If it were diagonalizable, we would get from Proposition J.1 that C^2 = E(1), which would imply θv = v for all v ∈ C^2; this is absurd since θ is not the identity matrix. However, the difference

    θ − 1·id = [ 0 1 ]
               [ 0 0 ]

satisfies (θ − 1·id)^2 = 0, hence (θ − 1·id)^2 v = 0 for all v ∈ C^2. This implies that Ẽ(1) = C^2.

Example J.4. We have only defined generalized eigenspaces for vector spaces of finite dimension and will only deal with those outside of this example.
However, the definition obviously makes sense in the infinite-dimensional setting as well, and we shall give a particularly pretty example. Let V = C^∞(R) be the real vector space of smooth (i.e., infinitely often differentiable) functions R → R, and let θ = d/dx : V → V be the differential operator. The generalized eigenspace Ẽ(0) consists of all smooth functions f : R → R such

that d^n f/dx^n = 0 for some n. Calculating antiderivatives one by one, this means that d^{n−1} f/dx^{n−1} is a constant, hence d^{n−2} f/dx^{n−2} is a polynomial of degree at most 1, and in general, d^{n−k} f/dx^{n−k} is a polynomial of degree at most k − 1. In particular, f = d^0 f/dx^0 is a polynomial of degree at most n − 1. We conclude that Ẽ(0) = R[x] consists exactly of all polynomial functions on R. This is probably one of the many things that make polynomials interesting for an analyst as well as an algebraist.

In order to state our main result, we need one more definition. Recall the result known by the misleading name "the Fundamental Theorem of Algebra"; it is not fundamental to modern abstract algebra, but received its name from an earlier discipline known simply as algebra, which was mainly concerned with finding roots of real and complex polynomials. The "abstract" in "abstract algebra" is there for a reason.

Theorem J.5. Every non-constant polynomial in C[x] has a root in C.

Proof. See, for instance, Exercise 1.54 in Stetkær (2012), Theorem 4.23 in Berg (2013), or Theorem 1.8 in Hatcher (2002).

This motivates the following definition: The field K is called algebraically closed if it satisfies the same theorem, that is, if every non-constant polynomial in K[x] has a root in K. Thus C is algebraically closed, while R and Q are not. No finite field is algebraically closed (see Exercise J.3). There are many fields satisfying this definition, but very few can be constructed concretely, and most of our results here will be developed with C in mind.

Our main theorem states that for algebraically closed fields, a basis for the vector space V can be chosen in which θ has a representation given in terms of blocks of the form

    J_n(λ) = [ λ 1         ]
             [   λ 1       ]
             [     ⋱ ⋱     ]
             [       λ 1   ]
             [         λ   ],

the n × n square matrix with λs along the diagonal and ones immediately above it. This block is known as the Jordan block of order n with respect to λ.

Theorem J.6.
If the field K is algebraically closed and θ : V → V is a linear operator on a finite-dimensional K-vector space, and if λ_1, λ_2, ..., λ_r are its distinct eigenvalues, then

    V = Ẽ(λ_1) ⊕ Ẽ(λ_2) ⊕ ⋯ ⊕ Ẽ(λ_r)

is the direct sum of the generalized eigenspaces. Restricting θ to an operator θ : Ẽ(λ_i) → Ẽ(λ_i) on one of these spaces, we may choose a basis for Ẽ(λ_i) in which θ has the block diagonal form

    B(λ_i) = [ J_{n_1}(λ_i)                             ]
             [               J_{n_2}(λ_i)               ]
             [                             ⋱            ]
             [                             J_{n_k}(λ_i) ],

consisting of Jordan blocks, where n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1 are integers (depending on i). Combining these bases to a basis for all of V, θ thus has the matrix representation

    θ = [ B(λ_1)                    ]
        [         B(λ_2)            ]
        [                 ⋱         ]
        [                 B(λ_r)    ]

This representation, called the Jordan normal form of θ, is unique up to reordering of the blocks.

The Jordan normal form is named after the French mathematician Camille Jordan (1838–1922). The proof of this theorem will occupy the next two sections. Since we have E(λ_i) ⊆ Ẽ(λ_i), we see that θ is diagonalizable if and only if equality holds for all i. Also recall from linear algebra that two different matrix representations A and B of the same linear map are similar, meaning that S^{−1} A S = B for some invertible matrix S. In other words, the theorem shows that any square matrix over an algebraically closed field is similar to some matrix in Jordan normal form.

Looking at the definition of Ẽ(λ), we see it is not immediately obvious how to calculate it; it is equal to the union of the kernels Ker((θ − λ id)^k) for all k, but do we have to calculate infinitely many powers of θ − λ id and their kernels? Fortunately, one is enough:

Corollary J.7. Let N be the algebraic multiplicity of λ as an eigenvalue of θ. Then

    Ẽ(λ) = { v ∈ V | (θ − λ id)^N v = 0 } = Ker((θ − λ id)^N).

Proof. Exercise J.8(iii).

The next example provides the general algorithm for computing the Jordan normal form.

Example J.8. Let us consider the complex matrix

    θ = [ 0 1 1 0 ]
        [ 0 0 0 1 ]
        [ 0 0 1 1 ]
        [ 0 0 0 1 ]

Its characteristic polynomial is given by χ(x) = x^4 − 2x^3 + x^2 = x^2 (x − 1)^2, hence the eigenvalues are λ = 0 and λ = 1, both with algebraic multiplicity 2. According to Corollary J.7, the generalized eigenspaces are therefore given by Ẽ(0) = Ker(θ^2) and Ẽ(1) = Ker((θ − id)^2). As the reader can verify using their existing knowledge of linear algebra, we therefore have

    Ẽ(0) = C(1,0,0,0) + C(0,1,0,0)

and

    Ẽ(1) = C(1,0,1,0) + C(0,1,0,1).

These four vectors are indeed a basis for C^4. The matrix representation of θ in this basis is given by S^{−1} θ S, where

    S = [ 1 0 1 0 ]
        [ 0 1 0 1 ]
        [ 0 0 1 0 ]
        [ 0 0 0 1 ]

This yields

    S^{−1} θ S = [ 0 1 0 0 ]
                 [ 0 0 0 0 ]
                 [ 0 0 1 1 ]
                 [ 0 0 0 1 ]

which is in Jordan normal form, consisting of the two Jordan blocks J_2(0) and J_2(1).

Proof of existence of the Jordan normal form

For any principal ideal domain R (like K[x]!), recall that if d is a greatest common divisor of x, y ∈ R, then there exist λ, µ ∈ R with d = λx + µy. This statement can be generalized to any finite collection of elements. Given x_1, x_2, ..., x_n in any commutative ring R (not necessarily a principal ideal domain), an element d ∈ R is called a greatest common divisor if it is a common divisor and if any other common divisor divides d. In a unique factorization domain, any finite collection of elements has a greatest common divisor (see Exercise J.4). A finite collection of elements is called coprime if 1 is a greatest common divisor.

Lemma J.9. Given elements x_1, x_2, ..., x_n in a principal ideal domain R with greatest common divisor d, there exist µ_1, µ_2, ..., µ_n ∈ R such that

    d = µ_1 x_1 + µ_2 x_2 + ⋯ + µ_n x_n.

Proof. Exercise J.5.

Warning J.10. We should warn the reader that we use the convention that the characteristic polynomial of a matrix θ is given by χ(x) = det(x·id − θ), in contrast to Thomsen (2016), where it is given by det(θ − x·id). The difference is a factor of (−1)^n, where n is the dimension of the vector space. Our convention has the advantage that χ becomes a monic polynomial.

Proof that V is the direct sum of the generalized eigenspaces. Let χ(x) = det(x·id − θ) denote the characteristic polynomial of θ. The Cayley–Hamilton theorem (see, for instance, ibid., Sætning 15.11) tells us that χ(θ) = 0. Because the base field K is algebraically closed, we may factorize χ in the form

    χ(x) = (x − λ_1)^{n_1} (x − λ_2)^{n_2} ⋯ (x − λ_r)^{n_r},

where the λ_i are the distinct eigenvalues of θ and all n_i ≥ 1. We first claim that in fact Ẽ(λ_i) = V_i, where

    V_i = { v ∈ V | (θ − λ_i id)^{n_i} v = 0 },

so that in the definition of Ẽ(λ_i), the exponent n = n_i always works in the place of n (note that we cannot appeal to Corollary J.7, since that result relies on the existence of the Jordan normal form). It is clear that V_i ⊆ Ẽ(λ_i), so let us prove the other inclusion. If v ∈ Ẽ(λ_i), then (θ − λ_i id)^n v = 0 for some n, and we might as well assume that n ≥ n_i. Now (x − λ_i)^{n_i} is a greatest common divisor of χ(x) and (x − λ_i)^n, thus we may find p(x), q(x) ∈ K[x] such that

    (x − λ_i)^{n_i} = p(x) χ(x) + q(x) (x − λ_i)^n.

Substituting θ for x and using χ(θ) = 0, we find that

    (θ − λ_i id)^{n_i} = p(θ) χ(θ) + q(θ) (θ − λ_i id)^n = q(θ) (θ − λ_i id)^n.

Applying this to v, we get (θ − λ_i id)^{n_i} v = 0, so v ∈ V_i.

For i = 1, 2, ..., r, we now define

    f_i(x) = χ(x)/(x − λ_i)^{n_i} = (x − λ_1)^{n_1} ⋯ (x − λ_{i−1})^{n_{i−1}} (x − λ_{i+1})^{n_{i+1}} ⋯ (x − λ_r)^{n_r}

and note that f_1, f_2, ..., f_r are coprime polynomials (why?). Hence Lemma J.9 implies that there exist µ_1, µ_2, ..., µ_r ∈ K[x] such that

    1 = f_1 µ_1 + f_2 µ_2 + ⋯ + f_r µ_r.

Substituting θ for x, we get

    id = f_1(θ) µ_1(θ) + f_2(θ) µ_2(θ) + ⋯ + f_r(θ) µ_r(θ).

Applying this to any v ∈ V, we find that

    v = f_1(θ) µ_1(θ) v + f_2(θ) µ_2(θ) v + ⋯ + f_r(θ) µ_r(θ) v.

In particular, any v ∈ V lies in the sum Im(f_1(θ)) + ⋯ + Im(f_r(θ)) of the images of the f_i(θ), hence

    V = Im(f_1(θ)) + Im(f_2(θ)) + ⋯ + Im(f_r(θ)).

Also, note that 0 = χ(θ) = (θ − λ_i id)^{n_i} f_i(θ), which implies Im(f_i(θ)) ⊆ V_i. Thus we also have

    V = V_1 + V_2 + ⋯ + V_r.

To prove that this sum is direct, it suffices by Exercise J.6 to show that

    V_i ∩ (V_1 + ⋯ + V_{i−1} + V_{i+1} + ⋯ + V_r) = 0

for all i = 1, 2, ..., r. So let v lie in this intersection. Because v ∈ V_i, we have (θ − λ_i id)^{n_i} v = 0, while the fact that v lies in the sum in parentheses implies that f_i(θ) v = 0 (since f_i(θ) contains the factor (θ − λ_j id)^{n_j} and hence kills V_j for every j ≠ i). Now f_i(x) and (x − λ_i)^{n_i} are coprime polynomials, so we may find u(x), w(x) ∈ K[x] such that

    1 = u(x) f_i(x) + w(x) (x − λ_i)^{n_i}.

Substituting θ for x, we have

    id = u(θ) f_i(θ) + w(θ) (θ − λ_i id)^{n_i}.
Now applying this to v, both terms on the right become zero, and we conclude that v = 0.
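The Bézout argument above can be carried out concretely in a computer algebra system. The following sketch assumes SymPy is available; the 4 × 4 matrix is our own illustration with χ(x) = x^2 (x − 1)^2 (matching the characteristic polynomial of Example J.8), and `ev` is a hypothetical helper for evaluating a polynomial at a matrix:

```python
from sympy import Matrix, Poly, eye, gcdex, symbols, zeros

x = symbols('x')

def ev(p, A):
    """Evaluate the polynomial p(x) at the square matrix A (Horner's scheme)."""
    n = A.shape[0]
    out = zeros(n, n)
    for c in Poly(p, x).all_coeffs():
        out = out * A + c * eye(n)
    return out

# A matrix with chi(x) = x^2 (x - 1)^2: eigenvalues 0 and 1, each with n_i = 2
theta = Matrix([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 1]])

f1 = (x - 1)**2   # f_1(x) = chi(x) / x^2,       omitting the factor for lambda_1 = 0
f2 = x**2         # f_2(x) = chi(x) / (x - 1)^2, omitting the factor for lambda_2 = 1

# Lemma J.9 (Bezout): f1, f2 are coprime, so 1 = mu1*f1 + mu2*f2
mu1, mu2, g = gcdex(f1, f2, x)
assert g == 1

# Substituting theta gives id = f1(theta)mu1(theta) + f2(theta)mu2(theta),
# so every v lies in Im f1(theta) + Im f2(theta)
assert ev(f1 * mu1 + f2 * mu2, theta) == eye(4)

# chi(theta) = 0 forces Im f1(theta) into V_1 = Ker theta^2, and similarly for f2
assert theta**2 * ev(f1, theta) == zeros(4, 4)
assert (theta - eye(4))**2 * ev(f2, theta) == zeros(4, 4)
```

Here `ev(f_i * mu_i, theta)` is exactly the projection onto the i-th generalized eigenspace that the proof constructs.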

In order to prove the existence of a basis for Ẽ(λ_i) in which the matrix has Jordan normal form, we note that θ − λ_i id is a nilpotent operator on Ẽ(λ_i), since, according to the above proof, (θ − λ_i id)^{n_i} = 0 on Ẽ(λ_i). Hence to finish the proof, all we need to do is to apply the following proposition to θ − λ_i id on each subspace Ẽ(λ_i).

Proposition J.11. Given a nilpotent operator θ : V → V on a finite-dimensional K-vector space V, there exists a basis for V in which θ has the block diagonal form

    θ = [ J_{n_1}(0)                         ]
        [             J_{n_2}(0)             ]
        [                         ⋱          ]
        [                         J_{n_k}(0) ]

for suitable n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1.

For the proof, we define a Jordan chain to be a chain of elements of the form

    v, θ(v), θ^2(v), ..., θ^{p−1}(v),

where θ^i(v) ≠ 0 for i = 0, 1, 2, ..., p − 1, but θ^p(v) = 0. Certainly, Jordan chains exist in V because θ is nilpotent (those worried about allowing empty Jordan chains may assume V ≠ 0). Let us note that a Jordan chain is automatically linearly independent: If Σ_{i=0}^{p−1} α_i θ^i(v) = 0, suppose that there is some j with α_j ≠ 0, and assume that j is minimal with this property, so that α_0 = α_1 = ⋯ = α_{j−1} = 0. Then we have

    0 = θ^{p−1−j}( Σ_{i=0}^{p−1} α_i θ^i(v) ) = α_0 θ^{p−1−j}(v) + α_1 θ^{p−j}(v) + ⋯ + α_j θ^{p−1}(v) = α_j θ^{p−1}(v),

since the terms with index below j have zero coefficient and the terms with index above j are killed by θ^p(v) = 0. As θ^{p−1}(v) ≠ 0, this shows that α_j = 0, a contradiction.

If U = span{ θ^{p−1}(v), θ^{p−2}(v), ..., θ(v), v } is the span of some Jordan chain, then the chain is a basis for U, and the matrix representation of θ on U becomes

    [ 0 1         ]
    [   0 1       ]
    [     ⋱ ⋱     ]
    [       0 1   ]
    [         0   ],

which just happens to be J_p(0). A Jordan chain as above is called maximal if v does not lie in the image θ(V) of θ; this means that we cannot extend the chain backwards. Note that any nonzero v ∈ V is contained in some maximal Jordan chain (why?).

For the proof, because of the above matrix representation, it suffices to prove that V is the direct sum V = U_1 ⊕ ⋯ ⊕ U_k of subspaces U_i each spanned by some maximal Jordan chain.
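A Jordan chain, and the fact that θ acts as J_p(0) in the resulting basis, can be exhibited concretely. A minimal SymPy sketch, where the nilpotent 3 × 3 matrix is our own toy example rather than one from the text:

```python
from sympy import Matrix, zeros

# A nilpotent operator on K^3 (strictly upper triangular, so theta^3 = 0)
theta = Matrix([[0, 1, 2],
                [0, 0, 3],
                [0, 0, 0]])

# v, theta(v), theta^2(v) is a Jordan chain of length p = 3
v = Matrix([0, 0, 1])
chain = [v, theta * v, theta**2 * v]
assert all(not w.is_zero_matrix for w in chain)  # all chain elements nonzero
assert theta**3 * v == zeros(3, 1)               # but theta^p(v) = 0

# In the basis theta^{p-1}(v), ..., theta(v), v, the matrix of theta is J_3(0)
S = Matrix.hstack(*reversed(chain))
assert S.inv() * theta * S == Matrix([[0, 1, 0],
                                      [0, 0, 1],
                                      [0, 0, 0]])
```

The chain is maximal here because v = (0, 0, 1) does not lie in the image of θ, whose third coordinate is always zero.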

Proof of Proposition J.11. We shall argue by induction on dim V, the case dim V = 0 being empty. Because θ is nilpotent, θ(V) must be a proper subspace of V (why?). By induction, we may write θ(V) = W_1 ⊕ ⋯ ⊕ W_r as a direct sum of subspaces each spanned by some Jordan chain which is maximal in θ(V). Choose a generator in W_i for each such chain and write it as θ(u_i), which is possible because it lies in the image θ(V). Then the chain has the form

    θ(u_i), θ^2(u_i), ..., θ^{p_i − 1}(u_i).

We let U_i = W_i ⊕ K u_i for i = 1, 2, ..., r. Now θ^{p_1 − 1}(u_1), ..., θ^{p_r − 1}(u_r) form a basis for θ(V) ∩ Ker θ (why?), and we may extend this to a basis for all of Ker θ by adding elements which we denote u_{r+1}, u_{r+2}, ..., u_k. Each of these forms a maximal Jordan chain consisting of one element. We let U_i = K u_i for i = r + 1, ..., k. We claim that V = U_1 ⊕ ⋯ ⊕ U_k, or, in other words, that

    u_1, θ(u_1), ..., θ^{p_1 − 1}(u_1), ..., u_k, θ(u_k), ..., θ^{p_k − 1}(u_k)

is a basis for V, where we define p_{r+1} = p_{r+2} = ⋯ = p_k = 1.

To check that V = U_1 + ⋯ + U_k, let v ∈ V be arbitrary. Then θ(v) ∈ θ(V) = W_1 ⊕ ⋯ ⊕ W_r, and because of how the u_i were chosen, we can find a u ∈ U_1 + ⋯ + U_r such that θ(u) = θ(v). Thus θ(v − u) = 0, so that v − u ∈ Ker θ ⊆ U_1 + ⋯ + U_k. Therefore, v = u + (v − u) lies in U_1 + ⋯ + U_k as well.

Now to check that the above vectors constitute a basis, it is enough to count that their number equals the dimension of V. From linear algebra (rank–nullity), dim V = dim θ(V) + dim Ker θ. But θ(V) had a basis consisting of θ(u_i), θ^2(u_i), ..., θ^{p_i − 1}(u_i) for i = 1, 2, ..., r, so

    dim θ(V) = (p_1 − 1) + (p_2 − 1) + ⋯ + (p_r − 1).

By choice, θ^{p_1 − 1}(u_1), ..., θ^{p_r − 1}(u_r) together with u_{r+1}, ..., u_k form a basis for Ker θ, hence dim Ker θ = k. Thus

    dim V = (p_1 − 1) + (p_2 − 1) + ⋯ + (p_r − 1) + k
          = p_1 + p_2 + ⋯ + p_r + (k − r)
          = p_1 + p_2 + ⋯ + p_r + p_{r+1} + ⋯ + p_k.

This is exactly the number of elements in the claimed basis, which proves that V = U_1 ⊕ ⋯ ⊕ U_k. Reorganizing the U_i according to dimension n_i = dim U_i, we get the desired form of θ, where n_1 ≥ ⋯ ≥ n_k ≥ 1.

Remark J.12.
Notice that the assumption that K was algebraically closed was only used to argue that the characteristic polynomial of θ could be factorized as a product

    χ(x) = (x − λ_1)^{n_1} (x − λ_2)^{n_2} ⋯ (x − λ_r)^{n_r}

of linear polynomials. So in other words, the conclusion of the theorem holds for operators over any field K as long as the characteristic polynomial can be factorized this way. In particular, this is the case for K = R if χ has no non-real roots. Note that the uniqueness proof of the next section does not rely on the assumption of algebraic closure, either.
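Remark J.12 can be illustrated in SymPy, whose `jordan_form` method works over whatever extension the eigenvalues require; both matrices below are our own examples, not ones from the text:

```python
from sympy import Matrix, simplify

# chi(x) = x^2 + 1 has no real roots, so this matrix has no Jordan normal
# form over R; SymPy diagonalizes it over C, with eigenvalues +/- i.
rot = Matrix([[0, -1],
              [1, 0]])
P, J = rot.jordan_form()
assert (P * J * P.inv() - rot).applyfunc(simplify).is_zero_matrix

# chi(x) = (x - 2)^2 (x - 3) splits over R, so even though R is not
# algebraically closed, this real matrix has a real Jordan normal form.
A = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 3]])
P2, J2 = A.jordan_form()
assert P2 * J2 * P2.inv() == A
assert all(entry.is_real for entry in J2)   # the Jordan form stays real
```

In the second case the form consists of the blocks J_2(2) and J_1(3), since A − 2·id has rank 2.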

Proof of uniqueness of the Jordan normal form

To prove the uniqueness statement of Theorem J.6, note first that if θ is represented by a matrix of this form, then the size of the block B(λ_i) measures the dimension of Ẽ(λ_i). Thus the size of each such block is uniquely determined. It thus suffices to prove that the internal structure of the block B(λ_i) is uniquely determined, or, in other words, that the matrix representation of θ on Ẽ(λ_i) is uniquely determined for each i. Therefore, in proving uniqueness, we may as well assume that V = Ẽ(λ) for some λ, so that θ − λ id is nilpotent on all of V. Also, replacing θ by θ − λ id, we may as well assume that λ = 0, so that θ itself is a nilpotent operator on V.

With these assumptions, suppose that

    θ = [ J_{n_1}(0)                         ]
        [             J_{n_2}(0)             ]
        [                         ⋱          ]
        [                         J_{n_k}(0) ]

where n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1. All we need is to prove that the integers n_i are uniquely determined. We claim that

    dim Ker θ^k − dim Ker θ^{k−1} = #{ i | k ≤ n_i }     (1)

for all k. We leave it to the reader to verify that this uniquely determines the sequence n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1.

To prove the claim, note that

    θ^k = [ J_{n_1}(0)^k                             ]
          [               J_{n_2}(0)^k               ]
          [                             ⋱            ]
          [                             J_{n_k}(0)^k ]

and that each block J_{n_i}(0)^k is the n_i × n_i matrix which is everywhere zero except for a diagonal line of ones running from the (1, k + 1)th to the (n_i − k, n_i)th cell. In particular, J_{n_i}(0)^k = 0 if and only if k ≥ n_i. For each such matrix J_{n_i}(0)^k, the kernel has a basis consisting of the basis vectors corresponding to the first min(k, n_i) columns. So the difference in dimension between Ker(J_{n_i}(0)^k) and Ker(J_{n_i}(0)^{k−1}) is 1 if k ≤ n_i and 0 otherwise. Adding the dimensions of the kernels of each block, we arrive at the claim (1).
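The bookkeeping behind the claim (1) is easy to automate. Here is a SymPy sketch (the helper names are our own) that recovers the block sizes of a nilpotent matrix from the kernel dimensions alone:

```python
from sympy import Matrix, diag

def nilpotent_jordan_block(p):
    """J_p(0): ones immediately above the diagonal, zeros elsewhere."""
    return Matrix(p, p, lambda i, j: 1 if j == i + 1 else 0)

def block_sizes(N):
    """Recover n_1 >= n_2 >= ... for a nilpotent matrix N from
    d_k = dim Ker N^k - dim Ker N^(k-1) = #{ i | k <= n_i }."""
    n = N.shape[0]
    d, prev, k = [], 0, 1
    while prev < n:
        cur = n - (N**k).rank()   # dim Ker N^k, by rank-nullity
        d.append(cur - prev)
        prev, k = cur, k + 1
    # d_k - d_{k+1} blocks have size exactly k
    sizes = []
    for k, dk in enumerate(d, start=1):
        nxt = d[k] if k < len(d) else 0
        sizes += [k] * (dk - nxt)
    return sorted(sizes, reverse=True)

# A nilpotent matrix assembled from blocks of sizes 3, 2, 1
N = diag(nilpotent_jordan_block(3),
         nilpotent_jordan_block(2),
         nilpotent_jordan_block(1))
assert block_sizes(N) == [3, 2, 1]
```

Since the numbers dim Ker θ^k are basis-independent, this computation applies to any matrix representing θ, which is precisely why the block sizes are unique.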

References

Berg, C. (2013). Complex Analysis. Matematisk afdeling, Københavns Universitet.

Hatcher, A. (2002). Algebraic Topology. Cambridge University Press. URL: www.math.cornell.edu/~hatcher/at/at.pdf.

Stetkær, H. (2012). Følger og rækker af funktioner. Lecture notes for the course Mathematical Analysis 2.

Thomsen, J. F. (2016). Lineær algebra. Lecture notes for Linear Algebra at Aarhus University.

Exercises

J.1. Prove that for any λ ∈ K, the eigenspace E(λ) is a vector subspace of the generalized eigenspace Ẽ(λ).

J.2. Prove that θ(Ẽ(λ)) ⊆ Ẽ(λ). We usually formulate this by saying that Ẽ(λ) is invariant under θ.

J.3. Prove that no finite field F_q is algebraically closed. (Hint: Recall that x^q = x for all x ∈ F_q.)

J.4. Prove that in a unique factorization domain, a greatest common divisor exists between any finite number of elements.

J.5. Prove Lemma J.9.

J.6. Prove that a vector space V is the direct sum V = V_1 ⊕ ⋯ ⊕ V_n of subspaces if and only if V = V_1 + ⋯ + V_n and

    V_i ∩ (V_1 + ⋯ + V_{i−1} + V_{i+1} + ⋯ + V_n) = 0

for all i = 1, 2, ..., n.

J.7. Calculate the Jordan normal form of the following matrices: (a) 4 2 1 3, 5 4 (b) 5 4 3 1 3, 1 2 1 (c) 9 7 3 9 7 4. 4 4 4

J.8. If θ is given in Jordan normal form as in Theorem J.6, verify the following:

(i) The algebraic multiplicity of λ_i is the size of the block B(λ_i).

(ii) The geometric multiplicity of λ_i is the number of Jordan blocks in B(λ_i).

(iii) The generalized eigenspace is given by Ẽ(λ_i) = { v ∈ V | (θ − λ_i id)^{n_i} v = 0 } = Ker((θ − λ_i id)^{n_i}), where n_i is the algebraic multiplicity of λ_i; in other words, in the definition of Ẽ(λ_i), n = n_i always suffices.

J.9. Generalize the results of Example J.4 by finding Ẽ(λ) for any λ ∈ R. Also try replacing θ = d/dx by θ = d^2/dx^2.

J.10. The Cayley–Hamilton theorem states that a linear operator θ : V → V on a finite-dimensional vector space is annihilated by its characteristic polynomial χ, meaning that χ(θ) = 0. However, there may very well be nonzero polynomials of smaller degree with the same property.

(i) Verify that I = { f ∈ K[x] | f(θ) = 0 } is an ideal of K[x]. Deduce that I has a unique monic generator µ, called the minimal polynomial of θ. Note that µ divides χ.

(ii) What are the roots of µ?

(iii) Can you deduce µ from the Jordan normal form of θ?

(iv) Under what circumstances do we have χ = µ?

(v) What is the minimal polynomial of a diagonalizable operator?

J.11. Prove that the determinant and trace of a linear map on a finite-dimensional vector space are well-defined in the sense that they do not depend on the choice of matrix representation. (Recall that the trace tr(A) of a square matrix is the sum of its diagonal entries and that tr(AB) = tr(BA) for all A, B.)

J.12. Let θ : V → V be an operator on a finite-dimensional vector space over the algebraically closed field K. Let λ_1, λ_2, ..., λ_r be the distinct eigenvalues of θ with algebraic multiplicities n_1, n_2, ..., n_r, respectively. Prove that the determinant and trace of θ are given by (cf. Exercise J.11)

    det(θ) = λ_1^{n_1} λ_2^{n_2} ⋯ λ_r^{n_r}   and   tr(θ) = n_1 λ_1 + n_2 λ_2 + ⋯ + n_r λ_r.

In other words, the determinant and trace are the product and sum, respectively, of the eigenvalues, counted with multiplicity.

J.13. Let θ : V → V be a linear operator on a finite-dimensional vector space over the algebraically closed field K.

(i) Prove that there exists a unique decomposition θ = D + N of θ as a sum of a diagonalizable matrix D and a nilpotent matrix N that commute with each other. This is called the additive Jordan decomposition.

(ii) Prove that if θ is nonsingular, it can be written uniquely as a product θ = DU of a diagonalizable matrix D and a unipotent matrix U commuting with each other, where unipotent means that U is the sum of the identity and a nilpotent matrix. This is called the multiplicative Jordan decomposition.

J.14. (i) Prove that over an algebraically closed field of characteristic p > 0, some positive power of any matrix is diagonalizable.

(ii) Prove the same statement over finite fields.

J.15. The exponential of a linear operator θ : V → V on a finite-dimensional real or complex vector space is defined by

    exp(θ) = Σ_{n=0}^{∞} (1/n!) θ^n.

It is not immediately obvious that the sum always converges, and in this exercise, we give a proof using the Jordan normal form.
Because of this, we assume that the base field is C.

(i) Argue that the sum converges for diagonalizable and nilpotent matrices.

(ii) Prove that if exp(A) and exp(B) converge for two matrices A and B satisfying AB = BA, then exp(A + B) converges to exp(A) exp(B).

(iii) Combine the two above statements with the existence of the Jordan normal form to prove that exp(θ) exists for all operators θ, and write down an explicit expression for it.

J.16. In this note, we proved the existence of the Jordan normal form as a corollary to Cayley–Hamilton, but it is also possible to give independent proofs. Show how one can then derive Cayley–Hamilton as a corollary to the existence of the Jordan normal form. (The proof becomes universal once we know that any field can be embedded inside an algebraically closed field. The smallest algebraically closed field containing a given field K is called the algebraic closure of K and is written K̄. For instance, C is the algebraic closure of R. Existence and uniqueness (up to isomorphism) of the algebraic closure can be proved using the Axiom of Choice.)
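As a sanity check on the Cayley–Hamilton theorem discussed in Exercise J.16, one can verify χ(θ) = 0 directly for a concrete matrix. A SymPy sketch, where the 2 × 2 matrix is our own example:

```python
from sympy import Matrix, eye, symbols, zeros

x = symbols('x')

A = Matrix([[4, 1],
            [2, 3]])
chi = A.charpoly(x)                   # det(x*id - A), our monic convention
assert chi.all_coeffs() == [1, -7, 10]  # chi(x) = x^2 - 7x + 10 = (x - 2)(x - 5)

# Evaluate chi at A by Horner's scheme; Cayley-Hamilton says the result is 0
result = zeros(2, 2)
for c in chi.all_coeffs():
    result = result * A + c * eye(2)
assert result == zeros(2, 2)

# Since chi has the two distinct roots 2 and 5, the Jordan form is diagonal,
# and chi(A) = P chi(J) P^{-1} = 0 gives the route to Cayley-Hamilton that
# Exercise J.16 asks for.
P, J = A.jordan_form()
assert P * J * P.inv() == A
```

Note that SymPy's `charpoly` uses the same monic convention det(x·id − θ) as Warning J.10.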