Notes on the matrix exponential


Erik Wahlén
erik.wahlen@math.lu.se
February 14, 2012

1 Introduction

The purpose of these notes is to describe how one can compute the matrix exponential $e^A$ when $A$ is not diagonalisable. This is done in Teschl by transforming $A$ into Jordan normal form. As we will see here, it is not necessary to go this far. It suffices to transform $A$ into block form, where each block only has one eigenvalue (up to multiplicity). For completeness we also present a proof of the Jordan normal form at the end. The material in these notes is roughly the same as in Chapter 3.8 of Teschl, but the presentation and the proofs are a bit different. We also give more examples.

We begin with a brief review of linear algebra. Recall that a complex $n \times n$ matrix $A$ can be identified with the linear operator $x \mapsto Ax$ on $\mathbb{C}^n$: $A$ is simply the matrix for this linear operator in the standard basis $\{e_1, \ldots, e_n\}$, where $e_1 = (1, 0, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, etc. If we choose a new basis $\{f_1, \ldots, f_n\}$, then the matrix for the operator in the new basis is $B = T^{-1}AT$, where $T$ is the matrix whose columns consist of the coordinates of the vectors $f_j$ in the old basis (the standard basis). Recall also that the matrix $B$ can be found directly by calculating $Af_j$ for each $j$ and writing this vector in the basis $\{f_1, \ldots, f_n\}$; the coordinates form the $j$th column of $B$. In general we want to pick a basis such that $B$ becomes as simple as possible.

It is often more convenient to consider an abstract complex $n$-dimensional vector space $V$ and a linear operator $A$ on $V$. We assume that a basis for $V$ has been selected and identify $A$ with its matrix in this basis. Recall that the kernel (or null space) and range of $A$ are defined by
$$\ker A = \{x \in V : Ax = 0\} \quad \text{and} \quad \operatorname{range} A = \{Ax : x \in V\}.$$
The kernel and the range are both linear subspaces of $V$, and the dimension theorem says that
$$\dim \ker A + \dim \operatorname{range} A = n.$$
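Since the change-of-basis formula $B = T^{-1}AT$ is used repeatedly below, here is a small numerical sketch of it in Python (NumPy assumed; the matrix and the basis are chosen only for illustration):

    import numpy as np

    # A hypothetical operator and a new basis, for illustration only.
    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    f1, f2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
    T = np.column_stack([f1, f2])   # columns = coordinates of f_j in the old basis

    B = np.linalg.inv(T) @ A @ T    # matrix of the same operator in the new basis
    print(B)                        # diagonal here, because f1, f2 are eigenvectors of A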

Recall also that $\lambda \in \mathbb{C}$ is called an eigenvalue of $A$ if there exists some non-zero vector $x$ such that $Ax = \lambda x$. The vector $x$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. The subspace $\ker(A - \lambda I)$, that is, the subspace spanned by the eigenvectors belonging to $\lambda$, is called the eigenspace corresponding to $\lambda$. The number $\dim \ker(A - \lambda I)$ is called the geometric multiplicity of $\lambda$. Note that $\lambda \in \mathbb{C}$ is an eigenvalue if and only if it is a root of the characteristic polynomial $p_{\mathrm{char}}(z) = \det(A - zI)$. By the fundamental theorem of algebra we can write $p_{\mathrm{char}}(z)$ as a product of first degree polynomials,
$$p_{\mathrm{char}}(z) = (-1)^n (z - \lambda_1)^{a_1} (z - \lambda_2)^{a_2} \cdots (z - \lambda_k)^{a_k},$$
where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $A$. The positive integer $a_j$ is called the algebraic multiplicity of the eigenvalue $\lambda_j$. The corresponding geometric multiplicity will be denoted $g_j$.

If it is possible to find a basis for $V$ consisting of eigenvectors, then the matrix $T^{-1}AT$ will be particularly simple. Indeed, $Af_j = \lambda_j f_j$, so the new matrix will have the diagonal form
$$D = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$
One says that $A$ has been diagonalised. Unfortunately, not all matrices can be diagonalised.

Example 1.1. Consider the matrix
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
The characteristic polynomial is $z^2$, so the only eigenvalue is $\lambda = 0$. On the other hand,
$$Ax = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ 0 \end{pmatrix},$$
so that the only eigenvectors are $(z, 0)$, where $z \in \mathbb{C}$, $z \neq 0$. Clearly, we cannot find a basis consisting of eigenvectors.

There are however some important special cases in which one can diagonalise the matrix. For the proofs of the following results we refer to any textbook in linear algebra.

Proposition 1.1. Suppose that $A$ has $n$ distinct eigenvalues. Then there exists a basis of eigenvectors for $A$.

Proposition 1.2. Suppose that $A$ is Hermitian, that is, $A^* = A$, where $A^* = \bar{A}^t$ (the bar denoting complex conjugation). Then there exists an orthonormal basis of eigenvectors for $A$.

Remark. The last property also holds when $A$ is normal, meaning that $A^*A = AA^*$. Hermitian matrices are obviously normal. So are unitary matrices, since they satisfy $A^*A = I = AA^*$.
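One can check Example 1.1 in a computer algebra system; here is a sketch using SymPy (assumed available — the notes themselves mention Maple later on):

    from sympy import Matrix

    A = Matrix([[0, 1],
                [0, 0]])
    print(A.charpoly())           # characteristic polynomial lambda**2
    print(A.eigenvects())         # a single eigenvector direction, (1, 0)
    print(A.is_diagonalizable())  # False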

2 Decomposition into invariant subspaces

We will now consider what to do when $A$ can't be diagonalised. Let $V$ be a complex vector space and let $V_1, \ldots, V_k$ be subspaces. We say that $V$ is the direct sum of $V_1, \ldots, V_k$ if each vector $x \in V$ can be written in a unique way as $x = x_1 + x_2 + \cdots + x_k$, where $x_j \in V_j$, $j = 1, \ldots, k$. If this is the case we use the notation
$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_k.$$
We say that a subspace $W$ of $V$ is invariant under $A$ if $x \in W \Rightarrow Ax \in W$.

Example 2.1. Suppose that $A$ has $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with corresponding eigenvectors $u_1, \ldots, u_n$. It then follows that the vectors $u_1, \ldots, u_n$ are linearly independent and thus form a basis for $V$. Let
$$\ker(A - \lambda_j I) = \{z u_j : z \in \mathbb{C}\}, \quad j = 1, \ldots, n,$$
be the corresponding eigenspaces. Then
$$V = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \cdots \oplus \ker(A - \lambda_n I)$$
by the definition of a basis. It is also clear that each eigenspace is invariant under $A$.

More generally, suppose that $A$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ and that the geometric multiplicity $g_j$ of each $\lambda_j$ equals the algebraic multiplicity $a_j$. Let $\ker(A - \lambda_j I)$, $j = 1, \ldots, k$, be the corresponding eigenspaces. We can then find a basis for each eigenspace consisting of $g_j$ eigenvectors. The union of these bases consists of $g_1 + \cdots + g_k = a_1 + \cdots + a_k = n$ elements and is linearly independent, since eigenvectors belonging to different eigenvalues are linearly independent. We thus obtain a basis for $V$ and it follows that
$$V = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \cdots \oplus \ker(A - \lambda_k I).$$
In this basis, $A$ has the matrix
$$D = \begin{pmatrix} \lambda_1 I_1 & & \\ & \ddots & \\ & & \lambda_k I_k \end{pmatrix},$$
where each $I_j$ is a $g_j \times g_j$ unit matrix. In other words, $D$ is a diagonal matrix with the eigenvalues on the diagonal, each repeated $g_j$ times.

Given a polynomial $p(z) = \alpha_m z^m + \alpha_{m-1} z^{m-1} + \cdots + \alpha_1 z + \alpha_0$, we define
$$p(A) = \alpha_m A^m + \alpha_{m-1} A^{m-1} + \cdots + \alpha_1 A + \alpha_0 I.$$
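The definition of $p(A)$ translates directly into code. The following sketch (SymPy assumed; the helper name `poly_at` is ours) evaluates a polynomial at a matrix using Horner's scheme:

    from sympy import Matrix, eye

    def poly_at(coeffs, A):
        """Evaluate p(A) for p(z) = coeffs[0] + coeffs[1] z + ... + coeffs[m] z^m."""
        n = A.shape[0]
        result = coeffs[-1] * eye(n)
        for a in reversed(coeffs[:-1]):   # Horner's scheme
            result = result * A + a * eye(n)
        return result

    A = Matrix([[0, 1], [0, 0]])
    print(poly_at([0, 0, 1], A))          # p(z) = z^2 gives p(A) = 0 (Example 1.1)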

Lemma 2.1. There exists a non-zero polynomial $p$ such that $p(A) = 0$.

Proof. Note that $\mathbb{C}^{n \times n}$ is an $n^2$-dimensional vector space. It follows that the $n^2 + 1$ matrices $I, A, A^2, \ldots, A^{n^2}$ are linearly dependent. But this means that there exist numbers $\alpha_0, \ldots, \alpha_{n^2}$, not all zero, such that
$$\alpha_{n^2} A^{n^2} + \alpha_{n^2 - 1} A^{n^2 - 1} + \cdots + \alpha_1 A + \alpha_0 I = 0,$$
that is, $p(A) = 0$, where $p(z) = \alpha_{n^2} z^{n^2} + \cdots + \alpha_1 z + \alpha_0$.

Let $p_{\min}(z)$ be a monic polynomial (with leading coefficient 1) of minimal degree such that $p_{\min}(A) = 0$. If $p(z)$ is any polynomial such that $p(A) = 0$, it follows that $p(z) = q(z) p_{\min}(z)$ for some polynomial $q$. To see this, use the division algorithm on $p$ and $p_{\min}$:
$$p(z) = q(z) p_{\min}(z) + r(z),$$
where $r = 0$ or $\deg r < \deg p_{\min}$. Thus
$$r(A) = p(A) - q(A) p_{\min}(A) = 0.$$
But this implies that $r(z) = 0$, since $p_{\min}$ has minimal degree. This also shows that the polynomial $p_{\min}$ is unique. It is called the minimal polynomial for $A$. By the fundamental theorem of algebra, we can write the minimal polynomial as a product of first degree polynomials,
$$p_{\min}(z) = (z - \lambda_1)^{m_1} (z - \lambda_2)^{m_2} \cdots (z - \lambda_k)^{m_k}, \qquad (2.1)$$
where the numbers $\lambda_j$ are distinct and each $m_j \geq 1$. Note that we don't yet know that the roots $\lambda_j$ of the minimal polynomial coincide with the eigenvalues of $A$. This will be shown in Theorem 2.1 below.
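Since $p_{\min}$ divides every polynomial that annihilates $A$, its exponents can be found by trial division. The sketch below (SymPy assumed; the helper names are ours, and the eigenvalues are assumed to be computed exactly) lowers the exponents one at a time:

    from sympy import Matrix, eye, zeros

    def minimal_exponents(A):
        """Smallest exponents m_j with prod_j (A - lam_j I)^(m_j) = 0."""
        n = A.shape[0]
        exps = dict(A.eigenvals())    # algebraic multiplicities; enough by Theorem 2.3
        def annihilates(e):
            P = eye(n)
            for lam, m in e.items():
                P = P * (A - lam * eye(n)) ** m
            return P == zeros(n, n)
        for lam in list(exps):
            while exps[lam] > 1:
                exps[lam] -= 1        # try a smaller exponent...
                if not annihilates(exps):
                    exps[lam] += 1    # ...and back up if it fails
                    break
        return exps

    A = Matrix([[3, 1, -1], [0, 2, 0], [1, 1, 1]])   # the matrix of Example 3.3 below
    print(minimal_exponents(A))                      # {2: 2}, i.e. p_min(z) = (z - 2)^2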

Lemma 2.2. Suppose that $p(z) = p_1(z)p_2(z)$, where $p_1$ and $p_2$ are relatively prime. If $p(A) = 0$ we have that
$$V = \ker p_1(A) \oplus \ker p_2(A),$$
and each subspace $\ker p_j(A)$ is invariant under $A$.

Proof. The invariance follows from $p_j(A)Ax = Ap_j(A)x = 0$ for $x \in \ker p_j(A)$. Since $p_1$ and $p_2$ are relatively prime, it follows by Euclid's algorithm that there exist polynomials $q_1, q_2$ such that
$$p_1(z)q_1(z) + p_2(z)q_2(z) = 1.$$
Thus
$$p_1(A)q_1(A) + p_2(A)q_2(A) = I.$$
Applying this identity to the vector $x \in V$, we obtain
$$x = \underbrace{p_1(A)q_1(A)x}_{x_2} + \underbrace{p_2(A)q_2(A)x}_{x_1},$$
where
$$p_2(A)x_2 = p_2(A)p_1(A)q_1(A)x = p(A)q_1(A)x = 0,$$
so that $x_2 \in \ker p_2(A)$. Similarly $x_1 \in \ker p_1(A)$. Thus $V = \ker p_1(A) + \ker p_2(A)$. On the other hand, if
$$x_1 + x_2 = x_1' + x_2', \qquad x_j, x_j' \in \ker p_j(A), \ j = 1, 2,$$
we obtain that
$$y = x_1 - x_1' = x_2' - x_2 \in \ker p_1(A) \cap \ker p_2(A),$$
so that
$$y = p_1(A)q_1(A)y + p_2(A)q_2(A)y = q_1(A)p_1(A)y + q_2(A)p_2(A)y = 0.$$
It follows that the representation $x = x_1 + x_2$ is unique and therefore $V = \ker p_1(A) \oplus \ker p_2(A)$.

Theorem 2.1. With $\lambda_1, \ldots, \lambda_k$ and $m_1, \ldots, m_k$ as in (2.1) we have
$$V = \ker(A - \lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A - \lambda_k I)^{m_k},$$
where each $\ker(A - \lambda_j I)^{m_j}$ is invariant under $A$. The numbers $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $A$.

Proof. We begin by noting that the polynomials $(z - \lambda_j)^{m_j}$, $j = 1, \ldots, k$, are relatively prime. Repeated application of Lemma 2.2 therefore shows that
$$V = \ker(A - \lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A - \lambda_k I)^{m_k},$$
with each $\ker(A - \lambda_j I)^{m_j}$ invariant. Consider the linear operator
$$A \colon \ker(A - \lambda_j I)^{m_j} \to \ker(A - \lambda_j I)^{m_j}.$$
It is clear that $\ker(A - \lambda_j I)^{m_j} \neq \{0\}$, for otherwise $p_{\min}$ would not be minimal. Since every linear operator on a (non-trivial) finite dimensional complex vector space has an eigenvalue, it follows that there is some non-zero element $u \in \ker(A - \lambda_j I)^{m_j}$ with $Au = \lambda u$, $\lambda \in \mathbb{C}$. But then
$$0 = (A - \lambda_j I)^{m_j} u = (\lambda - \lambda_j)^{m_j} u,$$
so $\lambda = \lambda_j$. This shows that the roots $\lambda_j$ of the minimal polynomial are eigenvalues of $A$. On the other hand, if $u$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, we have
$$0 = p_{\min}(A)u = (A - \lambda_1 I)^{m_1} \cdots (A - \lambda_k I)^{m_k} u = (\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k} u,$$
so $\lambda = \lambda_j$ for some $j$, that is, every eigenvalue is a root of the minimal polynomial.
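Theorem 2.1 can be explored computationally: the generalised eigenspaces are null spaces of powers of $A - \lambda I$. A sketch (SymPy assumed; we use the matrix of Example 3.2 below, and the safe exponent $n$ in place of $m_j$, since $m_j \leq n$):

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 1], [0, 2, 0], [-1, 0, -1]])
    n = A.shape[0]
    for lam, a in A.eigenvals().items():
        basis = ((A - lam * eye(n)) ** n).nullspace()     # generalised eigenspace
        print(lam, len(basis), [list(v) for v in basis])  # its dimension equals a_j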

The subspace ker(a λ j I) m j is called the generalised eigenspace corresponding to λ j and a non-zero vector x ker(a λ j I) m j is called a generalised eigenvector. The number m j is the smallest exponent m such that (A λ j I) m vanishes on ker(a λ j I) m j. Suppose for a contradiction that e.g. (A λ 1 I) m1 1 u = for all u ker(a λ 1 I) m 1. Writing x V as x = x 1 + x according to the decomposition V = ker(a λ 1 I) m 1 ker p(a), where p(z) = (z λ 2 ) m2 (z λ k ) m k, we would then obtain that (A λ 1 I) m 1 1 p(a)x = p(a)(a λ 1 I) m 1 1 x 1 + (A λ 1 I) m 1 1 p(a) x =, contradicting the denition of the minimal polynomial. If we select a basis {u j,1,..., u j,nj } for each generalised eigenspace, then the union {u 1,1,..., u 1,n1, u 2,1,..., u 2,n2,..., u k,1,..., u k,nk } will be a basis for V. Since each generalised eigenspace is invariant under the linear operator A, the matrix for A in this basis will have the block form B = B 1... where each A j is a n j n j square matrix. Moreover, A j only has one eigenvalue λ j. Set N j = A λ j I j, where I j is an n j n j unit matrix. Then N m j j = by the denition of the generalised eigenspaces. A linear operator N with the property that N m = for some m is called nilpotent. Finally, we note that the dimension of the generalised eigenspace ker(a λ j I) m j equals the algebraic multiplicity of the eigenvalue λ j, that is, n j = a j. This follows since B k, ( 1) n (λ λ 1 ) a1 (λ λ k ) a k = det(b λi) In summary we have proved the following result. = det(b 1 λi 1 ) det(b k λi k ) = ( 1) n (λ λ 1 ) n1 (λ λ k ) n k. Theorem 2.2. Let A be an n n matrix. There exists a basis for C n in which A has the block form B 1... and B j = λ j I j + N j, where λ 1,..., λ k are the distinct eigenvalues of A, I j is an a j a j unit matrix and N j is nilpotent. Before using this to compute the matrix exponential, we mention a very beautiful result whose proof follows easily from the previous theorem. B k, 6

Before using this to compute the matrix exponential, we mention a very beautiful result whose proof follows easily from the previous theorem.

Theorem 2.3 (Cayley–Hamilton). Let $p_{\mathrm{char}}(z) = \det(A - zI)$ be the characteristic polynomial of $A$. Then $p_{\min} \mid p_{\mathrm{char}}$, so that in particular $p_{\mathrm{char}}(A) = 0$.

Proof. The exponent $m_j$ of the factor $(z - \lambda_j)^{m_j}$ in the minimal polynomial is the smallest exponent $m$ such that $N_j^m = 0$. Let $x$ be such that $N_j^{m_j} x = 0$ and $N_j^{m_j - 1} x \neq 0$. We then claim that the vectors $x, N_j x, \ldots, N_j^{m_j - 1} x$ are linearly independent. Indeed, multiplying the equation
$$\alpha_0 x + \alpha_1 N_j x + \cdots + \alpha_{m_j - 1} N_j^{m_j - 1} x = 0$$
by $N_j^{m_j - 1}$, we obtain
$$\alpha_0 N_j^{m_j - 1} x = \alpha_0 N_j^{m_j - 1} x + \alpha_1 N_j^{m_j} x + \cdots + \alpha_{m_j - 1} N_j^{2m_j - 2} x = 0,$$
so that $\alpha_0 = 0$. Proceeding inductively, we obtain that $\alpha_0 = \alpha_1 = \cdots = \alpha_{m_j - 1} = 0$, proving the claim. It follows that $m_j \leq a_j$ and hence that the minimal polynomial divides the characteristic polynomial, since
$$p_{\min}(z) = (z - \lambda_1)^{m_1} \cdots (z - \lambda_k)^{m_k} \quad \text{and} \quad p_{\mathrm{char}}(z) = (-1)^n (z - \lambda_1)^{a_1} \cdots (z - \lambda_k)^{a_k}.$$
The final statement follows from the fact that $p_{\min}(A) = 0$.

3 The matrix exponential

Recall that the unique solution of the initial value problem
$$x' = Ax, \qquad x(0) = x_0,$$
is given by $x(t) = e^{tA} x_0$. If $B$ is the block form of $A$ and $A = TBT^{-1}$, we obtain that
$$e^{tA} = T e^{tB} T^{-1}, \qquad (3.1)$$
where
$$e^{tB} = \begin{pmatrix} e^{tB_1} & & \\ & \ddots & \\ & & e^{tB_k} \end{pmatrix}$$
and
$$e^{tB_j} = e^{t(\lambda_j I_j + N_j)} = e^{t\lambda_j I_j} e^{tN_j} = e^{\lambda_j t}\Big(I + tN_j + \cdots + \frac{t^{m_j - 1}}{(m_j - 1)!} N_j^{m_j - 1}\Big),$$
since $N_j^m = 0$ for $m \geq m_j$.
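The finite sum in the formula for $e^{tB_j}$ is easy to implement directly; here is a sketch (SymPy assumed; `exp_block` is our name, and $N$ must be nilpotent for the loop to terminate):

    from sympy import Matrix, eye, zeros, exp, factorial, symbols

    t = symbols('t')

    def exp_block(lam, N):
        """e^{t(lam I + N)} = e^{lam t} (I + tN + ... + t^(m-1) N^(m-1)/(m-1)!)."""
        n = N.shape[0]
        S, P, k = zeros(n, n), eye(n), 0
        while P != zeros(n, n):          # terminates because N is nilpotent
            S += t**k / factorial(k) * P
            P, k = P * N, k + 1
        return exp(lam * t) * S

    N = Matrix([[-2, 4], [-1, 2]])       # N = A + I from Example 3.1 below
    print(exp_block(-1, N))              # matches the closed form in Example 3.1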

We now clearly see the advantage of this block form. In general, the solution of the initial-value problem will be a sum of terms of the form $t^m e^{\lambda_j t}$. If $A$ has a basis of eigenvectors, there will only be terms of the form $e^{\lambda_j t}$.

The following example shows what happens for $2 \times 2$ matrices which are not diagonalisable. The same method can be applied to any such matrix.

Example 3.1. Let
$$A = \begin{pmatrix} -3 & 4 \\ -1 & 1 \end{pmatrix}.$$
The characteristic polynomial is $(z + 1)^2$, so $-1$ is the only eigenvalue. We have
$$N = A + I = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix},$$
from which it follows that the only eigenvectors are $zu$, with $u = (2, 1)$ and $z \in \mathbb{C}$, $z \neq 0$. Since $(A + I)^2 = 0$, it follows that the generalised eigenspace is $\mathbb{C}^2$ (this can also be realised directly). We find that
$$e^{tA} = e^{t(-I + N)} = e^{-t} e^{tN} = e^{-t}(I + tN) = e^{-t}\left(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}\right) = \begin{pmatrix} (1 - 2t)e^{-t} & 4te^{-t} \\ -te^{-t} & (1 + 2t)e^{-t} \end{pmatrix}.$$

For $3 \times 3$ matrices there are more possibilities.

Example 3.2. Let
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix}.$$
The characteristic polynomial of $A$ is $p_{\mathrm{char}}(z) = -z^2(z - 2)$. Thus, $A$ has only the eigenvalues $\lambda_1 = 0$ and $\lambda_2 = 2$, with algebraic multiplicities $a_1 = 2$ and $a_2 = 1$, respectively. We find that
$$Ax = 0 \iff x = z(1, 0, -1), \qquad Ax = 2x \iff x = z(0, 1, 0), \quad z \in \mathbb{C}.$$
Thus $u_1 = (1, 0, -1)$ and $u_2 = (0, 1, 0)$ are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively. The generalised eigenspace corresponding to $\lambda_2$ is simply the usual eigenspace $\ker(A - 2I)$, but the one corresponding to $\lambda_1$ must be $\ker A^{m_1}$, with $m_1 \geq 2$. By the Cayley–Hamilton theorem, we must also have $m_1 \leq 2$, so $m_1 = 2$. Calculating
$$A^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
we find, e.g., the basis $\{u_1, u_1'\}$, where $u_1' = (1, 0, 0)$, for $\ker A^2$.

Since $Au_1' = u_1$, the new matrix is
$$B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
and $A = TBT^{-1}$ with
$$T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad T^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
We have
$$e^{tB} = \begin{pmatrix} e^{tB_1} & \\ & e^{tB_2} \end{pmatrix},$$
with
$$e^{tB_1} = I + tN_1 = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$$
and
$$e^{tB_2} = e^{2t}.$$
Hence,
$$e^{tB} = \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}$$
and
$$e^{tA} = T e^{tB} T^{-1} = \begin{pmatrix} 1 + t & 0 & t \\ 0 & e^{2t} & 0 \\ -t & 0 & 1 - t \end{pmatrix}.$$

Example 3.3. Let
$$A = \begin{pmatrix} 3 & 1 & -1 \\ 0 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
The characteristic polynomial of $A$ is $p_{\mathrm{char}}(z) = -(z - 2)^3$. Thus, $A$ has the only eigenvalue $2$ with algebraic multiplicity $3$. The generalised eigenspace is the whole of $\mathbb{C}^3$. On the other hand,
$$A - 2I = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix}, \qquad (A - 2I)^2 = 0,$$
so that $p_{\min}(z) = (z - 2)^2$ and the generalised eigenspace is $\ker(A - 2I)^2 = \mathbb{C}^3$. $A$ is already in block form and
$$e^{tA} = e^{2t}(I + tN) = e^{2t}\begin{pmatrix} 1 + t & t & -t \\ 0 & 1 & 0 \\ t & t & 1 - t \end{pmatrix} = \begin{pmatrix} (1 + t)e^{2t} & te^{2t} & -te^{2t} \\ 0 & e^{2t} & 0 \\ te^{2t} & te^{2t} & (1 - t)e^{2t} \end{pmatrix}.$$
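As a sanity check, SymPy's built-in symbolic matrix exponential reproduces the result of Example 3.3 (a sketch; SymPy assumed):

    from sympy import Matrix, symbols, simplify

    t = symbols('t')
    A = Matrix([[3, 1, -1], [0, 2, 0], [1, 1, 1]])
    print(simplify((t * A).exp()))  # equals exp(2t) (I + t(A - 2I)), as computed above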

Example 3.4. Let
$$A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix}.$$
Again $p_{\mathrm{char}}(z) = -(z - 2)^3$ and thus $2$ is the only eigenvalue. The generalised eigenspace is the whole of $\mathbb{C}^3$. This time
$$A - 2I = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad (A - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (A - 2I)^3 = 0,$$
so that $p_{\min}(z) = (z - 2)^3$ and the generalised eigenspace is $\ker(A - 2I)^3 = \mathbb{C}^3$. Again, $A$ is already in block form, but this time $m = 3$, so that
$$e^{tA} = e^{2t}\Big(I + tN + \frac{t^2}{2}N^2\Big) = e^{2t}\begin{pmatrix} 1 & 0 & 0 \\ \frac{t^2}{2} & 1 & t \\ t & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{2t} & 0 & 0 \\ \frac{t^2}{2}e^{2t} & e^{2t} & te^{2t} \\ te^{2t} & 0 & e^{2t} \end{pmatrix}.$$

The $4 \times 4$ case can be analysed in a similar way. In general, the computations get more involved the higher $n$ is. Most computer algebra systems have routines for computing the matrix exponential. In Maple this can be done using the command MatrixExponential from the LinearAlgebra package.

4 The Jordan normal form*

Theorem 4.1. Let $A$ be an $n \times n$ matrix. There exists an invertible $n \times n$ matrix $T$ such that
$$T^{-1}AT = J,$$
where $J$ is a block matrix,
$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix},$$
and each block $J_j$ is a square matrix of the form
$$J_j = \lambda I + N = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}, \qquad (4.1)$$
where $\lambda$ is an eigenvalue of $A$, $I$ is a unit matrix and $N$ has ones on the line directly above the diagonal and zeros everywhere else.
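SymPy can produce the normal form of Theorem 4.1 directly (a sketch; SymPy assumed):

    from sympy import Matrix

    A = Matrix([[2, 0, 0], [0, 2, 1], [1, 0, 2]])  # the matrix from Example 3.4
    P, J = A.jordan_form()                         # A = P J P^{-1}
    print(J)                                       # one 3x3 Jordan block with eigenvalue 2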

Remark. There is also an alternative version for real matrices; see Teschl.

By first using Theorem 2.2, we can choose a basis in which $A$ has the block form
$$B = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix}.$$
The theorem is proved by picking a basis for each generalised eigenspace $\ker(A - \lambda_j I)^{m_j}$ so that $B_j$ takes the form $J_j$. By considering each generalised eigenspace separately, we can assume from the start that $A$ only has one eigenvalue, which we call $\lambda$. Moreover, $A = \lambda I + N$, where $N$ is nilpotent. We let $m$ be the smallest positive integer such that $N^m = 0$.

Suppose that $m = n$. This means that there is some vector $u$ such that $N^{n-1}u \neq 0$. By looking at the proof of Theorem 2.3 it follows that the vectors $u, Nu, \ldots, N^{n-1}u$ are linearly independent. $\{N^{n-1}u, \ldots, Nu, u\}$ is therefore a basis for $V$. The matrix for $N$ in this basis is
$$\begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix},$$
which means that we are done.

In general, a set of non-zero vectors $u, Nu, \ldots, N^{l-1}u$, with $N^l u = 0$, is called a Jordan chain. We will prove the theorem in general by showing that there is a basis for $V$ consisting of Jordan chains.

Proof. We prove the theorem by induction on the dimension of $V$. Clearly the theorem holds if $V$ has dimension 1. Suppose now that the theorem holds for all complex vector spaces of dimension less than $n$, where $n \geq 2$, and assume that $\dim V = n$. Since $N$ is nilpotent it is not injective and therefore $\dim \operatorname{range} N < n$ (by the dimension theorem). By the induction hypothesis, we can therefore find a basis of Jordan chains
$$u_i, Nu_i, \ldots, N^{l_i - 1}u_i, \quad i = 1, \ldots, k,$$
for $\operatorname{range} N$. For each $u_i$ we can find a $v_i \in V$ such that $Nv_i = u_i$ (since $u_i \in \operatorname{range} N$). That is, each Jordan chain in the basis for $\operatorname{range} N$ can be extended by one element. We claim that the vectors
$$v_i, Nv_i, N^2 v_i, \ldots, N^{l_i} v_i, \quad i = 1, \ldots, k, \qquad (4.2)$$
are linearly independent. Indeed, suppose that
$$\sum_{i=1}^{k} \sum_{j=0}^{l_i} \alpha_{i,j} N^j v_i = 0. \qquad (4.3)$$

Applying $N$ to this equality, we find that
$$\sum_{i=1}^{k} \sum_{j=0}^{l_i - 1} \alpha_{i,j} N^j u_i = \sum_{i=1}^{k} \sum_{j=0}^{l_i} \alpha_{i,j} N^{j+1} v_i = 0,$$
which, by hypothesis, implies that $\alpha_{i,j} = 0$, $1 \leq i \leq k$, $0 \leq j \leq l_i - 1$. Looking at (4.3), this means that
$$\sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i - 1} u_i = \sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i} v_i = 0,$$
which again implies that $\alpha_{i,l_i} = 0$, $1 \leq i \leq k$, by our induction hypothesis.

Extend the vectors in (4.2) to a basis for $V$ by possibly adding vectors $\tilde{w}_1, \ldots, \tilde{w}_K$. For each $i$ we have $N\tilde{w}_i \in \operatorname{range} N$, so we can find an element $\hat{w}_i$ in the span of the vectors in (4.2) such that $N\tilde{w}_i = N\hat{w}_i$. But then $w_i = \tilde{w}_i - \hat{w}_i \in \ker N$ and the vectors
$$v_i, Nv_i, N^2 v_i, \ldots, N^{l_i} v_i, \quad i = 1, \ldots, k, \qquad w_1, \ldots, w_K$$
constitute a basis for $V$ consisting of Jordan chains (the elements $w_i$ are chains of length 1).

The matrix $J$ is not completely unique, since we can, e.g., change the order of the Jordan blocks. It turns out that this is the only thing which is not unique. In other words, both the number of blocks and their sizes are uniquely determined. Let us prove this. It suffices to consider a nilpotent operator $N \colon V \to V$. Let $\beta$ be the total number of blocks and $\beta(k)$ the number of blocks of size $k \times k$. Then $\dim \ker N = \beta$, and $\dim \ker N^2$ differs from $\dim \ker N$ by $\beta - \beta(1)$. In the same manner, we find that
$$\dim \ker N = \beta,$$
$$\dim \ker N^2 = \dim \ker N + \beta - \beta(1),$$
$$\vdots$$
$$\dim \ker N^{k+1} = \dim \ker N^k + \beta - \beta(1) - \cdots - \beta(k).$$
It follows by induction that each $\beta(k)$ is uniquely determined by $N$.

Note that the number of Jordan blocks in the matrix $J$ equals the number of Jordan chains, so that there may be several Jordan blocks corresponding to the same eigenvalue. The sum of the lengths of the Jordan chains equals the dimension of the generalised eigenspace. Let $p_{\mathrm{char}}(z) = \det(A - zI)$ be the characteristic polynomial of $A$. Recall that $p_{\mathrm{char}}$ is independent of basis, so that $p_{\mathrm{char}}(z) = \det(J - zI)$. Expanding repeatedly along the first column, we find that
$$p_{\mathrm{char}}(z) = (-1)^n (z - \lambda_1)^{n_1} \cdots (z - \lambda_k)^{n_k},$$
where $n_j = \dim \ker(A - \lambda_j I)^{m_j}$ is the dimension of the generalised eigenspace corresponding to $\lambda_j$. Thus $n_j = a_j$, the algebraic multiplicity of $\lambda_j$. By the remarks above about the uniqueness of $J$, it follows that the geometric multiplicity $g_j$ of each eigenvalue equals the number of Jordan chains for that eigenvalue.
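The uniqueness argument is effectively an algorithm: the dimensions $\dim \ker N^k$ determine the block sizes. A sketch (SymPy assumed; `jordan_block_counts` is our name):

    from sympy import Matrix

    def jordan_block_counts(N):
        """beta(k) = number of k x k Jordan blocks of the nilpotent matrix N."""
        n = N.shape[0]
        d = [len((N ** k).nullspace()) for k in range(n + 2)]  # d[k] = dim ker N^k
        # d[k] - d[k-1] counts the blocks of size >= k, so:
        return {k: (d[k] - d[k - 1]) - (d[k + 1] - d[k]) for k in range(1, n + 1)}

    N = Matrix([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
    print(jordan_block_counts(N))   # {1: 1, 2: 1, 3: 0}: one 1x1 and one 2x2 block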

Example 4.1. Consider the matrix
$$A = \begin{pmatrix} -3 & 4 \\ -1 & 1 \end{pmatrix}$$
from Example 3.1. We showed that it has the only eigenvalue $-1$ with corresponding eigenspace $\{zu : z \in \mathbb{C}\}$, $u = (2, 1)$. To find the Jordan normal form we need to find the Jordan chain corresponding to $u$. Since
$$A + I = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix},$$
the equation $(A + I)u' = u$ has the general solution $u' = (-1, 0) + zu$. We can therefore take $\{(2, 1), (-1, 0)\}$ as our Jordan basis. The Jordan normal form is
$$J = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}.$$

The $3 \times 3$ matrix in Example 3.2 can be analysed in a similar way. The matrix in Example 3.3 requires more work.

Example 4.2. The matrix
$$A = \begin{pmatrix} 3 & 1 & -1 \\ 0 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$
from Example 3.3 has the only eigenvalue $2$ with algebraic multiplicity $3$. The geometric multiplicity is $2$ since
$$Ax = 2x \iff x_1 + x_2 - x_3 = 0.$$
The eigenspace is, e.g., spanned by the vectors $(1, -1, 0)$ and $(1, 0, 1)$. It is immediately clear that a Jordan basis will consist of one Jordan chain of length 2 and one of length 1, and that the Jordan normal form is
$$J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
However, we can't just take an arbitrary eigenvector $u$ and solve the equation $(A - 2I)u' = u$, since it's not certain that $u \in \operatorname{range}(A - 2I)$. We therefore begin by computing
$$A - 2I = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix}.$$
Notice that $\operatorname{range}(A - 2I)$ is spanned by the vector $u_1 = (1, 0, 1)$. This is also an eigenvector (note that $(A - 2I)^2 = 0$). Next, we find a solution of the equation $(A - 2I)u_1' = u_1$, e.g. $u_1' = (1, 0, 0)$. Finally, we add an eigenvector which is not parallel to $u_1$, e.g. $u_2 = (0, 1, 1)$. Then $\{u_1, u_1', u_2\}$ is a Jordan basis. The matrix in Example 3.4 can be dealt with in a similar way by first considering $\operatorname{range}(A - 2I)^2$.
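The Jordan basis found in Example 4.2 can be verified directly (a sketch; SymPy assumed):

    from sympy import Matrix

    A = Matrix([[3, 1, -1], [0, 2, 0], [1, 1, 1]])
    T = Matrix.hstack(Matrix([1, 0, 1]),   # u_1
                      Matrix([1, 0, 0]),   # u_1'
                      Matrix([0, 1, 1]))   # u_2
    print(T.inv() * A * T)                 # the Jordan matrix J: one 2x2 and one 1x1 block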

Exercises

1. Compute $e^A$ by summing the power series when
$$\text{a) } A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \text{b) } A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$

2. Compute $e^{tA}$ by diagonalising the matrix, where
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

3. Solve the initial-value problem
$$x'(t) = \begin{pmatrix} 2 & -1 & 3 \\ 3 & -2 & 3 \\ 3 & -1 & 2 \end{pmatrix} x(t), \qquad x(0) = \begin{pmatrix} 0 \\ 9 \\ 6 \end{pmatrix}.$$

4. Show that $\|e^A\| \leq e^{\|A\|}$.

5. Show that $(e^A)^* = e^{A^*}$.

6. Show that $e^S$ is unitary if $S$ is skew symmetric, that is, $S^* = -S$.

7. Show that the following identities (for all $t \in \mathbb{R}$) imply $AB = BA$:
a) $Ae^{tB} = e^{tB}A$;
b) $e^{tA}e^{tB} = e^{t(A+B)}$.

8. Let

       1 1           1 1           2 1 2
   A_1 = 1 1,    A_2 = 1 1,    A_3 = 3 1 3.
       1 1           1 1           2 2 1

Calculate the generalised eigenspaces of each $A_j$ and find a matrix $T_j$ such that $T_j^{-1} A_j T_j$ is in block form. What is the minimal polynomial of $A_j$?

9. Calculate $e^{tA_j}$ for the matrices $A_j$ in the previous exercise.

10. The matrix

    18 3 2 12
    A = 2 2 12 2 1 24 6 3 16

has the (algebraically) double eigenvalues 1 and 2. Find a matrix $T$ such that $B = T^{-1}AT$ is in block form. What is $B$?

11. Consider the initial value problem
$$\begin{cases} x_1' = x_1 + 3x_2, \\ x_2' = 3x_1 + x_2, \end{cases} \qquad x(0) = x_0.$$
For which initial data $x_0$ does the solution converge to zero as $t \to \infty$?

12. Can you find a general condition on the eigenvalues of $A$ which guarantees that all solutions of the IVP
$$x' = Ax, \qquad x(0) = x_0,$$
converge to zero as $t \to \infty$?

13. The matrices $A_1$ and $A_2$ in Exercise 8 have the same eigenvalues. If you've solved Exercise 9 correctly, you will notice that all solutions of the IVP corresponding to $A_1$ are bounded for $t \geq 0$, while there are unbounded solutions of the IVP corresponding to $A_2$. Explain the difference and try to formulate a general principle.
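When checking answers to these exercises, it can be useful to compare a hand-computed exponential numerically against SciPy's expm; a sketch (NumPy and SciPy assumed), using Example 3.1 as the test case:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-3.0, 4.0],
                  [-1.0, 1.0]])                  # the matrix from Example 3.1
    for t in (0.5, 1.0, 2.0):
        hand = np.exp(-t) * np.array([[1 - 2*t, 4*t],
                                      [-t, 1 + 2*t]])
        print(np.allclose(expm(t * A), hand))   # True for each t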