Diagonalizing Matrices
Massoud Malek

Let $A$ be an $n \times n$ non-singular matrix with rows $A_1, A_2, \ldots, A_n$, and let $B = A^{-1} = [\,B_1, B_2, \ldots, B_n\,]$, where the $B_j$ are the columns of $B$. Then
$$ A\,B \;=\; \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_n \end{pmatrix} [\,B_1, B_2, \ldots, B_n\,] \;=\; \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \;=\; I_n . $$
Notice that $A_i B_i = 1$ and, for $i \neq j$, $A_i B_j = 0$. Thus if the columns of the matrix $A$ form an orthonormal basis, then $A^{-1} = B = A^*$.

An $n \times n$ matrix $U$ is called unitary if $U^{-1} = U^*$; i.e., $U^* U = U U^* = I_n$. If $U$ is a real matrix (in which case $U^*$ is just $U^t$), then $U$ is called an orthogonal matrix. Matrices of rotations are orthogonal matrices. For instance, in 3D-space, the rotation about the $z$-axis is
$$ R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \det(R) = 1 . $$
Two $n \times n$ matrices $A$ and $B$ are called unitarily equivalent if there exists a unitary $n \times n$ matrix $U$ such that $B = U^* A U$. Note that
$$ \mathrm{trace}(B^* B) = \mathrm{trace}\big[\,(U^* A U)^* (U^* A U)\,\big] = \mathrm{trace}(U^* A^* U U^* A U) = \mathrm{trace}(U^* A^* A U) = \mathrm{trace}(U U^* A^* A) = \mathrm{trace}(A^* A). $$
It is often simpler to work in an orthogonal basis.
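Both facts above are easy to confirm numerically. The following is a minimal check, assuming NumPy; the angle, the random matrix $A$, and the unitary $U$ (obtained from a QR factorization) are arbitrary test data:

```python
import numpy as np

# Rotation about the z-axis: orthogonal, with determinant 1.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
print(np.allclose(R.T @ R, np.eye(3)))        # True
print(np.isclose(np.linalg.det(R), 1.0))      # True

# trace(B* B) = trace(A* A) for B = U* A U with U unitary.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
B = U.conj().T @ A @ U
print(np.isclose(np.trace(B.conj().T @ B), np.trace(A.conj().T @ A)))  # True
```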
Gram-Schmidt Orthonormalization Process

It is a method for obtaining orthonormal vectors from a linearly independent set of vectors in an inner product space, most commonly the Euclidean space $\mathbb{R}^n$. If $S = \{v_1, v_2, v_3, \ldots, v_k\}$ is a linearly independent set of vectors generating a subspace of $\mathbb{R}^n$, then by using the Gram-Schmidt orthogonalization process on $S$, we can generate an orthogonal set $S_G = \{u_1, u_2, u_3, \ldots, u_k\}$ of vectors that spans the same subspace of $\mathbb{R}^n$ as the set $S$. By normalizing the vectors $u_i$, we obtain the orthonormal basis $S_N = \{e_1, e_2, e_3, \ldots, e_k\}$ spanning the same subspace of $\mathbb{R}^n$.

We define the projection operator by
$$ \mathrm{proj}_u(v) = \frac{\langle v, u \rangle}{\langle u, u \rangle}\, u , $$
where $\langle v, u \rangle$ denotes the inner product of the vectors $v$ and $u$. This operator projects the vector $v$ orthogonally onto the line spanned by $u$. The Gram-Schmidt process then works as follows:
$$ u_1 = v_1, \qquad e_1 = \frac{u_1}{\|u_1\|} $$
$$ u_2 = v_2 - \mathrm{proj}_{u_1}(v_2), \qquad e_2 = \frac{u_2}{\|u_2\|} $$
$$ u_3 = v_3 - \mathrm{proj}_{u_1}(v_3) - \mathrm{proj}_{u_2}(v_3), \qquad e_3 = \frac{u_3}{\|u_3\|} $$
$$ u_4 = v_4 - \mathrm{proj}_{u_1}(v_4) - \mathrm{proj}_{u_2}(v_4) - \mathrm{proj}_{u_3}(v_4), \qquad e_4 = \frac{u_4}{\|u_4\|} $$
$$ \vdots $$
$$ u_k = v_k - \sum_{j=1}^{k-1} \mathrm{proj}_{u_j}(v_k), \qquad e_k = \frac{u_k}{\|u_k\|} $$
The sequence $\{u_1, u_2, \ldots, u_k\}$ is the required system of orthogonal vectors, and the normalized vectors $\{e_1, e_2, \ldots, e_k\}$ form an orthonormal set. The calculation of the sequence $u_1, u_2, \ldots, u_k$ is known as Gram-Schmidt orthogonalization, while the calculation of the sequence $e_1, e_2, \ldots, e_k$ is known as Gram-Schmidt orthonormalization, as the vectors are normalized.

Example. Consider the basis
$$ B = \left\{\, v_1 = \begin{pmatrix} 1\\1\\1 \end{pmatrix},\; v_2 = \begin{pmatrix} 1\\2\\2 \end{pmatrix},\; v_3 = \begin{pmatrix} 1\\1\\2 \end{pmatrix} \,\right\}. $$
Then use Gram-Schmidt to find the orthogonal basis $B_G = \{u_1, u_2, u_3\}$ and the orthonormal basis $B_N = \{e_1, e_2, e_3\}$ associated with the basis $B$.
$$ u_1 = v_1 = \begin{pmatrix} 1\\1\\1 \end{pmatrix}; \qquad e_1 = \frac{u_1}{\|u_1\|} = \frac{1}{\sqrt 3}\begin{pmatrix} 1\\1\\1 \end{pmatrix} $$
$$ u_2 = v_2 - \frac{\langle v_2, u_1 \rangle}{\langle u_1, u_1 \rangle}\, u_1 = \begin{pmatrix} 1\\2\\2 \end{pmatrix} - \frac{5}{3}\begin{pmatrix} 1\\1\\1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} -2\\1\\1 \end{pmatrix}; \qquad e_2 = \frac{u_2}{\|u_2\|} = \frac{1}{\sqrt 6}\begin{pmatrix} -2\\1\\1 \end{pmatrix} $$
$$ u_3 = v_3 - \frac{\langle v_3, u_1 \rangle}{\langle u_1, u_1 \rangle}\, u_1 - \frac{\langle v_3, u_2 \rangle}{\langle u_2, u_2 \rangle}\, u_2 = \begin{pmatrix} 1\\1\\2 \end{pmatrix} - \frac{4}{3}\begin{pmatrix} 1\\1\\1 \end{pmatrix} - \frac{1}{2}\cdot\frac{1}{3}\begin{pmatrix} -2\\1\\1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 0\\-1\\1 \end{pmatrix}; \qquad e_3 = \frac{u_3}{\|u_3\|} = \frac{1}{\sqrt 2}\begin{pmatrix} 0\\-1\\1 \end{pmatrix} $$
Hence
$$ B_G = \left\{\, u_1 = \begin{pmatrix} 1\\1\\1 \end{pmatrix},\; u_2 = \frac{1}{3}\begin{pmatrix} -2\\1\\1 \end{pmatrix},\; u_3 = \frac{1}{2}\begin{pmatrix} 0\\-1\\1 \end{pmatrix} \,\right\} \quad\text{and}\quad B_N = \left\{\, e_1 = \frac{1}{\sqrt 3}\begin{pmatrix} 1\\1\\1 \end{pmatrix},\; e_2 = \frac{1}{\sqrt 6}\begin{pmatrix} -2\\1\\1 \end{pmatrix},\; e_3 = \frac{1}{\sqrt 2}\begin{pmatrix} 0\\-1\\1 \end{pmatrix} \,\right\}. $$
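The process translates directly into code. Here is a minimal NumPy sketch of the classical variant described by the formulas above; applied to the basis of the example, it reproduces $u_3 = \frac{1}{2}(0,-1,1)^t$ and an orthonormal set:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns the orthogonal vectors u_i
    and the orthonormal vectors e_i spanning the same subspace."""
    us, es = [], []
    for v in vectors:
        u = v.astype(float)
        for w in us:
            u = u - (v @ w) / (w @ w) * w   # subtract proj_w(v)
        us.append(u)
        es.append(u / np.linalg.norm(u))
    return us, es

v1 = np.array([1, 1, 1])
v2 = np.array([1, 2, 2])
v3 = np.array([1, 1, 2])
us, es = gram_schmidt([v1, v2, v3])

print(us[2])                            # [ 0.  -0.5  0.5]
E = np.column_stack(es)
print(np.allclose(E.T @ E, np.eye(3)))  # True: the e_i are orthonormal
```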
An $n \times n$ matrix $A$ is diagonalizable if it has $n$ linearly independent eigenvectors. If the eigenvalues of $A$ are all different, then it is diagonalizable. A real matrix $A$ is said to be normal if $A^t A = A A^t$; in the case of complex matrices, normality means $A^* A = A A^*$. Examples of real normal matrices are symmetric ($A^t = A$) and skew-symmetric ($A^t = -A$) matrices; in the complex case, hermitian ($A^* = A$) and skew-hermitian ($A^* = -A$) matrices are normal. Any matrix similar to a normal matrix is also diagonalizable.

Diagonalizing Theorem. If the $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, then there is an invertible matrix $P$ such that $P^{-1} A P = D$, where $D$ is a diagonal matrix.

Proof. Let $v_1, v_2, \ldots, v_n$ be linearly independent eigenvectors of $A$, corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let $P = [\,v_1, v_2, \ldots, v_n\,]$; then $A P = [\,\lambda_1 v_1, \lambda_2 v_2, \ldots, \lambda_n v_n\,] = P D$. The matrix $P^{-1} A P$ then becomes a diagonal matrix with $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its diagonal entries.

Corollary. If the $n \times n$ normal matrix $A$ has $n$ linearly independent eigenvectors, then there is a unitary matrix $E$ such that $E^* A E = D$, where $D$ is a diagonal matrix.

Proof. Let $v_1, v_2, v_3, \ldots, v_n$ be linearly independent eigenvectors of $A$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Since $A$ is normal, eigenvectors belonging to distinct eigenvalues are orthogonal, so by using the Gram-Schmidt orthonormalization process within each eigenspace we may obtain a set of orthonormal eigenvectors $\{e_1, e_2, \ldots, e_n\}$. Let $E = [\,e_1, e_2, \ldots, e_n\,]$; then $A E = [\,\lambda_1 e_1, \lambda_2 e_2, \ldots, \lambda_n e_n\,]$. Since $E^{-1} = E^*$, the matrix $E^* A E$ becomes a diagonal matrix.

Note. All the eigenvalues of a symmetric matrix are real, therefore $E$ may be taken to be orthogonal. In the case of a skew-symmetric matrix, the nonzero eigenvalues are all purely imaginary, therefore $E$ must be complex.

Applications

Diagonalizing a matrix can make many calculations much quicker and easier. Diagonalization can be used to compute the powers of a matrix $A$ efficiently. Suppose we have found
$$ P^{-1} A P = D \quad\Longrightarrow\quad P P^{-1} A P P^{-1} = P D P^{-1} \quad\Longrightarrow\quad A = P D P^{-1}, $$
where $D$ is a diagonal matrix. Then, as the matrix product is associative,
$$ A^k = (P D P^{-1})^k = (P D P^{-1})(P D P^{-1}) \cdots (P D P^{-1}) = P D (P^{-1} P) D (P^{-1} P) \cdots (P^{-1} P) D P^{-1} = P D^k P^{-1}, $$
and the latter is easy to calculate, since it only involves the powers of a diagonal matrix. This approach can be generalized to the matrix exponential and other matrix functions, since they can be defined as power series. If $A$ has an inverse, then $A^{-1} = P D^{-1} P^{-1}$.

Consider the quadratic curve $5x^2 - 4xy + 2y^2 = 30$. It can be changed into the matrix form and then brought into diagonal form:
$$ (\,x,\; y\,) \begin{pmatrix} 5 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 30 . $$
The matrix $A = \begin{pmatrix} 5 & -2 \\ -2 & 2 \end{pmatrix}$ is diagonalized with the unitary (orthogonal) matrix
$$ U = U^t = \begin{pmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{pmatrix}, \qquad\text{with}\quad (\,x', y'\,) = (\,x, y\,)\,U = \left( \frac{x + 2y}{\sqrt 5},\; \frac{2x - y}{\sqrt 5} \right). $$
We have
$$ \begin{pmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{pmatrix} \begin{pmatrix} 5 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix} \quad\Longrightarrow\quad (\,x', y'\,) \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix} = 30 . $$
The similarity transformation brings the quadratic curve into the canonical form of the ellipse
$$ \frac{(x')^2}{30} + \frac{(y')^2}{5} = 1 . $$
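Both applications are easy to try numerically. The sketch below (assuming NumPy; `eigh` is appropriate because $A$ is symmetric, and it returns the eigenvalues in ascending order, so $D = \mathrm{diag}(1, 6)$) diagonalizes the matrix of the quadratic form and then computes a power of $A$ as $U D^k U^t$:

```python
import numpy as np

A = np.array([[5., -2.],
              [-2., 2.]])

# Orthogonal diagonalization of the symmetric matrix A.
w, U = np.linalg.eigh(A)
D = np.diag(w)
print(w)                                    # [1. 6.]
print(np.allclose(U.T @ A @ U, D))          # True

# Powers via A^k = U D^k U^t: only the diagonal gets powered.
k = 5
Ak = U @ np.diag(w**k) @ U.T
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```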
Frobenius Covariants

The Frobenius covariants of a diagonalizable matrix $A$ with $k$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ are matrices $A_i$ associated with the eigenvalues and eigenvectors of $A$. Each covariant is a projection on the eigenspace associated with $\lambda_i$. Frobenius covariants are the coefficients of Sylvester's formula, which expresses a function of a matrix $f(A)$ as a linear combination of its values on the eigenvalues of $A$. They are named after the mathematician Ferdinand Frobenius.

To compute the Frobenius covariants of the diagonalizable matrix $A$, first we find the matrices $P = [\,P_1\; P_2\; \cdots\; P_n\,]$ and $Q = P^{-1}$, with rows $Q_1, Q_2, \ldots, Q_n$, such that
$$ Q\,A\,P = \begin{pmatrix} Q_1 \\ Q_2 \\ \vdots \\ Q_n \end{pmatrix} A\,[\,P_1\; P_2\; \cdots\; P_n\,] = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} . $$
The covariant associated with $\lambda_i$ is then the projection matrix $A_i = P_i\,Q_i$.

Note. Clearly the matrix $P$ which diagonalizes $A$ is not unique. It is important to note that by choosing a different matrix $P$, we still obtain a diagonal matrix, where the diagonal entries are the eigenvalues of $A$. Also, note that the columns of $P$ are the right eigenvectors of $A$ and the rows of $Q = P^{-1}$ are the left eigenvectors of $A$.

Sylvester's Formula

Let $f(x)$ be an analytic function and let $A$ be a diagonalizable matrix whose $k$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ lie in the domain of $f(x)$. Sylvester's matrix theorem, or Lagrange-Sylvester interpolation, expresses $f(A)$ as
$$ f(A) = \sum_{i=1}^{k} f(\lambda_i)\, A_i , $$
where the matrices $A_i$ are the corresponding Frobenius covariants of $A$, which are the (projection) matrix Lagrange polynomials of $A$.
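In code, the covariants fall straight out of the eigendecomposition. The following is a minimal NumPy sketch, assuming the eigenvalues of $A$ are distinct (the test matrix is the one from the example below):

```python
import numpy as np

def frobenius_covariants(A):
    """Covariants A_i = P_i Q_i built from the right eigenvectors
    (columns of P) and the left eigenvectors (rows of Q = P^{-1})."""
    lam, P = np.linalg.eig(A)
    Q = np.linalg.inv(P)
    covs = [np.outer(P[:, i], Q[i, :]) for i in range(len(lam))]
    return lam, covs

A = np.array([[5., -5., 1.],
              [9., -9., 1.],
              [3., -5., 3.]])
lam, covs = frobenius_covariants(A)

# The covariants resolve the identity and reconstruct A:
print(np.allclose(sum(covs), np.eye(3)))                       # True
print(np.allclose(sum(l * C for l, C in zip(lam, covs)), A))   # True

# Sylvester's formula for a matrix function, e.g. f = exp:
expA = sum(np.exp(l) * C for l, C in zip(lam, covs))
```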
Example. Let $A = \begin{pmatrix} 5 & -5 & 1 \\ 9 & -9 & 1 \\ 3 & -5 & 3 \end{pmatrix}$; then $A = P\,D\,Q$, where
$$ P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad Q = P^{-1} = \begin{pmatrix} 3 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}. $$
The Frobenius covariants are:
$$ A_1 = P_1\,Q_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} (\,3, -1, -1\,) = \begin{pmatrix} 3 & -1 & -1 \\ 3 & -1 & -1 \\ 3 & -1 & -1 \end{pmatrix}, $$
$$ A_2 = P_2\,Q_2 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} (\,-1, 1, 0\,) = \begin{pmatrix} -1 & 1 & 0 \\ -2 & 2 & 0 \\ -1 & 1 & 0 \end{pmatrix}, $$
$$ A_3 = P_3\,Q_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} (\,-1, 0, 1\,) = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -2 & 0 & 2 \end{pmatrix}. $$
Sylvester's formula then amounts to $f(A) = f(\lambda_1) A_1 + f(\lambda_2) A_2 + f(\lambda_3) A_3$. For instance, if $f(x) = x^4 - 2x^2 + 1$, then Sylvester's formula expresses $f(A)$ as
$$ f(A) = A^4 - 2A^2 + I_3 = f(1)\,A_1 + f(-4)\,A_2 + f(2)\,A_3 = 0\,A_1 + 225\,A_2 + 9\,A_3 = \begin{pmatrix} -234 & 225 & 9 \\ -459 & 450 & 9 \\ -243 & 225 & 18 \end{pmatrix}. $$
Since $A$ is invertible, by choosing $f(x) = \dfrac{1}{x}$ we may find $A^{-1}$ as follows:
$$ A^{-1} = \frac{1}{\lambda_1}\,A_1 + \frac{1}{\lambda_2}\,A_2 + \frac{1}{\lambda_3}\,A_3 = A_1 - \frac{1}{4}\,A_2 + \frac{1}{2}\,A_3 = \frac{1}{4}\begin{pmatrix} 11 & -5 & -2 \\ 12 & -6 & -2 \\ 9 & -5 & 0 \end{pmatrix}. $$

Sylvester's Determinant Theorem states that if $A$ and $B$ are matrices of size $m \times n$ and $n \times m$ respectively, then
$$ \det(I_m + A B) = \det(I_n + B A). $$
Proof. Using the block matrices
$$ M = \begin{pmatrix} I_m & -A \\ B & I_n \end{pmatrix} \quad\text{and}\quad N = \begin{pmatrix} I_m & A \\ 0 & I_n \end{pmatrix}, $$
we obtain
$$ \det(M)\,\det(N) = \det(M N) = \det\begin{pmatrix} I_m & 0 \\ B & I_n + B A \end{pmatrix} = \det(I_n + B A) $$
and
$$ \det(N)\,\det(M) = \det(N M) = \det\begin{pmatrix} I_m + A B & 0 \\ B & I_n \end{pmatrix} = \det(I_m + A B). $$
Since the leftmost members of these two equalities are equal, the equality follows.
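The determinant identity is also easy to test numerically; in the sketch below (assuming NumPy), the sizes $m$, $n$ and the random matrices are arbitrary test data:

```python
import numpy as np

# Check det(I_m + A B) = det(I_n + B A) for rectangular A, B.
rng = np.random.default_rng(0)
m, n = 4, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
lhs = np.linalg.det(np.eye(m) + A @ B)
rhs = np.linalg.det(np.eye(n) + B @ A)
print(np.isclose(lhs, rhs))  # True
```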
Schur's Triangularization Theorem. Given an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, counting multiplicities, there exists an $n \times n$ unitary matrix $U$ such that
$$ A = U \begin{pmatrix} \lambda_1 & x & \cdots & x \\ 0 & \lambda_2 & \cdots & x \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} U^* . $$
Proof. The proof is by induction. For $n = 1$, there is nothing to prove, since $U$ can be taken to be the identity matrix. Now we assume that Schur's theorem is true for any $(n-1) \times (n-1)$ matrix; we shall prove that it is also true for $n \times n$ matrices. Let $v_1$ be an eigenvector of $A$ corresponding to an eigenvalue $\lambda_1$. Using the vector $v_1$, we create $B = \{v_1, v_2, v_3, \ldots, v_n\}$, a basis for the complex vector space $\mathbb{C}^n$. We use the Gram-Schmidt process to obtain the orthonormal basis $B' = \{e_1, e_2, e_3, \ldots, e_n\}$. Note that $e_1$ is also an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$. The matrix $Q = [\,e_1, e_2, e_3, \ldots, e_n\,]$ is a unitary matrix and we have
$$ Q^* A\,Q = \begin{pmatrix} \lambda_1 & X \\ \theta & B \end{pmatrix}, $$
where $X$ is a $1 \times (n-1)$ matrix, $\theta$ is the $(n-1) \times 1$ zero matrix, and $B$ is an $(n-1) \times (n-1)$ matrix. By the induction hypothesis, there is an $(n-1) \times (n-1)$ unitary matrix $U_{n-1}$ which changes the matrix $B$ into a triangular matrix $C$; that is, $U_{n-1}^*\,B\,U_{n-1} = C$. Let $U = Q \begin{pmatrix} 1 & \theta^t \\ \theta & U_{n-1} \end{pmatrix}$. Then the unitary matrix $U$ satisfies
$$ U^* A\,U = \begin{pmatrix} 1 & \theta^t \\ \theta & U_{n-1}^* \end{pmatrix} Q^* A\,Q \begin{pmatrix} 1 & \theta^t \\ \theta & U_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & \theta^t \\ \theta & U_{n-1}^* \end{pmatrix} \begin{pmatrix} \lambda_1 & X \\ \theta & B \end{pmatrix} \begin{pmatrix} 1 & \theta^t \\ \theta & U_{n-1} \end{pmatrix} = \begin{pmatrix} \lambda_1 & X\,U_{n-1} \\ \theta & C \end{pmatrix}, $$
which is a triangular matrix.
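Numerically, the Schur form is available as a library routine. Here is a quick check, assuming SciPy is installed (`output='complex'` requests the complex, genuinely triangular form; the random matrix is arbitrary test data):

```python
import numpy as np
from scipy.linalg import schur

# Unitary triangularization A = U T U*: T is upper triangular with
# the eigenvalues of A on its diagonal.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
T, U = schur(A, output='complex')

print(np.allclose(A, U @ T @ U.conj().T))   # True
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))  # True
```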