Unit 5: Matrix diagonalization

Juan Luis Melero and Eduardo Eyras

October 2018
Contents

1 Matrix diagonalization
  1.1 Definitions
    1.1.1 Similar matrix
    1.1.2 Diagonalizable matrix
    1.1.3 Eigenvalues and eigenvectors
  1.2 Calculation of eigenvalues and eigenvectors
2 Properties of matrix diagonalization
  2.1 Similar matrices have the same eigenvalues
  2.2 Relation between the rank and the eigenvalues of a matrix
  2.3 Eigenvectors are linearly independent
  2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors
  2.5 Eigenvectors of a symmetric matrix are orthogonal
3 Matrix diagonalization process example
4 Exercises
5 R practical
  5.1 Eigenvalues and eigenvectors
1 Matrix diagonalization

1.1 Definitions

1.1.1 Similar matrix

Two matrices are called similar if they are related through a third matrix in the following way:

A, B ∈ M_{n×n}(R) are similar if ∃ P ∈ M_{n×n}(R) invertible such that A = P^{-1}BP

Note that two similar matrices have the same determinant. Proof: given A, B similar, A = P^{-1}BP, so

det(A) = det(P^{-1}BP) = det(P^{-1}) det(B) det(P) = (1/det(P)) det(B) det(P) = det(B)

1.1.2 Diagonalizable matrix

A matrix is diagonalizable if it is similar to a diagonal matrix, i.e.:

A ∈ M_{n×n}(R) is diagonalizable if ∃ P ∈ M_{n×n}(R) invertible such that P^{-1}AP is diagonal

P is the matrix of change to a basis where A has a diagonal form. We will say that A is diagonalizable (or diagonalizes) if and only if there is a basis B = {u_1, ..., u_n} with the property:

Au_1 = λ_1 u_1
...
Au_n = λ_n u_n,   with λ_1, ..., λ_n ∈ R

That is, A has diagonal form in this basis, and consequently A acts as a diagonal matrix on every vector of the vector space expressed in this basis.

1.1.3 Eigenvalues and eigenvectors

Let A be a square matrix, A ∈ M_{n×n}(R). A number λ ∈ R is an eigenvalue of A if, for some non-zero vector u,

Au = λu
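The defining relation Au = λu can be checked directly for a candidate eigenpair. The following plain-Python sketch uses an illustrative matrix and vector of our own choosing (not one from these notes):

```python
# Check the defining relation A u = lambda u for a candidate eigenpair.
# The matrix A, vector u and value lam below are illustrative choices.

def matvec(A, u):
    """Multiply an n x n matrix (given as a list of rows) by a vector."""
    return [sum(row[j] * u[j] for j in range(len(u))) for row in A]

A = [[2, 1], [0, 3]]
u = [1, 1]      # candidate eigenvector
lam = 3         # candidate eigenvalue

# A u equals lam * u component-wise, so u is an eigenvector for lambda = 3:
print(matvec(A, u) == [lam * x for x in u])  # -> True
```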
We can rewrite this as a homogeneous system of equations:

Au = λu  ⟺  (λI_n − A)u = 0   or   (A − λI_n)u = 0

This homogeneous system has non-trivial solutions if and only if its determinant is zero.

1.2 Calculation of eigenvalues and eigenvectors

Let A be a square matrix, A ∈ M_{n×n}(R). Then λ ∈ R is an eigenvalue of A ⟺ det(λI_n − A) = 0, and a vector u is an eigenvector for λ ⟺ (λI_n − A)u = 0.

First we calculate the eigenvalues, and afterwards the eigenvectors. To compute the eigenvalues we solve the equation:

0 = det(λI_n − A) = λ^n + α_{n−1}λ^{n−1} + ... + α_2 λ^2 + α_1 λ + α_0

Thus, each eigenvalue λ_i is a root of this polynomial (the characteristic polynomial). To compute the eigenvectors, we solve the linear equation for each eigenvalue:

(λ_i I_n − A)u = 0

The set of solutions for a given eigenvalue λ is called the eigenspace of A corresponding to λ:

E(λ) = {u | (λI_n − A)u = 0}

Note that E(λ) is the kernel of a linear map (we leave it as an exercise to show that λI_n − A defines a linear map):

E(λ) = Ker(λI_n − A)

Since the kernel of a linear map is a vector subspace, the eigenspace is a vector subspace. Given a square matrix representing a linear map from a vector space to itself (an endomorphism), the eigenvectors describe the subspaces on which the matrix acts as multiplication by a number (the corresponding eigenvalue), i.e. the vectors on
which the matrix diagonalizes.

Example in R^2. Consider the matrix in 2 dimensions:

A = ( −3  4 )
    ( −1  2 )

To diagonalize this matrix we write the characteristic equation:

det(λI_2 − A) = det ( λ+3   −4  ) = (λ+3)(λ−2) + 4 = 0
                    (  1    λ−2 )

λ^2 + λ − 2 = 0  ⟹  (λ+2)(λ−1) = 0  ⟹  λ = −2, 1

The eigenvalues of this matrix are −2 and 1. Now we calculate the eigenvectors for each eigenvalue by solving the homogeneous linear equations for the components of the vectors.

For eigenvalue λ = −2:

(−2I_2 − A)u = 0  ⟹  ( −2+3   −4   ) (u_1) = (0)
                      (  1    −2−2  ) (u_2)   (0)

⟹  ( u_1 − 4u_2 ) = (0)  ⟹  u_1 = 4u_2
    ( u_1 − 4u_2 )   (0)

Hence, the eigenspace is:

E(−2) = { u = (  a  ), a ∈ R }
              ( a/4 )

In particular, u = (1, 1/4)^T is an eigenvector with eigenvalue −2.

For eigenvalue λ = 1:

(I_2 − A)u = 0  ⟹  ( 1+3   −4  ) (u_1) = (0)
                    (  1    1−2 ) (u_2)   (0)

⟹  ( 4u_1 − 4u_2 ) = (0)  ⟹  u_1 = u_2
    (  u_1 − u_2  )   (0)

Hence the eigenspace has the form:
E(1) = { u = ( a ), a ∈ R }
              ( a )

In particular, u = (1, 1)^T is an eigenvector with eigenvalue 1.

Example in R^3. Consider the following matrix:

A = ( −5  0  0 )
    (  3  7  0 )
    (  4  2  3 )

det(λI_3 − A) = det ( λ+5    0     0  ) = (λ+5)(λ−7)(λ−3) = 0
                    ( −3    λ−7    0  )
                    ( −4    −2    λ−3 )

Three solutions: λ = −5, 7, 3.

Eigenvector for λ = −5:

(−5I_3 − A)u = 0  ⟹  (  0    0    0 ) (x)   (0)            ( −16z/7 )
                      ( −3  −12    0 ) (y) = (0)  ⟹  u = (  4z/7  )
                      ( −4   −2   −8 ) (z)   (0)            (   z    )

The eigenspace is:

E(−5) = { u = (x, y, z)^T | x = −16z/7, y = 4z/7, z ∈ R }

Eigenvectors for λ = 7:

(7I_3 − A)u = 0  ⟹  ( 12    0   0 ) (x)   (0)            (  0 )
                     ( −3    0   0 ) (y) = (0)  ⟹  u = ( 2z )
                     ( −4   −2   4 ) (z)   (0)            (  z )

The eigenspace is:

E(7) = { u = (0, 2z, z)^T, z ∈ R }
Eigenvector for λ = 3:

(3I_3 − A)u = 0  ⟹  (  8    0   0 ) (x)   (0)            ( 0 )
                     ( −3   −4   0 ) (y) = (0)  ⟹  u = ( 0 )
                     ( −4   −2   0 ) (z)   (0)            ( z )

The eigenspace is:

E(3) = { u = (0, 0, z)^T, z ∈ R }

2 Properties of matrix diagonalization

In this section we describe some of the properties of diagonalizable matrices.

2.1 Similar matrices have the same eigenvalues

Theorem: A, B ∈ M_{n×n}(R) similar ⟹ A and B have the same eigenvalues.

Proof: given two square matrices that are similar, A, B ∈ M_{n×n}(R) with A = P^{-1}BP, the eigenvalues are calculated with the characteristic polynomial:

det(λI_n − A) = det(λP^{-1}P − P^{-1}BP) = det(P^{-1}(λI_n − B)P)
             = det(P^{-1}) det(λI_n − B) det(P) = det(λI_n − B)

Hence, two similar matrices have the same characteristic polynomial and therefore the same eigenvalues.

This result also helps us understand the process of diagonalization better. The determinant of a diagonal matrix is the product of the elements in its diagonal, so the characteristic polynomial of a diagonal matrix factorizes as ∏(λ − λ_i) = 0, where the λ_i are the eigenvalues. Thus, to diagonalize a matrix is to establish its similarity to a diagonal matrix containing its eigenvalues.
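The computation of section 1.2 can be sketched in plain Python for the 2×2 case, where the characteristic polynomial reduces to λ² − tr(A)λ + det(A); since trace and determinant are preserved under similarity, applying the helper below to P^{-1}BP returns the same eigenvalues as for B, as this section states. This is an illustrative sketch, assuming real eigenvalues:

```python
# Sketch: eigenvalues of a 2x2 matrix from its characteristic polynomial
# det(lambda*I - A) = lambda^2 - tr(A)*lambda + det(A).
# Assumes real eigenvalues (non-negative discriminant).
import math

def eigenvalues_2x2(A):
    (a, b), (c, d) = A
    tr = a + d               # trace of A
    det = a * d - b * c      # determinant of A
    s = math.sqrt(tr * tr - 4 * det)
    return sorted([(tr + s) / 2, (tr - s) / 2])

# The 2x2 example from section 1.2:
print(eigenvalues_2x2([[-3, 4], [-1, 2]]))  # -> [-2.0, 1.0]
```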
2.2 Relation between the rank and the eigenvalues of a matrix

Recall that the rank of a matrix is the maximum number of linearly independent row or column vectors.

Property: for a diagonalizable matrix A, rank(A) = number of non-zero eigenvalues of A (counted with multiplicity).

Proof: we defined A as diagonalizable if it is similar to a diagonal matrix, D = P^{-1}AP. As we saw in section 1.1.1, two similar matrices have the same determinant, therefore:

D = P^{-1}AP  ⟹  det(D) = det(A)

We can see that a matrix is singular, i.e. has det(A) = 0, if at least one of its eigenvalues is zero. As the rank of a diagonal matrix is the number of non-zero rows, and similar matrices have the same rank, the rank of A is the number of non-zero eigenvalues.

2.3 Eigenvectors are linearly independent

Theorem: eigenvectors of a matrix corresponding to different eigenvalues are linearly independent.

Proof: we prove this by contradiction, i.e. we assume the opposite and arrive at a contradiction. Consider the case of two non-zero eigenvectors of a 2×2 matrix A:

u_1 ≠ 0, u_2 ≠ 0,   Au_1 = λ_1 u_1,   Au_2 = λ_2 u_2,   λ_1 ≠ λ_2

We assume that they are linearly dependent:

u_1 = c u_2

Now we apply the matrix A to both sides and use the fact that they are eigenvectors:

λ_1 u_1 = c λ_2 u_2 = λ_2 u_1  ⟹  (λ_1 − λ_2)u_1 = 0

Since the eigenvalues are different, λ_1 − λ_2 ≠ 0, therefore u_1 = 0, which is a contradiction, since we assumed that the eigenvectors are non-zero. Thus, if the eigenvalues are different, the eigenvectors are linearly independent.
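For the 2×2 example of section 1.2 this can be checked concretely: two vectors in R² are linearly independent exactly when the matrix having them as columns has non-zero determinant. A plain-Python sketch:

```python
# Sketch: the eigenvectors of A = [[-3, 4], [-1, 2]] (section 1.2) are
# v1 = (4, 1) for lambda = -2 and v2 = (1, 1) for lambda = 1.
# They are linearly independent iff det([v1 | v2]) != 0.

v1 = (4, 1)
v2 = (1, 1)
det = v1[0] * v2[1] - v2[0] * v1[1]  # det of the matrix with v1, v2 as columns
print(det)  # -> 3 (non-zero, so v1 and v2 are linearly independent)
```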
For n eigenvectors: first, assume linear dependence,

u_1 = Σ_{j=2}^{n} α_j u_j,   with α_j ≠ 0 for some j

Apply the matrix to both sides:

λ_1 u_1 = Σ_{j=2}^{n} α_j λ_j u_j = Σ_{j=2}^{n} λ_1 α_j u_j

⟹  Σ_{j=2}^{n} (λ_1 − λ_j) α_j u_j = 0

For different eigenvalues, λ_i ≠ λ_j for i ≠ j, all coefficients α_j must be zero (repeating the argument on the remaining eigenvectors), which contradicts the assumed dependence. As a result, the eigenvectors of a matrix with n different eigenvalues form a basis of the vector space and diagonalize the matrix (see section 2.4).

2.4 A matrix is diagonalizable if and only if it has n linearly independent eigenvectors

Theorem: A ∈ M_{n×n}(R) is diagonalizable ⟺ A has n linearly independent eigenvectors.

Proof: we have to prove both directions.

1. A diagonalizable ⟹ n linearly independent eigenvectors.
2. n linearly independent eigenvectors ⟹ A diagonalizable.

Proof of 1: assume A is diagonalizable. Then, we know it must be similar to a diagonal matrix:

∃ P ∈ M_{n×n}(R) invertible such that P^{-1}AP is diagonal

We can write:

P^{-1}AP = D = ( λ_1 ...  0  )     and     P = ( p_1 ... p_n )
               ( ...      ... )
               (  0  ... λ_n )
P is defined in terms of column vectors p_i. We multiply both sides of the equation by P from the left:

P^{-1}AP = D  ⟹  AP = PD

A ( p_1 ... p_n ) = ( p_1 ... p_n ) ( λ_1 ...  0  )
                                    ( ...      ... )
                                    (  0  ... λ_n )

This can be rewritten as:

( Ap_1 ... Ap_n ) = ( λ_1 p_1 ... λ_n p_n )  ⟹  Ap_i = λ_i p_i

This tells us that the column vectors p_i of P are actually eigenvectors of A. Since the matrix A is diagonalizable, P must be invertible, so the column vectors (i.e. the eigenvectors) p_i cannot be linearly dependent on each other, since otherwise det(P) = 0.

Proof of 2: assume that A has n linearly independent eigenvectors. That means

∃ p_i, i = 1, ..., n, such that Ap_i = λ_i p_i    (1)

We define a matrix P by using the p_i as column vectors:

P = ( p_1 ... p_n )

We define a diagonal matrix D where the diagonal values are these eigenvalues:

D = ( λ_1 ...  0  )
    ( ...      ... )
    (  0  ... λ_n )

We can rewrite equation (1) in terms of the matrices P and D:

Ap_i = λ_i p_i, i = 1, ..., n  ⟹  AP = PD  ⟹  D = P^{-1}AP

Since the p_i are all linearly independent, P^{-1} exists. A is similar to a diagonal matrix, hence A is diagonalizable.
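The key relation AP = PD can be verified numerically for the 2×2 example used earlier in these notes. A plain-Python sketch:

```python
# Sketch: numeric check of A P = P D for the 2x2 example A = [[-3, 4], [-1, 2]],
# whose eigenpairs are (-2, (4, 1)) and (1, (1, 1)).

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[-3, 4], [-1, 2]]
P = [[4, 1], [1, 1]]     # eigenvectors as columns
D = [[-2, 0], [0, 1]]    # eigenvalues on the diagonal, same order as the columns of P

print(matmul(A, P) == matmul(P, D))  # -> True
```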
Q.E.D.

Conclusion: a matrix is diagonalizable if we can write:

A = P D P^{-1}

where P is the matrix whose columns are the eigenvectors,

P = ( p_1 ... p_n )

and D is the diagonal matrix containing the eigenvalues:

D = ( λ_1 ...  0  )
    ( ...      ... )
    (  0  ... λ_n )

2.5 Eigenvectors of a symmetric matrix are orthogonal

In general, eigenvectors of a matrix corresponding to different eigenvalues are linearly independent, and the matrix diagonalizes when there are enough eigenvectors to form a basis of the vector space on which the endomorphism acts (section 2.4). In general the eigenvectors are not orthogonal, so they do not form an orthogonal basis. However, for a symmetric matrix, eigenvectors corresponding to different eigenvalues are always orthogonal.

Theorem: if A is a real symmetric matrix, then eigenvectors of A corresponding to different eigenvalues are orthogonal to each other.

Proof: a symmetric matrix satisfies A^T = A. Let u be an eigenvector of A^T and v an eigenvector of A. We show that if the corresponding eigenvalues are different, then u and v must be orthogonal:

A^T u = λ_u u  ⟹  ⟨A^T u, v⟩ = ⟨λ_u u, v⟩ = λ_u ⟨u, v⟩

Av = λ_v v  ⟹  ⟨A^T u, v⟩ = (A^T u)^T v = u^T A v = λ_v u^T v = λ_v ⟨u, v⟩

Subtracting the two expressions:

(λ_u − λ_v)⟨u, v⟩ = 0
If λ_u ≠ λ_v, then ⟨u, v⟩ = 0. As the matrix is symmetric, A^T = A, so u and v are both eigenvectors of A, and the result holds for any pair of eigenvectors corresponding to different eigenvalues of A.

Properties used in the proof:

⟨u, v⟩ = u^T v = ( u_1 ... u_n ) ( v_1 )
                                 ( ... )
                                 ( v_n )

⟨u, Bv⟩ = u^T B v = (B^T u)^T v = ⟨B^T u, v⟩

3 Matrix diagonalization process example

In this section we will perform the whole process to diagonalize a matrix with an example.

Example: consider the following matrix:

A = ( 1  2 )
    ( 2  1 )

Calculate its eigenvalues and eigenvectors and build the matrix P that transforms it into a diagonal matrix through P^{-1}AP.

We write down the characteristic polynomial:

det(A − λI_2) = det ( 1−λ    2  ) = (1−λ)^2 − 4 = λ^2 − 2λ − 3
                    (  2    1−λ )

det(A − λI_2) = 0  ⟹  (λ−3)(λ+1) = 0  ⟹  λ = 3, −1

It has two solutions, i.e. two eigenvalues. Two different eigenvalues give two linearly independent eigenvectors, hence at this point we already know that the matrix diagonalizes. We now calculate the eigenvectors:

( 1  2 ) ( x ) = 3 ( x )  ⟹  x + 2y = 3x, 2x + y = 3y  ⟹  x = y  ⟹  ( x ) eigenvectors for λ = 3
( 2  1 ) ( y )     ( y )                                              ( x )
( 1  2 ) ( x ) = −1 ( x )  ⟹  x + 2y = −x, 2x + y = −y  ⟹  x = −y  ⟹  (  x ) eigenvectors for λ = −1
( 2  1 ) ( y )      ( y )                                                ( −x )

We can also calculate the eigenvectors through the eigenspaces, i.e. the sets of solutions of Au = λu:

E(3) = Ker(A − 3I_2) = { (a, a)^T, a ∈ R } ⊆ R^2

E(−1) = Ker(A + I_2) = { (b, −b)^T, b ∈ R } ⊆ R^2

We choose two particular eigenvectors, one from each space:

( 1 ) ∈ E(3),    (  1 ) ∈ E(−1)
( 1 )            ( −1 )

We build the matrix P from these vectors:

P = ( 1   1 )
    ( 1  −1 )

Now we need to calculate P^{-1} and check that P^{-1}AP is a diagonal matrix with the eigenvalues in the diagonal:

P^{-1} = (1/det(P)) Adj(P) = (1/det(P)) C^T = (1/2) ( 1   1 )
                                                   ( 1  −1 )

We confirm that it is the inverse:

P^{-1} P = (1/2) ( 1   1 ) ( 1   1 ) = ( 1  0 )
                 ( 1  −1 ) ( 1  −1 )   ( 0  1 )

Now we confirm that A is similar to a diagonal matrix through P, and that this diagonal matrix contains the eigenvalues in its diagonal:

P^{-1}AP = (1/2) ( 1   1 ) ( 1  2 ) ( 1   1 ) = ( 3   0 )
                 ( 1  −1 ) ( 2  1 ) ( 1  −1 )   ( 0  −1 )

It is important to note that this works with a matrix P built from any choice of eigenvectors. In addition, the order of the eigenvalues in the diagonal matrix matches the order of the eigenvectors chosen as columns of P.
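The worked example above can be checked numerically. The plain-Python sketch below verifies that P^{-1}AP is the expected diagonal matrix and, since A is symmetric, that the two chosen eigenvectors are orthogonal (section 2.5):

```python
# Sketch: checking the worked example numerically with plain Python.
# A = [[1, 2], [2, 1]] is symmetric; its eigenvectors (1, 1) and (1, -1)
# are orthogonal, and P^{-1} A P should be diag(3, -1).

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [2, 1]]
P = [[1, 1], [1, -1]]               # eigenvectors as columns
P_inv = [[0.5, 0.5], [0.5, -0.5]]   # inverse of P (det(P) = -2)

D = matmul(P_inv, matmul(A, P))
print(D)                            # -> [[3.0, 0.0], [0.0, -1.0]]

# Orthogonality of the eigenvectors (the columns of P):
dot = P[0][0] * P[0][1] + P[1][0] * P[1][1]
print(dot)                          # -> 0
```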
4 Exercises

Ex. 1 Consider the matrix A = ( 3  2 )
                              ( 0  1 )

1. Calculate its eigenvalues and eigenvectors
2. Calculate P such that P^{-1}AP is diagonal

Ex. 2 Consider the matrix A = ( 1  2  1 )
                              ( 2  0  2 )
                              ( 1  2  3 )

1. Calculate its eigenvalues and eigenvectors
2. Calculate P such that P^{-1}AP is diagonal

Ex. 3 Consider the following linear map between polynomials of degree 1:

f : P_1[x] → P_1[x],   a + bx ↦ (a + b) + (a + b)x

1. Calculate the associated matrix A
2. Calculate the eigenvalues and eigenvectors associated to this linear map
3. What is the matrix P such that P^{-1}AP is diagonal?

Ex. 4 Consider the matrix A = ( 2  1  1 )
                              ( 0  1  2 )
                              ( 0  0  1 )

Show that A is not diagonalizable. Hint: you can use the theorem that says that a square matrix of size n is diagonalizable if and only if it has n linearly independent eigenvectors (a sufficient condition for which is n different eigenvalues).

Ex. 5 Consider the matrix A = ( 5  1  3 )
                              ( 0  7  6 )
                              ( 0  1  8 )

Calculate an orthonormal (orthogonal and unit length) basis for each of its eigenspaces.
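When solving these exercises by hand, candidate answers can be checked with a small helper that tests whether Au = λu holds. This is an illustrative plain-Python sketch (it verifies an eigenpair but does not produce one, so it does not give away the solutions):

```python
# Sketch: verify that u is an eigenvector of A with eigenvalue lam,
# i.e. that A u = lam * u holds component-wise (within rounding tolerance).

def is_eigenpair(A, lam, u, tol=1e-9):
    n = len(u)
    Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
    return all(abs(Au[i] - lam * u[i]) < tol for i in range(n))

# Illustrative checks with the matrix of section 3 (not an exercise solution):
print(is_eigenpair([[1, 2], [2, 1]], 3, [1, 1]))   # -> True
print(is_eigenpair([[1, 2], [2, 1]], 2, [1, 1]))   # -> False
```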
5 R practical

5.1 Eigenvalues and eigenvectors

Having built a matrix, R has the function eigen() that calculates the eigenvalues and the eigenvectors of a matrix.

# Introduce the matrix
> m <- matrix(c(-3, -1, 4, 2), 2, 2)
> m
     [,1] [,2]
[1,]   -3    4
[2,]   -1    2

# Compute the eigenvalues and eigenvectors
> ev <- eigen(m)
> evalues <- ev$values
> evalues
[1] -2  1
# Eigenvalues are always returned
# in decreasing order of their absolute value

> evectors <- ev$vectors
> evectors
           [,1]       [,2]
[1,] -0.9701425 -0.7071068
[2,] -0.2425356 -0.7071068
# Returns the eigenvectors as columns.
# The eigenvectors have unit length.
# Their order corresponds to
# the order of the eigenvalues.

Notice that evectors is a valid matrix P, so by computing its inverse you can check the diagonalization:

> library(matlib)
> p <- evectors
> pi <- inv(p)
> pi %*% m %*% p
              [,1]          [,2]
[1,] -2.000000e+00 -2.220446e-16
[2,] -1.110223e-16  1.000000e+00
# The diagonal coincides with the eigenvalues
# The other elements are "0" up to
# floating-point rounding errors

Try to test the theorems and properties using R (you may need to use commands from previous units).