Lecture Notes: Orthogonal and Symmetric Matrices
Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong
taoyf@cse.cuhk.edu.hk

1 Orthogonal Matrix

Definition 1. Let $u = [u_i]$ and $v = [v_i]$ be two $n$-dimensional column vectors. Define the dot product between them, denoted as $u \cdot v$, as the real value $\sum_{i=1}^{n} u_i v_i$. Similarly, let $u = [u_j]$ and $v = [v_j]$ be two $n$-dimensional row vectors. Define the dot product between them, again denoted as $u \cdot v$, as the real value $\sum_{j=1}^{n} u_j v_j$.

Definition 2. Let $u$ be a vector. Define $|u| = \sqrt{u \cdot u}$, and call it the norm (or the length) of $u$.

Definition 3. Let $S$ be a set of non-zero (column or row) vectors $v_1, v_2, \ldots, v_k$ of the same dimensionality. We say that $S$ is orthogonal if $v_i \cdot v_j = 0$ for any $i \ne j$. Furthermore, we say that $S$ is orthonormal if (i) $S$ is orthogonal, and (ii) $|v_i| = 1$ for any $i \in [1, k]$.

For example, the set
$$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is orthogonal but not orthonormal. If, however, we scale each of the above vectors to have length $1$, then the resulting vector set becomes orthonormal:
$$\left\{ \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}, \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}, \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \right\}$$

Lemma 1. An orthogonal set of vectors must be linearly independent.

Proof. Suppose that $S = \{v_1, v_2, \ldots, v_k\}$. Assume, on the contrary, that $S$ is not linearly independent. Hence, there exist real values $c_1, c_2, \ldots, c_k$ that are not all zero and make the following hold:
$$c_1 v_1 + c_2 v_2 + \ldots + c_k v_k = 0.$$
Suppose, without loss of generality, that $c_i \ne 0$ for some $i \in [1, k]$. Then, we take the dot product of both sides of the above equation with $v_i$, and obtain:
$$c_1 (v_1 \cdot v_i) + c_2 (v_2 \cdot v_i) + \ldots + c_k (v_k \cdot v_i) = 0 \;\Rightarrow\; c_i (v_i \cdot v_i) = 0.$$
The above equation contradicts the fact that $c_i \ne 0$ and $v_i$ is a non-zero vector.
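As a quick sanity check, the definitions above can be verified numerically. The following Python sketch (the helper names `dot` and `norm` are illustrative, not part of the notes) tests the example set for orthogonality and then scales it into an orthonormal one:

```python
import math

def dot(u, v):
    """Dot product of two same-length vectors: sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Norm (length) of u, defined as sqrt(u . u)."""
    return math.sqrt(dot(u, u))

# The orthogonal (but not orthonormal) set from the example.
S = [(-1, 1, 0), (1, 1, -2), (1, 1, 1)]

# All pairwise dot products are 0, so S is orthogonal ...
print(dot(S[0], S[1]), dot(S[0], S[2]), dot(S[1], S[2]))  # 0 0 0

# ... but the norms are sqrt(2), sqrt(6), sqrt(3), so S is not orthonormal.
print([norm(v) for v in S])

# Scaling each vector by 1/norm yields an orthonormal set.
T = [tuple(x / norm(v) for x in v) for v in S]
print([round(norm(v), 10) for v in T])  # [1.0, 1.0, 1.0]
```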
Definition 4. An $n \times n$ matrix $A$ is orthogonal if (i) its inverse $A^{-1}$ exists, and (ii) $A^T = A^{-1}$.

Example 1. Consider
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
It is orthogonal because
$$A^{-1} = A^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
The following is a $3 \times 3$ orthogonal matrix:
$$\begin{bmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix}$$

An orthogonal matrix must be formed by an orthonormal set of vectors:

Lemma 2. Let $A$ be an $n \times n$ matrix with row vectors $r_1, r_2, \ldots, r_n$, and column vectors $c_1, c_2, \ldots, c_n$. Both of the following statements are true:

- $A$ is orthogonal if and only if $\{r_1, r_2, \ldots, r_n\}$ is orthonormal.
- $A$ is orthogonal if and only if $\{c_1, c_2, \ldots, c_n\}$ is orthonormal.

Proof. We will prove only the first statement, because applying the same argument to $A^T$ proves the second. Let $B = AA^T$. Denote by $b_{ij}$ the element of $B$ at the $i$-th row and $j$-th column. We know that $b_{ij} = r_i \cdot r_j$ (note that the $j$-th column of $A^T$ has the same components as $r_j$). $A$ is orthogonal if and only if $B$ is the identity matrix, which in turn is true if and only if $b_{ij} = 1$ when $i = j$, and $b_{ij} = 0$ otherwise. The lemma thus follows.

2 Symmetric Matrix

Recall that an $n \times n$ matrix $A$ is symmetric if $A = A^T$. In this section, we will learn several nice properties of such matrices.

Lemma 3. All the eigenvalues of a symmetric matrix must be real values (i.e., they cannot be complex numbers). All eigenvectors of the matrix must contain only real values.

We omit the proof of the lemma (which is not difficult, but requires the definition of matrices on complex numbers). Note that the above lemma is not true for general square matrices (i.e., it is possible for an eigenvalue to be a complex number).

Lemma 4. Let $\lambda_1$ and $\lambda_2$ be two different eigenvalues of a symmetric matrix $A$. Also, suppose that $x_1$ is an eigenvector of $A$ corresponding to $\lambda_1$, and $x_2$ is an eigenvector of $A$ corresponding to $\lambda_2$. It must hold that $x_1 \cdot x_2 = 0$.

Proof. By the definition of eigenvalues and eigenvectors, we know:
$$A x_1 = \lambda_1 x_1 \quad (1)$$
$$A x_2 = \lambda_2 x_2 \quad (2)$$
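Definition 4 and Lemma 2 are easy to check numerically. The sketch below uses the rotation matrix from Example 1 (the concrete angle is an arbitrary choice for illustration):

```python
import numpy as np

theta = 0.7  # any angle works; rotation matrices are orthogonal for all theta
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Definition 4: A is orthogonal when A^T equals A^{-1}, i.e. A^T A = I.
print(np.allclose(A.T @ A, np.eye(2)))  # True

# Lemma 2: equivalently, the rows form an orthonormal set.
# Entry (i, j) of A A^T is the dot product r_i . r_j.
rows_gram = A @ A.T
print(np.allclose(rows_gram, np.eye(2)))  # True
```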
From (1), we have:
$$\begin{aligned}
x_1^T A^T &= \lambda_1 x_1^T && \text{(transposing both sides of (1))} \\
x_1^T A &= \lambda_1 x_1^T && \text{(as } A^T = A\text{)} \\
x_1^T A x_2 &= \lambda_1 x_1^T x_2 \\
x_1^T \lambda_2 x_2 &= \lambda_1 x_1^T x_2 && \text{(by (2))} \\
(\lambda_1 - \lambda_2)\, x_1^T x_2 &= 0 \\
x_1^T x_2 &= 0 && \text{(by } \lambda_1 \ne \lambda_2\text{)}
\end{aligned}$$
The lemma then follows from the fact that $x_1 \cdot x_2 = x_1^T x_2$.

Example 2. Consider
$$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$
We know that $A$ has two eigenvalues $\lambda_1 = -1$ and $\lambda_2 = 2$. For eigenvalue $\lambda_1 = -1$, all the eigenvectors can be represented as $x = [x_1, x_2, x_3]^T$ satisfying:
$$x_1 = -u - v, \quad x_2 = u, \quad x_3 = v$$
with $u, v \in \mathbb{R}$. Setting $(u, v)$ to $(1, 0)$ and $(0, 1)$ respectively gives us two linearly independent eigenvectors:
$$x_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
For eigenvalue $\lambda_2 = 2$, all the eigenvectors can be represented as $x = [x_1, x_2, x_3]^T$ satisfying:
$$x_1 = t, \quad x_2 = t, \quad x_3 = t$$
with $t \in \mathbb{R}$. Setting $t = 1$ gives us another eigenvector:
$$x_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
Vectors $x_1$, $x_2$, and $x_3$ are linearly independent. According to Lemma 4, both $x_1 \cdot x_3$ and $x_2 \cdot x_3$ must be $0$, which is indeed the case.

From an earlier lecture, we already know that every symmetric matrix can be diagonalized because it definitely has $n$ linearly independent eigenvectors. The next lemma strengthens this fact:
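The eigen-analysis of Example 2 can be reproduced with numpy. The matrix used below is the symmetric matrix consistent with the eigenvalues $-1$ (eigenspace $x_1 + x_2 + x_3 = 0$) and $2$ (eigenspace $x_1 = x_2 = x_3$) stated in the example:

```python
import numpy as np

# The symmetric matrix of Example 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# np.linalg.eigh is the eigensolver for symmetric matrices; it returns
# real eigenvalues in ascending order (realness is guaranteed by Lemma 3).
vals, vecs = np.linalg.eigh(A)
print(np.round(vals, 10))  # [-1. -1.  2.]

# The eigenvectors chosen in Example 2.
x1 = np.array([-1.0, 1.0, 0.0])  # for eigenvalue -1
x2 = np.array([-1.0, 0.0, 1.0])  # for eigenvalue -1
x3 = np.array([1.0, 1.0, 1.0])   # for eigenvalue  2

# Lemma 4: eigenvectors of *different* eigenvalues are orthogonal.
print(x1 @ x3, x2 @ x3)  # 0.0 0.0
# Eigenvectors of the *same* eigenvalue need not be orthogonal.
print(x1 @ x2)  # 1.0
```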
Lemma 5. Every $n \times n$ symmetric matrix has an orthogonal set of $n$ eigenvectors.

We omit the proof of the lemma (which is rather non-trivial). Note that the $n$ eigenvectors in the lemma must be linearly independent, according to Lemma 1.

Example 3. Let us consider again the matrix $A$ in Example 2. We have obtained eigenvectors $x_1, x_2, x_3$. Clearly, they do not constitute an orthogonal set because $x_1, x_2$ are not orthogonal. We will replace $x_2$ with a different $x_2'$ that is still an eigenvector of $A$ for eigenvalue $\lambda_1 = -1$, and is orthogonal to $x_1$. From Example 2, we know that all eigenvectors corresponding to $\lambda_1$ have the form
$$\begin{bmatrix} -u - v \\ u \\ v \end{bmatrix}.$$
For such a vector to be orthogonal to $x_1 = [-1, 1, 0]^T$, we need:
$$(-1)(-u - v) + u = 0 \;\Rightarrow\; v = -2u.$$
As you can see, there are infinitely many such vectors, any of which can be $x_2'$, except the zero vector. To produce one, we can choose $u = 1$, $v = -2$, which gives
$$x_2' = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.$$
$\{x_1, x_2', x_3\}$ is thus an orthogonal set of eigenvectors of $A$.

Corollary 1. Every $n \times n$ symmetric matrix has an orthonormal set of $n$ eigenvectors.

Proof. The orthonormal set can be obtained by scaling all vectors in the orthogonal set of Lemma 5 to have length $1$.

Now we prove an important lemma about symmetric matrices.

Lemma 6. Let $A$ be an $n \times n$ symmetric matrix. There exists an orthogonal matrix $Q$ such that $A = Q \,\mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]\, Q^{-1}$, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues of $A$.

Proof. From an earlier lecture, we know that, given a set of linearly independent eigenvectors $v_1, v_2, \ldots, v_n$ corresponding to eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ respectively, we can produce $Q$ by placing $v_i$ as the $i$-th column of $Q$, for each $i \in [1, n]$, such that $A = Q \,\mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]\, Q^{-1}$. From Corollary 1, we know that we can find an orthonormal set of $v_1, v_2, \ldots, v_n$. By Lemma 2, it follows that $Q$ is an orthogonal matrix.

Example 4. Consider once again the matrix $A$ in Example 2. In Example 3, we have obtained an orthogonal set of eigenvectors:
$$\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
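The replacement step in Example 3 is one step of Gram-Schmidt orthogonalization inside the eigenspace of $-1$: subtracting from $x_2$ its projection onto $x_1$ leaves an eigenvector for the same eigenvalue that is orthogonal to $x_1$. The sketch below (using the matrix of Example 2) recovers a multiple of the same $x_2'$:

```python
import numpy as np

# The symmetric matrix of Example 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

x1 = np.array([-1.0, 1.0, 0.0])  # eigenvector for -1
x2 = np.array([-1.0, 0.0, 1.0])  # eigenvector for -1, not orthogonal to x1
x3 = np.array([1.0, 1.0, 1.0])   # eigenvector for 2

# One Gram-Schmidt step: remove from x2 its component along x1.
# The result stays in the eigenspace of -1 (a linear combination of
# eigenvectors for the same eigenvalue is still an eigenvector).
x2p = x2 - (x2 @ x1) / (x1 @ x1) * x1
print(x2p)                              # [-0.5 -0.5  1. ], a multiple of (1, 1, -2)
print(x2p @ x1)                         # 0.0
print(np.allclose(A @ x2p, -1 * x2p))   # True: still an eigenvector for -1
```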
By scaling, we obtain the following orthonormal set of eigenvectors:
$$\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}, \quad \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$$
Recall that these eigenvectors correspond to eigenvalues $-1$, $-1$, and $2$, respectively. We thus produce:
$$Q = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}$$
such that $A = Q \,\mathrm{diag}[-1, -1, 2]\, Q^{-1}$.
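Putting everything together, the following sketch checks that this $Q$ is orthogonal (so $Q^{-1} = Q^T$) and that it diagonalizes the matrix of Example 2 exactly as Lemma 6 promises:

```python
import numpy as np

# The symmetric matrix of Example 2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# Q's columns are the orthonormal eigenvectors from Example 4.
s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
Q = np.array([[-1/s2, 1/s6, 1/s3],
              [ 1/s2, 1/s6, 1/s3],
              [ 0.0, -2/s6, 1/s3]])

# Q is orthogonal (Lemma 2): Q^T Q = I, hence Q^{-1} = Q^T.
print(np.allclose(Q.T @ Q, np.eye(3)))  # True

# Lemma 6: A = Q diag(-1, -1, 2) Q^{-1} = Q diag(-1, -1, 2) Q^T.
L = np.diag([-1.0, -1.0, 2.0])
print(np.allclose(Q @ L @ Q.T, A))  # True
```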