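Linear algebra 2
Yoav Zemel
March 1, 2012

These notes were written by Yoav Zemel. The lecturer, Shmuel Berger, should not be held responsible for any mistake. Any comments are welcome at zamsh7@gmail.com.

3 Linear transformations in an inner product space

Let $V$ be an inner product space and $T : V \to V$ a linear transformation. We wish to define its conjugate transformation $T^* : V \to V$. The map $(\alpha, \beta) \mapsto \langle T(\alpha), \beta \rangle$ is the bilinear pattern defined by $T$.

Lemma 3.1 Let $\alpha, \beta, \gamma, \delta \in V$ and $a, b \in F$. Then
(1) $\langle T(\alpha + \gamma), \beta \rangle = \langle T(\alpha), \beta \rangle + \langle T(\gamma), \beta \rangle$.
(2) $\langle T(\alpha), \beta + \delta \rangle = \langle T(\alpha), \beta \rangle + \langle T(\alpha), \delta \rangle$.
(3) $\langle T(a\alpha), \beta \rangle = a \langle T(\alpha), \beta \rangle$.
(4) $\langle T(\alpha), b\beta \rangle = \bar{b} \langle T(\alpha), \beta \rangle$.

Proof. Trivial by linearity of $T$ and the definition of the inner product.

Lemma 3.2 Let $\beta, \gamma \in V$.
(1) If $\langle \alpha, \beta \rangle = \langle \alpha, \gamma \rangle$ for any $\alpha \in V$, or $\langle \beta, \alpha \rangle = \langle \gamma, \alpha \rangle$ for any $\alpha \in V$, then $\gamma = \beta$.
(2) Let $T$ and $S$ be linear transformations $V \to V$. If $\langle T(\alpha), \beta \rangle = \langle S(\alpha), \beta \rangle$ for any $\alpha, \beta \in V$, or $\langle \alpha, T(\beta) \rangle = \langle \alpha, S(\beta) \rangle$ for any $\alpha, \beta \in V$, then $T = S$.
(3) If $\langle T(\alpha), \beta \rangle = 0$ for any $\alpha, \beta \in V$, or $\langle \alpha, T(\beta) \rangle = 0$ for any $\alpha, \beta \in V$, then $T = 0$.

Proof. (1) If $\langle \alpha, \beta - \gamma \rangle = 0$ for any $\alpha$, then $\beta - \gamma$ is orthogonal to all vectors, and therefore equals 0. The other case is analogous.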
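(2) If $\langle T(\alpha) - S(\alpha), \beta \rangle = 0$ for any $\beta$, then $T(\alpha) - S(\alpha) = 0$. This holds for any $\alpha$, hence $T = S$. The other case is analogous.
(3) We have $\langle T(\alpha), \beta \rangle = 0 = \langle S(\alpha), \beta \rangle$, where $S$ is the zero transformation. By (2) $T = 0$, and the other case is analogous.

So $T$ determines a bilinear pattern $\langle T(\alpha), \beta \rangle$. By this lemma, this bilinear pattern determines $T$.

If $\langle T(\alpha), \alpha \rangle = 0$ for any $\alpha$, it does not necessarily follow that $T = 0$ when $F = \mathbb{R}$, e.g. when $T$ is rotation by 90 degrees. However, if $F = \mathbb{C}$ then it does follow.

Proof. For any $\alpha, \beta \in V$ we have
$$0 = \langle T(\alpha + \beta), \alpha + \beta \rangle = \langle T(\alpha), \alpha \rangle + \langle T(\alpha), \beta \rangle + \langle T(\beta), \alpha \rangle + \langle T(\beta), \beta \rangle = \langle T(\alpha), \beta \rangle + \langle T(\beta), \alpha \rangle.$$
Replacing $\beta$ by $i\beta$ and using $\bar{i} = -i$ we get
$$i \langle T(\beta), \alpha \rangle - i \langle T(\alpha), \beta \rangle = 0.$$
On the other hand, multiplying the first identity by $i$, we also have
$$i \langle T(\beta), \alpha \rangle + i \langle T(\alpha), \beta \rangle = 0.$$
Adding the two, it follows that $\langle T(\beta), \alpha \rangle = 0$ for any $\beta$ and any $\alpha$, and thus $T = 0$.

Let $\beta \in V$ and consider the linear functional $\phi_\beta : V \to F$ defined by $\phi_\beta(\alpha) = \langle \alpha, \beta \rangle$. Note that the map $\alpha \mapsto \langle \beta, \alpha \rangle$ is linear only over $\mathbb{R}$, not over $\mathbb{C}$.

Theorem 3.3 For any linear functional $\phi : V \to F$ there exists a unique $\beta$ for which $\phi = \phi_\beta$.

Note: this is true only when $\dim V < \infty$. For instance, if $V$ is the space of continuous functions on $[-1, 1]$ (with the usual integral inner product), the functional $\phi(f) = f(0)$ is not of the form $\phi_g$ for any $g \in V$.

Proof. Suppose that $\phi_\beta = \phi_\gamma$. Then $\langle \alpha, \beta \rangle = \langle \alpha, \gamma \rangle$ for any $\alpha$, and by Lemma 3.2 $\gamma = \beta$. This establishes uniqueness.

For the existence, we consider all the linear functionals $\phi_\beta$ for $\beta \in V$. This is a subspace of the dual space $V^*$. As $\dim V^* = \dim V$, we need to show that $n = \dim V = \dim \{\phi_\beta \mid \beta \in V\}$, and this will establish $V^* = \{\phi_\beta \mid \beta \in V\}$. Let $S : V \to V^*$ be defined by $S(\beta) = \phi_\beta$. $S$ is linear over $\mathbb{R}$, and almost linear over $\mathbb{C}$:
$$S(\beta_1 + \beta_2)(\alpha) = \phi_{\beta_1 + \beta_2}(\alpha) = \langle \alpha, \beta_1 + \beta_2 \rangle = \langle \alpha, \beta_1 \rangle + \langle \alpha, \beta_2 \rangle = \phi_{\beta_1}(\alpha) + \phi_{\beta_2}(\alpha).$$
Thus $S(\beta_1 + \beta_2) = S(\beta_1) + S(\beta_2)$. But in general
$$S(c\beta)(\alpha) = \langle \alpha, c\beta \rangle = \bar{c} \langle \alpha, \beta \rangle = \bar{c}\, S(\beta)(\alpha) \neq c\, S(\beta)(\alpha),$$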
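so $S$ is not necessarily linear. $S$ is injective because if $\phi_\beta = \phi_\gamma$ then $\beta = \gamma$. We know that if $\dim V = n$ and $T : V \to W$ is linear and injective then its image $T(V)$ is a subspace of $W$ with dimension $n$. The same proof will work here for $S$, even though it is not linear but almost linear. It follows that $\mathrm{Im}(S) = V^*$ and the existence proof is complete.

Theorem 3.4 (conjugate transformation) Let $V$ be an inner product space of dimension $n < \infty$. For any linear transformation $T : V \to V$ there exists a unique linear transformation $T^* : V \to V$ such that
$$\forall \alpha, \beta \quad \langle T(\alpha), \beta \rangle = \langle \alpha, T^*(\beta) \rangle.$$
$T^*$ is the conjugate transformation of $T$.

Proof. (uniqueness) If $\langle \alpha, T^*(\beta) \rangle = \langle \alpha, S^*(\beta) \rangle$ for all $\alpha, \beta$, then $T^* = S^*$ by Lemma 3.2.

(existence) Let $\beta \in V$ and consider the linear functional $\alpha \mapsto \langle T\alpha, \beta \rangle$. By Theorem 3.3 there exists a unique $\gamma \in V$ such that $\langle T\alpha, \beta \rangle = \langle \alpha, \gamma \rangle$ for any $\alpha$. Define $T^*(\beta) = \gamma$, which is well defined by the uniqueness of $\gamma$. Then for all $\alpha$ we have $\langle T\alpha, \beta \rangle = \langle \alpha, T^*(\beta) \rangle$. Doing the same for every $\beta$ defines $T^*$ on the whole of $V$. All that remains is to show that $T^*$ is linear. Let $\alpha, \beta, \gamma \in V$. Then
$$\langle \alpha, T^*(\beta + \gamma) \rangle = \langle T\alpha, \beta + \gamma \rangle = \langle T\alpha, \beta \rangle + \langle T\alpha, \gamma \rangle = \langle \alpha, T^*(\beta) \rangle + \langle \alpha, T^*(\gamma) \rangle = \langle \alpha, T^*(\beta) + T^*(\gamma) \rangle.$$
This holds for any $\alpha$, and by Lemma 3.2 (1), $T^*(\beta + \gamma) = T^*(\beta) + T^*(\gamma)$. For a scalar $s \in F$,
$$\langle \alpha, T^*(s\beta) \rangle = \langle T\alpha, s\beta \rangle = \bar{s} \langle T(\alpha), \beta \rangle = \bar{s} \langle \alpha, T^*(\beta) \rangle = \langle \alpha, s T^*(\beta) \rangle.$$
This, again, holds for any $\alpha$ and therefore $T^*(s\beta) = s T^*(\beta)$. So $T^*$ is linear, and the proof is complete.

Theorem 3.5 Let $T : V \to V$ and let $T^*$ be its conjugate. Let $\{e_1, \dots, e_n\}$ be an orthonormal basis of $V$ under which $A$ is the matrix that represents $T$. Then the matrix that represents $T^*$ is the matrix $A^*$ defined by $A^*_{ij} = \overline{A_{ji}}$ for all $i, j$. $A^*$ is the conjugate matrix of $A$.

Proof. The $j$-th column of $A$ is the coordinate vector of $T(e_j)$ in the given basis. Its $i$-th element is $A_{ij}$. Since the basis is orthonormal, this $i$-th element also equals $\langle T(e_j), e_i \rangle$. Applying the same to $T^*$,
$$A^*_{ij} = \langle T^*(e_j), e_i \rangle = \overline{\langle e_i, T^*(e_j) \rangle} = \overline{\langle T(e_i), e_j \rangle} = \overline{A_{ji}}.$$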
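Theorem 3.5 can be checked numerically on a small case. The following is a minimal sketch, assuming the standard inner product on $\mathbb{C}^2$ (linear in the first argument, as in Lemma 3.1); the helper names `inner`, `apply` and `conjugate_matrix` and the sample vectors are illustrative choices, not part of the notes.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate_matrix(A):
    """Return A* with (A*)_ij = conj(A_ji)."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

# Sample matrix and vectors (assumed data, chosen only for illustration).
A = [[1 + 0j, 1j], [2 - 3j, 2 + 0j]]
A_star = conjugate_matrix(A)
alpha = [1 + 2j, -1j]
beta = [3 + 0j, 1 - 1j]

lhs = inner(apply(A, alpha), beta)       # <T(alpha), beta>
rhs = inner(alpha, apply(A_star, beta))  # <alpha, T*(beta)>
print(A_star, abs(lhs - rhs) < 1e-12)
```

The two inner products agree for every choice of `alpha` and `beta`, which is exactly the defining property of $T^*$ in Theorem 3.4.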
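Consider $\mathbb{C}^2$ with the standard inner product, and $T(z_1, z_2) = (z_1 + i z_2,\ 2z_2 + (2 - 3i) z_1)$. Then $T(1, 0) = (1, 2 - 3i) = 1 \cdot (1, 0) + (2 - 3i)(0, 1)$ and $T(0, 1) = (i, 2) = i (1, 0) + 2 (0, 1)$, so the matrix of $T$ is
$$A = \begin{pmatrix} 1 & i \\ 2 - 3i & 2 \end{pmatrix} \quad \text{and} \quad A^* = \begin{pmatrix} 1 & 2 + 3i \\ -i & 2 \end{pmatrix}.$$
This gives $T^*(z_1, z_2) = (z_1 + (2 + 3i) z_2,\ 2z_2 - i z_1)$.

3.1 Properties of the conjugate transformation

Consider the map ${}^* : \hom(V, V) \to \hom(V, V)$.

Lemma 3.6
(1) $(T + S)^* = T^* + S^*$.
(2) $(aT)^* = \bar{a} T^*$.
(3) $(TS)^* = S^* T^*$.
(4) $(T^*)^* = T$.
(5) The map ${}^*$ is bijective.

Proof. Rather immediate. For (1), observe that
$$\langle \alpha, (T + S)^*(\beta) \rangle = \langle (T + S)\alpha, \beta \rangle = \langle T\alpha + S\alpha, \beta \rangle = \langle T\alpha, \beta \rangle + \langle S\alpha, \beta \rangle = \langle \alpha, T^*\beta \rangle + \langle \alpha, S^*\beta \rangle = \langle \alpha, T^*\beta + S^*\beta \rangle.$$
As this holds for any $\alpha$, $(T + S)^*(\beta) = T^*(\beta) + S^*(\beta)$. As this holds for any $\beta$, $(T + S)^* = T^* + S^*$.

(2) Similarly,
$$\langle \alpha, (aT)^*(\beta) \rangle = \langle (aT)(\alpha), \beta \rangle = a \langle T\alpha, \beta \rangle = a \langle \alpha, T^*\beta \rangle = \langle \alpha, \bar{a} T^*\beta \rangle.$$
As this holds for any $\alpha$, $(aT)^*(\beta) = (\bar{a} T^*)(\beta)$. As this holds for any $\beta$, $(aT)^* = \bar{a} T^*$.

(3) Here we have
$$\langle \alpha, (TS)^*(\beta) \rangle = \langle (TS)\alpha, \beta \rangle = \langle T(S(\alpha)), \beta \rangle = \langle S(\alpha), T^*(\beta) \rangle = \langle \alpha, S^*(T^*(\beta)) \rangle.$$
This holds for any $\alpha$, so $(TS)^*(\beta) = S^*(T^*(\beta))$. This holds for any $\beta$, so that $(TS)^* = S^* T^*$.

(4) Observe that
$$\langle \alpha, (T^*)^*(\beta) \rangle = \langle T^*(\alpha), \beta \rangle = \overline{\langle \beta, T^*(\alpha) \rangle} = \overline{\langle T\beta, \alpha \rangle} = \langle \alpha, T\beta \rangle.$$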
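This holds for any $\alpha$, so $(T^*)^*(\beta) = T(\beta)$. This holds for any $\beta$, so $(T^*)^* = T$.

(5) If $T^* = S^*$ then $(T^*)^* = (S^*)^*$ and by (4) $T = S$. Thus ${}^*$ is injective. For a given $T$, $T^*$ exists, and $(T^*)^* = T$, so ${}^*$ is also surjective.

Since the ${}^*$ operation preserves the relation between matrices and transformations, these properties hold for matrices as well. It is left as an exercise to prove them directly for matrices.

Let $I$ be the identity. Then $I^*$ is also the identity, since $\langle I\alpha, \beta \rangle = \langle \alpha, I\beta \rangle = \langle \alpha, I^*\beta \rangle$.

Definition 3.7 $T$ is conjugate to itself if $T^* = T$. When $F = \mathbb{R}$, it is said to be symmetric. When $F = \mathbb{C}$, it is said to be Hermitian.

An $n \times n$ matrix is conjugate to itself if $A^* = A$. If $F = \mathbb{R}$ this happens precisely when $A$ is symmetric. If $F = \mathbb{C}$ then $A$ is said to be Hermitian. If $A$ represents $T$ under an orthonormal basis then $A$ is Hermitian if and only if $T$ is Hermitian.

Lemma 3.8 Let $T$ be conjugate to itself and suppose that $\langle T\alpha, \alpha \rangle = 0$ for any $\alpha$. Then $T = 0$.

Proof. If $F = \mathbb{C}$ then we do not even need $T$ to be conjugate to itself, so it is sufficient to prove the lemma for $F = \mathbb{R}$. One has
$$0 = \langle T(\alpha + \beta), \alpha + \beta \rangle = \langle T\alpha, \alpha \rangle + \langle T\beta, \beta \rangle + \langle T\alpha, \beta \rangle + \langle T\beta, \alpha \rangle = \langle T\beta, \alpha \rangle + \langle T\alpha, \beta \rangle$$
$$= \langle T\alpha, \beta \rangle + \langle \beta, T^*(\alpha) \rangle = \langle T\alpha, \beta \rangle + \langle \beta, T(\alpha) \rangle = 2 \langle T\alpha, \beta \rangle,$$
where the last equality is because $F = \mathbb{R}$ and the one before that is because $T$ is conjugate to itself. Since $\langle T\alpha, \beta \rangle = 0$ for any $\alpha, \beta$, it follows that $T = 0$.

Theorem 3.9 Let $V$ be a unitary space. Then $T$ is conjugate to itself if and only if $\langle T\alpha, \alpha \rangle \in \mathbb{R}$ for all $\alpha \in V$.

This characterization is valid only when $F = \mathbb{C}$: over $\mathbb{R}$ the condition holds for every $T$.

Proof. Suppose that $T$ is conjugate to itself. Then for any $\alpha$ we have
$$\langle T\alpha, \alpha \rangle = \langle \alpha, T^*\alpha \rangle = \langle \alpha, T\alpha \rangle = \overline{\langle T\alpha, \alpha \rangle},$$
from which it follows that it is indeed a real number. For the converse, reversing the arguments shows that $\langle \alpha, (T - T^*)(\alpha) \rangle = 0$ for all $\alpha \in V$, and since $F = \mathbb{C}$ it follows that $T = T^*$.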
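The role of the field in Lemma 3.8 and Theorem 3.9 can be seen on two small matrices. This is a hedged sketch with assumed sample data: `H` is a Hermitian matrix picked for illustration, and `R` is the rotation by 90 degrees mentioned earlier in the notes; the helper names are hypothetical.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# A Hermitian sample matrix: H equals its own conjugate transpose.
H = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]
alpha = [1 + 2j, -1j]
hermitian_value = inner(apply(H, alpha), alpha)  # real, by Theorem 3.9

# Rotation by 90 degrees. On real vectors <Ra, a> = 0 although R != 0,
# so Lemma 3.8 fails over R without the self-conjugacy assumption.
R = [[0, -1], [1, 0]]
real_value = inner(apply(R, [3, 4]), [3, 4])

# Over C the same matrix has vectors with <Ra, a> != 0.
complex_value = inner(apply(R, [1, 1j]), [1, 1j])
print(hermitian_value, real_value, complex_value)
```

The last value is not real, in line with Theorem 3.9: the rotation matrix is not Hermitian.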
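Remark 1 Obviously, $T$ is conjugate to itself if and only if $T^*$ is conjugate to itself.

Definition 3.10 $T$ is unitary if it preserves the inner product, i.e. for any $\alpha, \beta \in V$ we have $\langle T\alpha, T\beta \rangle = \langle \alpha, \beta \rangle$. In a Euclidean space a unitary transformation is called orthogonal.

Theorem 3.11 The following are equivalent:
(1) $T$ is unitary.
(2) $T$ preserves the norm, i.e. $\|\alpha\| = \|T(\alpha)\|$ for any $\alpha$.
(3) $T^* T = I$.
(4) $T$ maps any orthonormal basis to an orthonormal basis.
(5) $T$ maps some orthonormal basis to an orthonormal basis.

Remark 2 (2) is equivalent to $T$ preserving distances, i.e. $\|T\alpha - T\beta\| = \|T(\alpha - \beta)\| = \|\alpha - \beta\|$. (3) means that $T$ and $T^*$ are invertible, and the inverse of one is the other. Clearly this means $T T^* = I$ as well. Therefore, if $T$ is unitary then $T^*$ is as well.

Proof. (1) implies (2): if $T$ is unitary, then $\|T\alpha\|^2 = \langle T\alpha, T\alpha \rangle = \langle \alpha, \alpha \rangle = \|\alpha\|^2$, so $T$ preserves the norm.

(2) implies (3): if $T$ preserves the norm, then $\langle \alpha, T^*T\alpha \rangle = \langle T\alpha, T\alpha \rangle = \langle \alpha, \alpha \rangle$. Thus
$$0 = \langle \alpha, T^*T\alpha - \alpha \rangle = \langle \alpha, (T^*T - I)\alpha \rangle = \langle (T^*T - I)\alpha, \alpha \rangle.$$
Since $T^*T - I$ is conjugate to itself, $(T^*T - I)^* = T^*T - I$, Lemma 3.8 gives $T^*T - I = 0$.

(3) implies (4): if $T^*T = I$ and $\{e_1, \dots, e_n\}$ is an orthonormal basis for $V$, then $\langle e_i, e_j \rangle = \delta_{ij}$. As $T$ is invertible, $\{Te_1, \dots, Te_n\}$ has to be a basis. It remains to show that it is orthonormal:
$$\langle Te_i, Te_j \rangle = \langle e_i, T^*Te_j \rangle = \langle e_i, e_j \rangle = \delta_{ij},$$
so $\{Te_1, \dots, Te_n\}$ is an orthonormal basis.

(4) implies (5), since orthonormal bases exist.

(5) implies (1): suppose that $\{e_1, \dots, e_n\}$ and $\{Te_1, \dots, Te_n\}$ are orthonormal bases. Then for $\alpha = \sum_i a_i e_i$ and $\beta = \sum_j b_j e_j$ we have
$$\langle T\alpha, T\beta \rangle = \Big\langle \sum_i a_i Te_i, \sum_j b_j Te_j \Big\rangle = \sum_{i,j} a_i \bar{b}_j \langle Te_i, Te_j \rangle = \sum_{i,j} a_i \bar{b}_j \langle e_i, e_j \rangle = \sum_i a_i \bar{b}_i = \langle \alpha, \beta \rangle.$$

Definition 3.12 A matrix $A$ is said to be unitary (orthogonal, if $F = \mathbb{R}$) if $A^* A = I$ ($A^t A = I$).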
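Definition 3.12 and the norm-preservation property of Theorem 3.11 (2) can be tested directly on matrices. The sketch below is illustrative only: the angle `t`, the sample matrices and all helper names are assumptions, and equality is checked up to a floating-point tolerance.

```python
import math

def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate_matrix(A):
    """Return A* with (A*)_ij = conj(A_ji)."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_unitary(A, tol=1e-12):
    """Check A*A = I entrywise, up to a floating-point tolerance."""
    P = matmul(conjugate_matrix(A), A)
    n = len(A)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def norm(v):
    return math.sqrt(inner(v, v).real)

t = 0.7  # arbitrary sample angle
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
shear = [[1, 1], [0, 1]]  # invertible, but not unitary

alpha = [3.0, 4.0]
rotated = apply(rotation, alpha)
print(is_unitary(rotation), is_unitary(shear), norm(alpha), norm(rotated))
```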
Corollary 3.13 If $A$ represents $T$ with respect to an orthonormal basis, then $T$ is unitary if and only if $A$ is unitary.

Proof. If $T$ is unitary then $T^*T = I$, and as the basis is orthonormal, $A^*$ represents $T^*$, and thus $A^*A$ represents $I$, which gives $A^*A = I$. If $T$ is not unitary then $T^*T \neq I$, so $A^*A \neq I$ and $A$ is not unitary.

Example: $\begin{pmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{pmatrix}$ is an orthogonal matrix over $\mathbb{R}$, as well as $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.

A product of unitary transformations is also unitary: $(TS)^* TS = S^* T^* T S = S^* I S = S^* S = I$.

Proposition 3.14 $A$ is unitary if and only if there exists an inner product space $V$ over $F$ such that $A$ is the transformation matrix of a transformation mapping an orthonormal basis of $V$ to an orthonormal basis of $V$.

Proof. Let $A$ be unitary, set $V = F^n$ with the standard inner product, and define $T : V \to V$ by $T\alpha = A\alpha$. Then $T$ is a unitary linear transformation. Let $k = \{e_1, \dots, e_n\}$ be the standard basis of $V$. Then $k_2 = \{Te_1, \dots, Te_n\}$ is an orthonormal basis as well, and $A$ is the transformation matrix from $k$ to $k_2$.

For the converse, we use the isomorphism that maps a vector to its coordinate vector under the orthonormal basis $k = \{e_1, \dots, e_n\}$. Suppose that $k_2 = \{d_1, \dots, d_n\}$ and $k$ are both orthonormal, and $A$ is the transformation matrix from $k$ to $k_2$. Then $T$ as defined above maps $k$ to $k_2$; it is therefore unitary. $A$ represents it under an orthonormal basis, so $A$ is also unitary.

Theorem 3.15 Let $A \in M_n(F)$ and consider its rows as vectors $\alpha_1, \dots, \alpha_n \in F^n$. Then $A$ is unitary if and only if these rows form an orthonormal basis.

Proof. Write $AA^* = (b_{ij})$. Then $A$ is unitary if and only if $b_{ij} = \delta_{ij}$. On the other hand,
$$\langle \alpha_i, \alpha_j \rangle = \sum_k a_{ik} \overline{a_{jk}} = \sum_k a_{ik} a^*_{kj} = b_{ij}.$$
So $\langle \alpha_i, \alpha_j \rangle = \delta_{ij}$ if and only if $A$ is unitary.

It follows immediately that $A$ is unitary if and only if its columns form an orthonormal basis.

We wish to discover when $V$ has an orthonormal basis of eigenvectors of $T$.

Claim 1 If such a basis exists, then $T T^* = T^* T$.

Proof.
If $\{e_1, \dots, e_n\}$ is an orthonormal basis of eigenvectors of $T$ then the matrix $A$ which represents $T$ under this basis is diagonal. $A^*$ is diagonal as well, and represents $T^*$. $AA^*$ represents $TT^*$, and $A^*A$ represents $T^*T$, but $AA^* = A^*A$ because both factors are diagonal. Thus $TT^* = T^*T$.
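The condition in Claim 1 is easy to test for a given matrix. The following is a minimal sketch with assumed sample matrices and hypothetical helper names: a diagonal matrix satisfies $AA^* = A^*A$, while the shear matrix does not (it has the single eigenvalue 1 with a one-dimensional eigenspace, so it has no basis of eigenvectors at all).

```python
def conjugate_matrix(A):
    """Return A* with (A*)_ij = conj(A_ji)."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def commutes_with_conjugate(A, tol=1e-12):
    """Check A A* = A* A entrywise, up to a tolerance."""
    A_star = conjugate_matrix(A)
    left, right = matmul(A, A_star), matmul(A_star, A)
    return all(abs(x - y) < tol
               for row_l, row_r in zip(left, right)
               for x, y in zip(row_l, row_r))

diagonal = [[2j, 0], [0, 3]]  # diagonal in the standard (orthonormal) basis
shear = [[1, 1], [0, 1]]      # not diagonalizable

print(commutes_with_conjugate(diagonal), commutes_with_conjugate(shear))
```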
Definition 3.16 A linear transformation $T$ is said to be normal if $TT^* = T^*T$.

Examples: If $T$ is conjugate to itself then it is normal. If it is unitary then it is normal.

Lemma 3.17 Let $V$ be an inner product space over $F$ and $T : V \to V$ normal. If $c$ is an eigenvalue of $T$ with eigenvector $u$, then $\bar{c}$ is an eigenvalue of $T^*$ with the same eigenvector $u$.

Proof. As $T$ is normal, for any $w \in V$ we have
$$\|Tw\|^2 = \langle Tw, Tw \rangle = \langle w, T^*Tw \rangle = \langle w, TT^*w \rangle = \langle T^*w, T^*w \rangle = \|T^*w\|^2.$$
We know that $Tu = cu$ and wish to show that $T^*u = \bar{c}u$. Consider $T - cI$. As $T$ is normal, $T - cI$ is also normal:
$$(T - cI)(T - cI)^* = (T - cI)(T^* - \bar{c}I) = TT^* - \bar{c}T - cT^* + c\bar{c}I,$$
and
$$(T - cI)^*(T - cI) = T^*T - \bar{c}T - cT^* + c\bar{c}I,$$
which are equal as $T$ and $T^*$ commute. Thus $T - cI$ is normal. It follows that
$$0 = \|Tu - cu\| = \|(T - cI)u\| = \|(T - cI)^*u\| = \|(T^* - \bar{c}I)u\| = \|T^*u - \bar{c}u\|.$$
This gives $T^*u = \bar{c}u$.

Corollary 3.18 If $T$ is normal, then any real eigenvalue of $T$ is an eigenvalue of $T^*$.

Lemma 3.19 Let $T$ be normal. Then any two eigenvectors lying in different eigenspaces are orthogonal.

Proof. Suppose that $u$ is an eigenvector for the eigenvalue $a$ and $w$ for $b \neq a$. Then
$$a \langle u, w \rangle = \langle au, w \rangle = \langle Tu, w \rangle = \langle u, T^*w \rangle = \langle u, \bar{b}w \rangle = b \langle u, w \rangle,$$
by virtue of the previous lemma. As $a \neq b$ this means $\langle u, w \rangle = 0$.

Lemma 3.20 Let $V$ be an inner product space over $\mathbb{C}$, with $0 < \dim V = n < \infty$. Then any linear transformation $T : V \to V$ has an eigenvector $u \neq 0$.

Proof. The characteristic polynomial has degree $n > 0$, so over $\mathbb{C}$ it splits into linear factors. Each of its roots is an eigenvalue, so there are nonzero eigenvectors.

Theorem 3.21 Let $V$ be a unitary space over $\mathbb{C}$ and $T : V \to V$ be normal. Then $V$ admits an orthonormal basis of eigenvectors of $T$.
Proof. Let $U$ be the span of all the eigenvectors. This is a subspace of $V$. Let $a_1, \dots, a_k$ be the eigenvalues of $T$ and $V_i \subseteq U$ the eigenspaces. For each $V_i$ we choose an orthonormal basis and define $E$ to be the union of these bases. Since every eigenvector lies in the span of $E$, we have $U = \sum_i V_i = \mathrm{span}(E)$. Write $E = \{u_1, \dots, u_l\}$, where $l \geq k$. Then $E$ spans $U$. We have to show that $E$ is independent and that $l = n$.

If $i = j$, then $\langle u_i, u_j \rangle = 1$ because $u_i$ is an element of an orthonormal basis of some eigenspace. If $i \neq j$ and $u_i, u_j$ are in the same eigenspace $V_r$, then $\langle u_i, u_j \rangle = 0$ because they are part of an orthonormal basis of $V_r$. If $i \neq j$ and they lie in different eigenspaces, then $\langle u_i, u_j \rangle = 0$ by Lemma 3.19. This proves that $E$ is an orthonormal system, thus independent.

To see that $l = n$, write $V = U \oplus U^{\perp}$, where $U^{\perp}$ is the orthogonal complement of $U$. We wish to show that $U^{\perp} = \{0\}$, and we begin by showing that it is $T$-invariant, i.e. $T(U^{\perp}) \subseteq U^{\perp}$. Let $\beta \in U^{\perp}$. Then $\langle u_j, \beta \rangle = 0$ for any $j$. But
$$\langle u_j, T(\beta) \rangle = \langle T^*(u_j), \beta \rangle = \langle \bar{x} u_j, \beta \rangle = \bar{x} \langle u_j, \beta \rangle = 0,$$
where $x$ is the eigenvalue to whose eigenspace $u_j$ belongs (we used Lemma 3.17). Thus $T(\beta) \in U^{\perp}$.

Define $S : U^{\perp} \to U^{\perp}$ by $S(\alpha) = T(\alpha)$. If there is $\alpha \neq 0$ in $U^{\perp}$, then by Lemma 3.20 $S$ has an eigenvector. It is also an eigenvector of $T$, so it must lie in $U$, and so the intersection of $U$ and $U^{\perp}$ is not $\{0\}$, a contradiction. This completes the proof.

Corollary 3.22 Let $T : V \to V$ be as before. Then $V$ has an orthonormal basis under which $T$ is diagonal.

Corollary 3.23 For a normal $A \in M_n(\mathbb{C})$, there is a unitary $U \in M_n(\mathbb{C})$ with $U^*AU = U^{-1}AU$ diagonal.

In order to find $U$, one can proceed as follows:
(1) calculate the characteristic polynomial $\det(xI - A)$.
(2) find the eigenvalues of $A$, i.e. the roots of the polynomial.
(3) for each eigenvalue find a basis for the eigenspace.
(4) for each eigenspace find an orthonormal basis (use the Gram-Schmidt process).
(5) the union of those bases is an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$.
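The steps above can be sketched for a $2 \times 2$ normal matrix. This is only an illustration under stated assumptions: the sample matrix is the rotation by 90 degrees, its two eigenvalues are distinct (so each eigenspace is one-dimensional and step (4) reduces to normalizing one vector), and the entry `A[0][1]` is nonzero (used to write down an eigenvector). All function names are hypothetical.

```python
import cmath
import math

def conjugate_matrix(A):
    """Return A* with (A*)_ij = conj(A_ji)."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def eigenvalues_2x2(A):
    """Steps (1)-(2): roots of det(xI - A) = x^2 - tr(A) x + det(A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def unit_eigenvector(A, lam):
    """Steps (3)-(4): a unit vector spanning the eigenspace of lam.

    Assumes A[0][1] != 0, so that (A[0][1], lam - A[0][0]) is nonzero
    and solves (A - lam I) v = 0.
    """
    v = [A[0][1], lam - A[0][0]]
    length = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / length for x in v]

A = [[0, -1], [1, 0]]  # normal sample matrix (rotation by 90 degrees)
l1, l2 = eigenvalues_2x2(A)
v1, v2 = unit_eigenvector(A, l1), unit_eigenvector(A, l2)

# Step (5): the columns of U are the orthonormal eigenvectors.
U = [[v1[0], v2[0]], [v1[1], v2[1]]]
D = matmul(matmul(conjugate_matrix(U), A), U)  # U* A U
print(l1, l2, D)
```

Up to rounding, `D` is diagonal with the eigenvalues `l1` and `l2` on its diagonal. For repeated eigenvalues one would need a genuine Gram-Schmidt step inside each eigenspace, which this sketch does not implement.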
If the columns of $U$ are $(\alpha_1, \dots, \alpha_n)$, then the similar matrix $U^*AU$ is
$$\begin{pmatrix} l_1 & & & \\ & l_2 & & \\ & & \ddots & \\ & & & l_n \end{pmatrix},$$
where $l_i$ is the eigenvalue of $\alpha_i$ ($l_1, \dots, l_n$ do not have to be distinct).

Theorem 3.24 Let $V$ be an inner product space over $\mathbb{C}$ or $\mathbb{R}$, and $T : V \to V$ normal. Then
(1) $T$ is conjugate to itself if and only if all the eigenvalues of $T$ (in $\mathbb{C}$) are real numbers.
(2) $T$ is unitary if and only if all the eigenvalues of $T$ have an absolute value of 1.

Example: $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is unitary (its rows are orthonormal), and has no real eigenvalues. But over $\mathbb{C}$ it has the eigenvalues $i$ and $-i$, both of which have a modulus of 1.

Proof. Let $\{e_1, \dots, e_n\}$ be an orthonormal basis of $V$ under which the matrix $D$ of $T$ is diagonal, with diagonal entries $l_1, \dots, l_n$. $D$ is clearly normal.
(1) $T$ is conjugate to itself if and only if $D = D^*$, which is equivalent to $D$ having only real diagonal entries, and these are the eigenvalues of $T$.
(2) $T$ is unitary if and only if $D$ is unitary, if and only if $DD^* = I$, if and only if $l_i \bar{l_i} = 1$ for all $i$, if and only if $|l_i| = 1$ for all $i$.

Corollary 3.25 Let $A$ be a symmetric matrix over $\mathbb{R}$. Then $A$ is diagonalizable over $\mathbb{R}$.

Proof. $A$ is symmetric, thus conjugate to itself, thus normal. Therefore all the eigenvalues are real, and the characteristic polynomial splits into linear factors over $\mathbb{R}$. As $A$ is normal it can be diagonalized by a unitary matrix: for each eigenvalue we look at the eigenspace and find an orthonormal basis. Since $A$ and its eigenvalues are real, we can choose the elements of this basis to be real as well. The union of these bases is an orthonormal basis of $\mathbb{R}^n$. The matrix $U$ whose columns are the vectors of the basis is orthogonal and $U^{-1}AU$ is diagonal.

Corollary 3.26 Let $V$ be a unitary space with $\dim V = n$ and let $T : V \to V$. Then $T$ is normal if and only if there exist a unitary transformation $U$ and a Hermitian transformation $H$ such that $HU = UH = T$.

Proof. If such $H$ and $U$ exist then $HU$ is normal, because ($H = H^*$ commutes with $U$)
$$HU(HU)^* = HUU^*H^* = HH^* = H^2 = U^*UHH = U^*HHU = U^*H^*HU = (HU)^*HU.$$
If $T$ is normal, let $\{e_1, \dots, e_n\}$ be an orthonormal basis of $V$ under which $D$, the matrix that represents $T$, is diagonal.
Write
$$D = \begin{pmatrix} l_1 & & & \\ & l_2 & & \\ & & \ddots & \\ & & & l_n \end{pmatrix}, \qquad l_j = r_j(\cos t_j + i \sin t_j), \quad r_j \in \mathbb{R}_+.$$
Set
$$H' = \begin{pmatrix} r_1 & & & \\ & r_2 & & \\ & & \ddots & \\ & & & r_n \end{pmatrix}, \qquad U' = \begin{pmatrix} \cos t_1 + i \sin t_1 & & & \\ & \cos t_2 + i \sin t_2 & & \\ & & \ddots & \\ & & & \cos t_n + i \sin t_n \end{pmatrix}.$$
Then $H'$ is Hermitian because it is diagonal and all its eigenvalues are real. $U'$ is normal because it is diagonal. It is unitary because all its eigenvalues have a modulus of 1. Clearly $H'U' = U'H' = D$. Now let $H$ and $U$ be the linear transformations whose matrices under $\{e_1, \dots, e_n\}$ are $H'$ and $U'$ respectively. Then $HU = UH = T$.

4 Bilinear patterns

Definition 4.1 Let $V$ and $W$ be vector spaces over a field $F$. A bilinear pattern is a function $f : V \times W \to F$ with the following properties:
$$f(a + c, b) = f(a, b) + f(c, b), \qquad f(a, b + c) = f(a, b) + f(a, c), \qquad f(au, v) = f(u, av) = a f(u, v).$$
Equivalently, for any $v \in V$ and $w \in W$, $f(v, \cdot) : W \to F$ and $f(\cdot, w) : V \to F$ are linear functionals.

Examples:
$f(v, w) = 0$ for any $v, w$ is a bilinear pattern.
If $V$ is a Euclidean space over $\mathbb{R}$ then $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ is bilinear.
If $V$ is unitary, $\langle \cdot, \cdot \rangle$ is not bilinear, because it is not homogeneous in the second argument, unless $V = \{0\}$.
If $V^*$ is the dual space of $V$ then $f : V \times V^* \to F$ defined by $f(v, w) = w(v)$ is a bilinear pattern.
Fix $g \in W^*$ and $h \in V^*$. Then $f(w, v) = g(w)h(v)$ is also a bilinear pattern.
Let $V = F^m$, $W = F^n$ and $A$ an $m \times n$ matrix. Then $A$ defines a bilinear pattern on $V \times W$ by $f(v, w) = v^t A w$, where $v^t$ is the transpose of $v$.

This last example is in fact the most general, once we fix bases for $V$ and $W$. Suppose that $\dim V = m$ and $\dim W = n$. Let $(v_1, \dots, v_m)$ and $(w_1, \dots, w_n)$ be bases. For a bilinear $f$ define the matrix $A$ by $A_{ij} = f(v_i, w_j)$.
Write $v = \sum_{i=1}^m x_i v_i$ and $w = \sum_{j=1}^n y_j w_j$. Then
$$f(v, w) = f\Big(\sum_i x_i v_i, \sum_j y_j w_j\Big) = \sum_{i,j} x_i y_j f(v_i, w_j) = \sum_{i,j} x_i y_j A_{ij} = (x_1, \dots, x_m) A (y_1, \dots, y_n)^t = v^t A w,$$
where $v^t = (x_1, \dots, x_m)$ and $w^t = (y_1, \dots, y_n)$. Given these two bases, $f$ determines $A$ and $A$ determines $f$, so the map $f \mapsto A$ is bijective. If $W = V$ and $A = I$ under the standard basis we get $f(u, v) = u^t v$.

Suppose that $\dim V = m$ and $\dim W = n$. Define addition and scalar multiplication by
$$(f + g)(a, b) = f(a, b) + g(a, b) \quad \text{and} \quad (cf)(a, b) = c f(a, b).$$
Then the collection of bilinear patterns is a vector space, of dimension $mn$ by the isomorphism $f \mapsto A$. One can also show this without using matrices:

Lemma 4.2 Let $V$ and $W$ be vector spaces of finite dimension, and $\{g_1, \dots, g_m\}$, $\{h_1, \dots, h_n\}$ bases of $V^*$ and $W^*$ respectively. Then $(v, w) \mapsto g_i(v) h_j(w)$ is a bilinear pattern. These $mn$ products give a basis for the vector space of bilinear patterns.

Theorem 4.3 Let $f$ be a bilinear pattern on $V \times W$ and consider its matrix $A$ with respect to the bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$. Let $P$ be the transformation matrix from $\{v_1, \dots, v_m\}$ to $\{v'_1, \dots, v'_m\}$ and $Q$ the transformation matrix from $\{w_1, \dots, w_n\}$ to $\{w'_1, \dots, w'_n\}$. Then the matrix of $f$ with respect to the new bases is $P^t A Q$.

Proof. Let $v \in V$, $w \in W$, and write their coordinates under $\{v'_1, \dots, v'_m\}$ and $\{w'_1, \dots, w'_n\}$ as the column vectors $x = (x_1, \dots, x_m)^t$ and $y = (y_1, \dots, y_n)^t$. Then the coordinates of $v$ under $\{v_1, \dots, v_m\}$ are $Px$, and the coordinates of $w$ under $\{w_1, \dots, w_n\}$ are $Qy$, and so
$$f(v, w) = (Px)^t A (Qy) = x^t (P^t A Q) y,$$
so the matrix with respect to the new bases is indeed $P^t A Q$.

If $V = W$, then we call $f$ a bilinear pattern on $V$. In such cases, we use the same basis for both arguments. Thus, if the matrix of $f$ under $\{v_1, \dots, v_n\}$ is $A$, and under $\{v'_1, \dots, v'_n\}$ it is $A'$, then $A' = P^t A P$, where $P$ is the transformation matrix from $\{v_1, \dots, v_n\}$ to $\{v'_1, \dots, v'_n\}$.
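The change-of-basis formula of Theorem 4.3 can be verified on a small integer example. In this sketch the matrices `A`, `P`, `Q` and the coordinate vectors are assumed sample data, and the helper names are illustrative; the convention is the one in the proof, where old coordinates are obtained from new ones as $Px$ and $Qy$.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, v):
    """Multiply the matrix A by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear(A, x, y):
    """Evaluate x^t A y for coordinate vectors x and y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

A = [[1, 2, 0], [0, 1, 3]]             # matrix of f in the old bases (m=2, n=3)
P = [[1, 1], [0, 1]]                   # invertible 2x2 transformation matrix
Q = [[1, 0, 0], [1, 1, 0], [0, 0, 2]]  # invertible 3x3 transformation matrix

A_new = matmul(matmul(transpose(P), A), Q)  # P^t A Q

x_new, y_new = [2, -1], [1, 0, 3]           # coordinates in the new bases
x_old, y_old = apply(P, x_new), apply(Q, y_new)

value_old = bilinear(A, x_old, y_old)
value_new = bilinear(A_new, x_new, y_new)
print(A_new, value_old, value_new)
```

Both evaluations give the same number, as the proof predicts, since they compute $f(v, w)$ for the same pair of vectors in two coordinate systems.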
Definition 4.4 Two matrices $A$ and $B$ are congruent if there exists an invertible matrix $P$ such that $B = P^t A P$.

This defines an equivalence relation, since $A = I^t A I$, and if $B = P^t A P$ then $A = (P^t)^{-1} B P^{-1} = (P^{-1})^t B P^{-1}$. If also $C = Q^t B Q$ then $C = Q^t B Q = Q^t P^t A P Q = (PQ)^t A (PQ)$, and $PQ$ is invertible.

Lemma 4.5 Two $n \times n$ matrices are congruent if and only if they represent the same bilinear pattern over $V$.

Proof. Immediate.

Theorem 4.6 Let $f$ be a bilinear pattern on $V \times W$ with $\dim V = m$ and $\dim W = n$. Choose bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$ and $0 \leq r \leq \min\{m, n\}$.
(1) $f(v_i, w_j) = \begin{cases} 1 & i = j \leq r \\ 0 & \text{otherwise} \end{cases}$. This gives something of the sort of a diagonal matrix.