Linear Algebra II. Ulrike Tillmann. January 4, 2018


This course is a continuation of Linear Algebra I and will foreshadow much of what will be discussed in more detail in the Linear Algebra course in Part A. We will also revisit some concepts seen in Geometry, though material from that course is not assumed to have been seen.

In this course we will deepen our understanding of matrices and linear maps more generally. In particular we will see that often a good choice of basis makes the transformation easy to understand in geometric terms. One of our key tools is the determinant, which we will study first.

These lectures are a brief path through the essential material. Much will be gained by studying textbooks along the way. One book that also covers much of the material of the Part A course is Linear Algebra by Kaye and Wilson; another that can be found in many college libraries is Linear Algebra by Morris. These notes are essentially due to Alan Lauder.

1 Determinants

1.1 Existence and uniqueness

Let $M_n(\mathbb{R})$ be the set of $n \times n$ matrices with real entries. For $A \in M_n(\mathbb{R})$ it will be convenient in this section and occasionally elsewhere to write
$$A = [a_1, \dots, a_n]$$
where $a_i$ ($1 \le i \le n$) are the columns.

Definition 1.1. A mapping $D : M_n(\mathbb{R}) \to \mathbb{R}$ is determinantal if it is

(a) multilinear in the columns:
$$D[\dots, b_i + c_i, \dots] = D[\dots, b_i, \dots] + D[\dots, c_i, \dots], \qquad D[\dots, \lambda a_i, \dots] = \lambda D[\dots, a_i, \dots] \text{ for } \lambda \in \mathbb{R};$$

(b) alternating: $D[\dots, a_i, a_{i+1}, \dots] = 0$ when $a_i = a_{i+1}$;

(c) and $D(I_n) = 1$ for $I_n$ the $n \times n$ identity matrix.

When proving the existence of determinantal maps, it is easier to define the alternating property as above. However, when showing uniqueness we shall use the following at first glance stronger alternating properties.

Proposition 1.2. Let $D : M_n(\mathbb{R}) \to \mathbb{R}$ be a determinantal map. Then

(1) $D[\dots, a_i, a_{i+1}, \dots] = -D[\dots, a_{i+1}, a_i, \dots]$

(2) $D[\dots, a_i, \dots, a_j, \dots] = 0$ when $a_i = a_j$, $i \ne j$

(3) $D[\dots, a_i, \dots, a_j, \dots] = -D[\dots, a_j, \dots, a_i, \dots]$ when $i \ne j$.

Of course the third part subsumes the first, but it is easier to prove the proposition in three steps.

Proof. (1) Thinking of $D$ as a multilinear alternating map on the $i$th and $(i+1)$th columns only, we have
$$0 = D[a_i + a_{i+1},\, a_i + a_{i+1}] = D[a_i, a_i] + D[a_i, a_{i+1}] + D[a_{i+1}, a_i] + D[a_{i+1}, a_{i+1}] = 0 + D[a_i, a_{i+1}] + D[a_{i+1}, a_i] + 0,$$
from which the first claim follows.

(2) For the second, given a matrix $A$ in which $a_i = a_j$ with $i \ne j$, we can switch adjacent columns and apply the first part to see that $D(A)$ agrees up to sign with $D(A')$ where the matrix $A'$ has two identical adjacent columns. But then $D(A') = 0$.

(3) Finally, the third part now follows from the second, by applying the same argument that we used originally to prove the first part!

Theorem 1.3. A determinantal map $D$ exists.

Proof. We prove this by induction on $n$. For $n = 1$ define $D((\lambda)) := \lambda$, which has the right properties. Assume we have proved the existence of a determinantal map $D$ for dimension $n - 1$ where $n \ge 2$. Let $A = (a_{ij}) \in M_n(\mathbb{R})$. Write $A_{ij}$ for the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $i$th row and $j$th column. Fix $i$ with $1 \le i \le n$. Define
$$D(A) := (-1)^{i+1} a_{i1} D(A_{i1}) + \dots + (-1)^{i+n} a_{in} D(A_{in}). \qquad (1)$$
Here the $D(\cdot)$ on the right-hand side is our determinantal function on $(n-1) \times (n-1)$ matrices, already defined by induction. We show $D$ is determinantal on $n \times n$ matrices.

View $D$ as a function of the $k$th column, and consider any term $(-1)^{i+j} a_{ij} D(A_{ij})$. If $j \ne k$ then $a_{ij}$ does not depend on the $k$th column and $D(A_{ij})$ depends linearly on the $k$th column. If $j = k$ then $a_{ij}$ depends linearly on the $k$th column, and $D(A_{ij})$ does not depend on the $k$th column. In any case our term depends linearly on the $k$th column. Since $D(A)$ is the sum of such terms, it depends linearly on the $k$th column and so is multilinear.

Next, suppose two adjacent columns of $A$ are equal, say $a_k = a_{k+1}$. Let $j$ be an index with $j \ne k, k+1$. Then $A_{ij}$ has two adjacent equal columns, and hence $D(A_{ij}) = 0$. So we find
$$D(A) = (-1)^{i+k} a_{ik} D(A_{ik}) + (-1)^{i+k+1} a_{i,k+1} D(A_{i,k+1}).$$
Now $A_{ik} = A_{i,k+1}$ and $a_{ik} = a_{i,k+1}$ since $a_k = a_{k+1}$. So these two terms cancel and $D(A) = 0$.

Finally $D(I_n) = 1$ directly from the inductive definition.

To show uniqueness, let's first look at the case $n = 2$.

Example 1.4. For any determinantal $D : M_2(\mathbb{R}) \to \mathbb{R}$ we have
$$D\begin{pmatrix} a & b \\ c & d \end{pmatrix} = D\left[ a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + c \begin{pmatrix} 0 \\ 1 \end{pmatrix},\ b \begin{pmatrix} 1 \\ 0 \end{pmatrix} + d \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right]
= ab\, D\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} + ad\, D\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + cb\, D\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + cd\, D\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} = ab \cdot 0 + ad \cdot 1 + cb \cdot (-1) + cd \cdot 0 = ad - bc.$$
So this function $D$ is unique.

The proof for general $n$ is essentially the same, only more complicated to write down, and we will first need a definition.

Definition 1.5. Let $n \in \mathbb{N}$. A permutation $\sigma$ is a bijective map from the set $\{1, 2, \dots, n\}$ to itself. The set of all such permutations is denoted $S_n$. An element $\sigma \in S_n$ which switches two elements $1 \le i < j \le n$ and fixes the others is called a transposition.

It is intuitively obvious (and proved in Groups and Group Actions) that every permutation can be written (not uniquely) as a sequence of transpositions. (Imagine a row of children's blocks with the numbers $1$ to $n$ on them, but in some random order: you can line them up in the correct order by using your hands to switch two at a time.)

So let $D : M_n(\mathbb{R}) \to \mathbb{R}$ be some determinantal map. For $A = (a_{ij}) = [a_1, \dots, a_n] \in M_n(\mathbb{R})$ write
$$a_1 = a_{11} e_1 + \dots + a_{n1} e_n, \quad \dots, \quad a_n = a_{1n} e_1 + \dots + a_{nn} e_n$$
where $e_i$ is the $n \times 1$ vector with $1$ in the $i$th position and zero elsewhere. Then by multilinearity and using the second alternating property in Proposition 1.2 we have
$$D[a_1, \dots, a_n] = \sum_\sigma a_{\sigma(1),1} \cdots a_{\sigma(n),n}\, D[e_{\sigma(1)}, \dots, e_{\sigma(n)}].$$
Here the sum is over $S_n$, the main point being, as in Example 1.4, that determinants of matrices with two equal columns vanish.

Now write $\sigma$ as a product of $t$, say, transpositions and unshuffle the columns in $[e_{\sigma(1)}, \dots, e_{\sigma(n)}]$, keeping track of the effect on $D$ using the third alternating property in Proposition 1.2. We find
$$D[e_{\sigma(1)}, \dots, e_{\sigma(n)}] = (-1)^t D[e_1, \dots, e_n] = (-1)^t D(I_n) = (-1)^t.$$
(The matrix $M_\sigma := [e_{\sigma(1)}, \dots, e_{\sigma(n)}]$ is a permutation matrix, so-called because $M_\sigma e_j = e_{\sigma(j)}$; that is, it permutes the basis vectors by acting by $\sigma$ on the indices. We won't use this term again, but it appears in Groups and Group Actions.)

Observe that the value $(-1)^t$ must be independent of how one wrote $\sigma$ as a product of transpositions: it is called the sign of $\sigma$ and written $\mathrm{sign}(\sigma)$. So we find
$$D[a_1, \dots, a_n] = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n}. \qquad (2)$$
But this equation gives $D$ explicitly as a multivariable polynomial in the entries $a_{ij}$, and so shows $D$ is unique. We have proved:

Theorem 1.6. For each $n \in \mathbb{N}$ there exists a unique determinantal function $D : M_n(\mathbb{R}) \to \mathbb{R}$, and it is given explicitly by the expansion (2).

We write this unique function as $\det(\cdot)$ or sometimes $|\cdot|$. Note that $\det$ satisfies equation (1), since it is the unique determinantal function; we say here that we are computing $\det$ by expanding along the $i$th row (Laplace expansion).

Example 1.7. $n = 2$: Here $S_2 = \{1, (1\ 2)\}$ where $(1\ 2)$ denotes the map switching $1$ and $2$, so $\mathrm{sign}((1\ 2)) = -1$.
$$\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \sum_{\sigma \in S_2} \mathrm{sign}(\sigma)\, a_{\sigma(1),1} a_{\sigma(2),2} = a_{11} a_{22} - a_{21} a_{12}.$$
$n = 3$: Using the Laplace expansion along the first row we find
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$
These formulae, which you may have seen before, are useful for computations.

1.2 Basic properties

Now some basic properties of the determinant.

Lemma 1.8. For $\sigma \in S_n$, we have $\mathrm{sign}(\sigma) = \mathrm{sign}(\sigma^{-1})$. (Note $\sigma$ is a bijection so has an inverse.)

Proof. Follows since $\sigma^{-1}\sigma$ is the identity map, which can be written as a sequence of $0$ transpositions, an even number.

Proposition 1.9. $\det(A) = \det(A^T)$.

Proof. Follows from the expansion formula (2), Lemma 1.8, and the fact that as $\sigma$ varies over $S_n$ so does $\sigma^{-1}$:
$$\det(A^T) = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)} = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{\sigma^{-1}(1),1} \cdots a_{\sigma^{-1}(n),n} = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma^{-1})\, a_{\sigma^{-1}(1),1} \cdots a_{\sigma^{-1}(n),n} = \det(A).$$
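To make formulas (1) and (2) concrete, here is a short Python sketch (not part of the original notes) that evaluates a determinant both by the permutation expansion (2) and by Laplace expansion along a row as in (1). The function names and the test matrix are my own choices.

```python
from itertools import permutations

def sign(perm):
    # sign of a permutation given as a tuple, computed by counting inversions
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det_permutation_formula(A):
    # formula (2): sum over sigma in S_n of sign(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j]][j]
        total += term
    return total

def det_laplace(A, i=0):
    # formula (1): expansion along row i (0-indexed), applied recursively
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** (i + j) * A[i][j] * det_laplace(minor)
    return total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det_permutation_formula(A), det_laplace(A))  # both give -3
```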

Corollary 1.10. The map $\det : M_n(\mathbb{R}) \to \mathbb{R}$ is multilinear and alternating in the rows of a matrix. (Our discussion in terms of columns, though, is notationally simpler.)

Corollary 1.11. One has
$$\det(A) = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}.$$

1.3 Geometric interpretation

The explicit form of $\det$ given in (2) and Corollary 1.11 is useful for computations in small dimensions and some proofs (e.g. Proposition 2.9), but on the whole rather unenlightening. Rather, it is the axiomatic characterisation of $\det$ as the unique map satisfying the properties in Definition 1.1 which gives it an intuitive geometric meaning for real matrices. Writing $A = [a_1, \dots, a_n] \in M_n(\mathbb{R})$, we have that the absolute value of $\det(A)$ is the $n$-dimensional volume of the parallelepiped spanned by the vectors $a_1, \dots, a_n$. To see why, it is perhaps most instructive to consider how the properties in Definition 1.1 fit exactly, in the case of $\mathbb{R}^2$, with your intuitive idea of how the area of a parallelogram should behave, e.g., under summing two parallelograms with a common side, or scaling a side.

1.4 Multiplicativity

We now prove the key properties of the determinant.

Theorem 1.12. Let $A, B \in M_n(\mathbb{R})$. Then
(i) $\det(A) \ne 0 \iff A$ is invertible;
(ii) $\det(AB) = \det(A)\det(B)$.

There are various ways to go about this. We give a proof which is not the most concise, but shows how one in practice actually goes about computing determinants once matrices get reasonably large; that is, using row operations. Recall there are three types of elementary row operations (EROs):

(i) Multiplying the $i$th row by $\lambda \ne 0$
(ii) Swapping rows $i$ and $j$
(iii) Adding $\mu \in \mathbb{R}$ times row $j$ to row $i$.

Each of these is accomplished by pre-multiplying $A$ by a suitable elementary matrix $E$; these elementary matrices, for example by the alternating and multilinear properties of $\det$ in the rows (or the expansion formula), have determinant $\lambda$, $-1$ and $1$ respectively.

Lemma 1.13. Let $A \in M_n(\mathbb{R})$. For such an elementary matrix $E$ we have $\det(EA) = \det(E)\det(A)$.

Proof. We consider the three possible types for $E$ and use Corollary 1.10.
(i) The result follows immediately by the multilinearity in the rows of $\det$.
(ii) The result follows from the third alternating property (for the rows this time) in Proposition 1.2.
(iii) This follows from multilinearity and the second alternating property (for rows) in Proposition 1.2. Precisely, by multilinearity $\det(EA) = \det(A) + \mu \det(B)$ where the $i$th and $j$th rows of $B$ are both $(a_{j1}, \dots, a_{jn})$, so $\det(B) = 0$.

From Linear Algebra I (2015 Theorem 22, or 2016 Section 5) we know that there exist elementary matrices $E_1, \dots, E_k$ such that
$$E_k E_{k-1} \cdots E_1 A = \begin{cases} I_n & \text{when } A \text{ is invertible} \\ A' & \text{otherwise} \end{cases}$$
where $A'$ is some matrix with a zero row. Note that $\det(A') = 0$ since we can, for example, compute $\det(A')$ by expanding along a zero row, using formula (1). So by Lemma 1.13
$$\det(E_k) \cdots \det(E_1)\det(A) = \begin{cases} 1 & \text{when } A \text{ is invertible} \\ 0 & \text{otherwise.} \end{cases}$$
Now $\det(E_k), \dots, \det(E_1) \ne 0$, so we find $\det(A) \ne 0 \iff A$ is invertible, proving Theorem 1.12 Part (i). Moreover, when $\det(A) \ne 0$ one has
$$\det(A) = \left( \prod_{i=1}^k \det(E_i) \right)^{-1}. \qquad (3)$$

We now prove Part (ii). First note
$$\det(AB) = 0 \iff AB \text{ is not invertible (by Part (i))} \iff A \text{ is not invertible, or } B \text{ is not invertible} \iff \det(A) = 0 \text{ or } \det(B) = 0 \text{ (by Part (i))}.$$

(The implication "$AB$ invertible $\Rightarrow$ $A$ and $B$ are both invertible" here is not completely obvious: if $AB$ is invertible then certainly the map defined by $A$ is surjective and that defined by $B$ is injective. Now apply the rank-nullity theorem.)

This proves Part (ii) when $\det(A) = 0$ or $\det(B) = 0$. So we can assume that $\det(A), \det(B) \ne 0$. There exist elementary matrices $E_i$ and $F_j$ such that
$$E_k \cdots E_1 A = I_n, \qquad F_l \cdots F_1 B = I_n$$
and so
$$F_l \cdots F_1 (E_k \cdots E_1 A) B = I_n.$$
Thus by Lemma 1.13 we find
$$\prod_j \det(F_j) \prod_i \det(E_i)\, \det(AB) = 1$$
and hence by (3)
$$\det(AB) = \left( \prod_j \det(F_j) \right)^{-1} \left( \prod_i \det(E_i) \right)^{-1} = \det(B)\det(A) = \det(A)\det(B).$$

Example 1.14. Usually it is better to compute determinants of matrices when $n > 3$ using row operations. Writing $|A|$ for $\det(A)$:
$$\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 3 & 8 & 15 \\ 0 & 7 & 26 & 63 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 2 & 6 \\ 0 & 0 & 12 & 42 \end{vmatrix}
= 2\begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 12 & 42 \end{vmatrix}
= 2\begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 6 \end{vmatrix}
= 2 \cdot 6 = 12.$$
Observe here that by expanding successively down the first column one sees that the determinant of an upper triangular matrix is the product of its diagonal entries.
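The row-operation method of Example 1.14 is easy to mechanise. The following Python sketch (my own illustration, not from the notes) reduces a matrix to upper triangular form using only EROs (ii) and (iii), keeping track of sign changes, and reproduces the value 12 for the matrix of Example 1.14.

```python
def det_by_row_reduction(A):
    """Determinant via elementary row operations, as in Example 1.14."""
    A = [row[:] for row in A]        # work on a copy
    n = len(A)
    sign = 1
    for k in range(n):
        # find a pivot in column k
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            return 0                  # no pivot: determinant is 0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            sign = -sign              # ERO (ii) changes the sign
        for r in range(k + 1, n):     # ERO (iii) leaves the determinant unchanged
            m = A[r][k] / A[k][k]
            A[r] = [A[r][j] - m * A[k][j] for j in range(n)]
    prod = sign
    for k in range(n):                # product of the diagonal of the triangular matrix
        prod *= A[k][k]
    return prod

V = [[1, 1, 1, 1], [1, 2, 3, 4], [1, 4, 9, 16], [1, 8, 27, 64]]
print(det_by_row_reduction(V))  # 12.0, as in Example 1.14
```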

1.5 Determinant of a linear transformation

Let $V$ be a vector space of dimension $n$ over $\mathbb{R}$.

Definition 1.15. Let $T : V \to V$ be a linear transformation, $B$ a basis for $V$, and $M_B^B(T)$ the matrix for $T$ with respect to initial and final basis $B$. We define $\det(T) := \det(M_B^B(T))$.

Proposition 1.16. The determinant of $T$ is independent of the choice of basis $B$.

Proof. Let $B'$ be another basis, and write $A = M_B^B(T)$ and $C = M_{B'}^{B'}(T)$. We need to show $\det(A) = \det(C)$. Let $P = M_{B'}^{B}(\mathrm{Id}_V)$ be the change of basis matrix. By Linear Algebra I (2015 Theorem 66 or 2016 Theorem 62) we have
$$I_n = M_B^B(\mathrm{Id}_V) = M_B^B(\mathrm{Id}_V \circ \mathrm{Id}_V) = M_{B'}^{B}(\mathrm{Id}_V)\, M_{B}^{B'}(\mathrm{Id}_V)$$
and so $P^{-1} = M_{B}^{B'}(\mathrm{Id}_V)$. By Linear Algebra I (2015 Theorem 69 or 2016 Theorem 65) we get
$$M_{B'}^{B'}(T) = M_{B}^{B'}(\mathrm{Id}_V)\, M_B^B(T)\, M_{B'}^{B}(\mathrm{Id}_V), \quad \text{that is,} \quad C = P^{-1}AP.$$
Hence by Theorem 1.12 Part (ii)
$$\det(C) = \det(P^{-1})\det(A)\det(P) = \det(P^{-1})\det(P)\det(A) = \det(P^{-1}P)\det(A) = \det(I_n)\det(A) = \det(A).$$
Note the useful fact $\det(P^{-1}) = \det(P)^{-1}$ for an invertible matrix $P$.

Theorem 1.17. Let $S, T : V \to V$ be linear transformations. Then
(i) $\det(T) \ne 0 \iff T$ is invertible;
(ii) $\det(ST) = \det(S)\det(T)$.

Proof. Immediate from Theorem 1.12.
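A quick numerical sanity check of Proposition 1.16 and Theorem 1.12(ii), assuming numpy is available; the random matrices here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # matrix of a map T in one basis
P = rng.standard_normal((4, 4))          # change of basis matrix (invertible with probability 1)
C = np.linalg.inv(P) @ A @ P             # matrix of the same map in another basis

print(np.isclose(np.linalg.det(C), np.linalg.det(A)))   # True: Proposition 1.16

B = rng.standard_normal((4, 4))
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True: Theorem 1.12(ii)
```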

2 Eigenvectors and eigenvalues

In this section the field $\mathbb{R}$ may be replaced by any other field $F$, for example $\mathbb{C}$. Note though that a matrix over $\mathbb{R}$ often acquires more eigenvalues and eigenvectors when one thinks of it as being defined over $\mathbb{C}$, and likewise for linear maps.

2.1 Definitions and basic properties

Let $V$ be a vector space over $\mathbb{R}$ and $T : V \to V$ be a linear transformation.

Definition 2.1. A vector $v \in V$ is called an eigenvector of $T$ if $v \ne 0$ and $Tv = \lambda v$ for some $\lambda \in \mathbb{R}$. We call $\lambda \in \mathbb{R}$ an eigenvalue of $T$ if $Tv = \lambda v$ for some nonzero $v \in V$.

From now on we assume $V$ is finite dimensional.

Example 2.2. Let $V = \mathbb{R}^3$, and $T$ be rotation by an angle $\theta$ about an axis through the origin. If $v \ne 0$ lies on this axis then $Tv = v$, so it is an eigenvector with eigenvalue $1$. There are no other eigenvalues unless $\theta = 180^\circ$, in which case $-1$ is an eigenvalue and all nonzero vectors lying in the plane perpendicular to the axis are eigenvectors.

Proposition 2.3. $\lambda$ is an eigenvalue of $T \iff \mathrm{Ker}(T - \lambda I) \ne \{0\}$.

Proof. $\lambda$ is an eigenvalue of $T \iff \exists v \in V$, $v \ne 0$, $Tv = \lambda v \iff \exists v \in V$, $v \ne 0$, $(T - \lambda I)v = 0 \iff \mathrm{Ker}(T - \lambda I) \ne \{0\}$.

Corollary 2.4. The following statements are equivalent:
(a) $\lambda$ is an eigenvalue of $T$;
(b) $\mathrm{Ker}(T - \lambda I) \ne \{0\}$;
(c) $T - \lambda I$ is not invertible;
(d) $\det(T - \lambda I) = 0$.

Proof. (a) $\iff$ (b) was shown above. (c) $\iff$ (d) follows from Theorem 1.17 Part (i). (b) $\iff$ (c) is true since by the Rank-Nullity theorem $T - \lambda I$ is invertible if and only if its nullity is zero.

The equivalence (a) $\iff$ (d) is the key one here and motivates the following definition.

Definition 2.5. For $A \in M_n(\mathbb{R})$ the characteristic polynomial of $A$ is defined as $\det(A - xI_n)$. For $T : V \to V$ a linear transformation, let $A$ be the matrix for $T$ with respect to some basis $B$. The characteristic polynomial of $T$ is defined as $\det(A - xI_n)$.

Here the determinants are defined by taking the field in Section 1 to be $\mathbb{R}(x)$. That the characteristic polynomial is well-defined for a linear map, independent of the choice of basis, is proved in exactly the same manner as in Proposition 1.16, using the equality $P^{-1}(A - xI_n)P = P^{-1}AP - xI_n$. We denote the characteristic polynomial of $T$ by $\chi_T(x)$, and of a matrix $A$ by $\chi_A(x)$.

Theorem 2.6. Let $T : V \to V$ be a linear transformation. Then $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ is a root of the characteristic polynomial $\chi_T(x)$ of $T$.

Proof. ($\Rightarrow$) Suppose $\lambda$ is an eigenvalue of $T$. Then by Corollary 2.4, implication (a) $\Rightarrow$ (d), we have $\det(T - \lambda I) = 0$. Thus $\det(A - \lambda I_n) = 0$ for any matrix $A$ for $T$. (If $A$ is a matrix for $T$, then $A - \lambda I_n$ is the corresponding one for $T - \lambda I$.) So $\lambda$ is a root of $\chi_T(x) = \det(A - xI_n)$.

($\Leftarrow$) Suppose $\lambda$ is a root of $\chi_T(x) = \det(A - xI_n)$ for some matrix (equivalently, all matrices) $A$ for $T$. Then $\det(A - \lambda I_n) = 0$, and so $\det(T - \lambda I) = 0$. Thus by Corollary 2.4, implication (d) $\Rightarrow$ (a), $\lambda$ is an eigenvalue of $T$.

Given a matrix $A \in M_n(\mathbb{R})$ one defines eigenvalues $\lambda \in \mathbb{R}$ and eigenvectors $v \in \mathbb{R}^n$ (column vectors) exactly as in Definition 2.1, taking $T$ to be the linear map on $V = \mathbb{R}^n$ associated to $A$, and then Proposition 2.3, Corollary 2.4 and Theorem 2.6 hold with $T$ replaced by $A$.

Example 2.7. Continuing Example 2.2, if we take a basis $v_1, v_2, v_3$ where $v_1$ lies on the axis of rotation and $v_2$ and $v_3$ are perpendicular vectors of equal length spanning the plane through the origin perpendicular to the axis, then the matrix is
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}$$
which has characteristic polynomial
$$\begin{vmatrix} 1 - x & 0 & 0 \\ 0 & \cos\theta - x & -\sin\theta \\ 0 & \sin\theta & \cos\theta - x \end{vmatrix} = (1 - x)\big( (\cos\theta)^2 - 2\cos\theta\, x + x^2 + (\sin\theta)^2 \big) = (1 - x)(x^2 - 2\cos\theta\, x + 1).$$
So the eigenvalues over $\mathbb{C}$ are $\lambda = 1$ and
$$\frac{2\cos\theta \pm \sqrt{4(\cos\theta)^2 - 4}}{2} = \cos\theta \pm i\sin\theta,$$
these latter only being real when $\theta = 0$ ($\lambda = 1$) or $180^\circ$ ($\lambda = -1$). So Theorem 2.6 agrees with our geometric intuition.

For $A = (a_{ij}) \in M_n(\mathbb{R})$ recall the trace $\mathrm{tr}(A)$ is defined to be the sum $\sum_{i=1}^n a_{ii}$ of the diagonal entries, and that $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for $A, B \in M_n(\mathbb{R})$ (Linear Algebra I 2015 Sheet 2 Question 3).

Definition 2.8. For $T : V \to V$ a linear transformation the trace $\mathrm{tr}(T)$ is defined to be $\mathrm{tr}(A)$ where $A$ is any matrix for $T$.

That this is well-defined follows since (using notation from the proof of Proposition 1.16) we have
$$\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(P^{-1}(AP)) = \mathrm{tr}((AP)P^{-1}) = \mathrm{tr}(A(PP^{-1})) = \mathrm{tr}(A).$$

Proposition 2.9. For $A \in M_n(\mathbb{R})$,
$$\chi_A(x) = (-1)^n x^n + (-1)^{n-1}\mathrm{tr}(A)\, x^{n-1} + \dots + \det(A).$$
(Likewise for a transformation, $\chi_T(x) = (-1)^n x^n + (-1)^{n-1}\mathrm{tr}(T)\, x^{n-1} + \dots + \det(T)$.)

Proof. First, evaluating at $x = 0$ we find $\chi_A(0) = \det(A)$, which gives the constant term. Writing $A = (a_{ij})$ we have
$$\det(A - xI) = \begin{vmatrix} a_{11} - x & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} - x & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} - x & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} - x \end{vmatrix}.$$
We use the explicit formula (Corollary 1.11) to compute the leading two terms. Observe that any permutation in $S_n$ except the identity fixes at most $n - 2$ elements in $\{1, 2, \dots, n\}$. Thus using the explicit formula we find
$$\det(A - xI) = \prod_{i=1}^n (a_{ii} - x) + \cdots$$
where the $\cdots$ involves products containing at most $n - 2$ of the diagonal entries $a_{ii} - x$. Since the off-diagonal terms contain no $x$, the $\cdots$ must be a polynomial of degree at most $n - 2$. The result follows since
$$\prod_{i=1}^n (a_{ii} - x) = (-1)^n \prod_{i=1}^n (x - a_{ii}) = (-1)^n \Big( x^n - \Big( \sum_{i=1}^n a_{ii} \Big) x^{n-1} + \text{lower order terms} \Big).$$

In particular, the characteristic polynomial has degree $n$ and so there are at most $n$ eigenvalues (or, in the case in which the base field is $\mathbb{C}$, exactly $n$ eigenvalues counting multiplicities).
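Proposition 2.9 (and Corollary 2.10 below) can be checked numerically. In the sketch below, numpy's poly routine returns the coefficients of $\det(xI - A)$, which is $(-1)^n \chi_A(x)$ in the notation of the notes; the example matrix is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
n = A.shape[0]

# np.poly(A) gives the coefficients of det(xI - A) = x^n + c_1 x^(n-1) + ... + c_n
c = np.poly(A)
print(np.isclose(c[1], -np.trace(A)))                    # coefficient of x^(n-1) is -tr(A)
print(np.isclose(c[-1], (-1) ** n * np.linalg.det(A)))   # constant term is (-1)^n det(A)

# Corollary 2.10: over C, tr(A) and det(A) are the sum and the product of the eigenvalues
eigs = np.linalg.eigvals(A)
print(np.isclose(eigs.sum(), np.trace(A)))
print(np.isclose(eigs.prod(), np.linalg.det(A)))
```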

Corollary 2.10. Let $A \in M_n(\mathbb{C})$ have eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n \in \mathbb{C}$ (not necessarily distinct). Then $\mathrm{tr}(A) = \lambda_1 + \lambda_2 + \dots + \lambda_n$ and $\det(A) = \lambda_1 \cdots \lambda_n$ (and likewise for transformations $T$).

Proof. Over $\mathbb{C}$ we have $\chi_A(x) = \prod_{i=1}^n (\lambda_i - x) = (-1)^n \prod_{i=1}^n (x - \lambda_i)$ and
$$\prod_{i=1}^n (x - \lambda_i) = x^n - \Big( \sum_{i=1}^n \lambda_i \Big) x^{n-1} + \dots + (-1)^n \prod_{i=1}^n \lambda_i.$$
Now compare this with Proposition 2.9.

2.2 Diagonalisation

We now apply our theory to show that often, given a linear map, one can find a basis so that the matrix takes a particularly simple form.

Theorem 2.11. Let $\lambda_1, \dots, \lambda_m$ ($m \le n$) be the distinct eigenvalues of $T$ and $v_1, \dots, v_m$ be corresponding eigenvectors. Then $v_1, \dots, v_m$ are linearly independent.

Proof. Suppose $v_1, \dots, v_m$ are linearly dependent. Renumbering the vectors if necessary, we assume $\{v_1, \dots, v_k\}$ is the smallest linearly dependent subset of $\{v_1, \dots, v_m\}$, where $k \le m$. So there exist $a_1, \dots, a_k \in \mathbb{R}$ with all $a_1, \dots, a_k \ne 0$ such that
$$a_1 v_1 + \dots + a_k v_k = 0.$$
Applying $T - \lambda_k I$ to both sides we get
$$(T - \lambda_k I)(a_1 v_1) + \dots + (T - \lambda_k I)(a_k v_k) = 0.$$
That is,
$$a_1(\lambda_1 - \lambda_k)v_1 + \dots + a_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} + a_k(\lambda_k - \lambda_k)v_k = 0.$$
But $\lambda_i - \lambda_k \ne 0$ for $i < k$ and $\lambda_k - \lambda_k = 0$. So $v_1, \dots, v_{k-1}$ are linearly dependent, contradicting the minimality of $k$.

Definition 2.12. A linear map $T : V \to V$ is diagonalisable if $V$ has a basis consisting of eigenvectors for $T$. (For then the matrix for $T$ with respect to this basis is a diagonal matrix.) A matrix $A \in M_n(\mathbb{R})$ is called diagonalisable if the map it defines by acting on (column) vectors in $\mathbb{R}^n$ is diagonalisable.

Proposition 2.13. A matrix $A \in M_n(\mathbb{R})$ is diagonalisable if and only if there exists an invertible matrix $P$ such that $B := P^{-1}AP$ is a diagonal matrix (in which case the diagonal entries in $B$ are the eigenvalues, and the columns in $P$ the corresponding eigenvectors).

Proof. Assume $A$ is diagonalisable and let $v_1, \dots, v_n$ be the basis of eigenvectors and $\lambda_1, \dots, \lambda_n$ the eigenvalues (possibly with repetition of eigenvalues). Using the notation in Section 1, define $P = [v_1, \dots, v_n]$ and $B$ the diagonal matrix with entries $\lambda_1, \dots, \lambda_n$. Then $P$ is invertible since its columns are linearly independent, and the equation
$$[\lambda_1 v_1, \dots, \lambda_n v_n] = [Av_1, \dots, Av_n]$$
is the same as $PB = AP$, that is $B = P^{-1}AP$.

Conversely, given that $B := P^{-1}AP$ is diagonal, the columns of $P$ must be $n$ linearly independent eigenvectors of $A$, and the entries of $B$ the corresponding eigenvalues (since $PB = AP$).

Theorem 2.14. Let $V$ be a vector space of dimension $n$. Suppose a linear map $T : V \to V$ (matrix $A \in M_n(\mathbb{R})$, respectively) has $n$ distinct eigenvalues. Then $T$ ($A$, respectively) is diagonalisable.

Proof. Assume $T$ has $n$ distinct eigenvalues. For each of the $n$ distinct eigenvalues $\lambda_i$ there is at least one eigenvector $v_i$ (by definition). By Theorem 2.11 the $n$ eigenvectors $v_1, \dots, v_n$ are linearly independent, and thus form a basis for $V$. (The statement for matrices $A$ follows by viewing $A$ as a map on $\mathbb{R}^n$.)

The next corollary gives a sufficient (but by no means necessary) condition for a map/matrix to be diagonalisable.

Corollary 2.15. Suppose $\chi_T(x)$ ($\chi_A(x)$, respectively) has $n$ distinct roots in $\mathbb{R}$. Then $T$ ($A$, respectively) is diagonalisable over $\mathbb{R}$.

Replacing the base field $\mathbb{R}$ by $\mathbb{C}$ in this corollary, and noting that the characteristic polynomial always has $n$ roots over $\mathbb{C}$ counting multiplicity, one sees that when these roots in $\mathbb{C}$ are distinct the map (matrix, respectively) is diagonalisable over $\mathbb{C}$.

We now describe a general method for diagonalising a matrix (when it can be done).

Algorithm 2.16. Let $A \in M_n(\mathbb{R})$.

(1) Compute $\chi_A(x) = \det(A - xI)$ and find its roots $\lambda \in \mathbb{R}$ (real eigenvalues).

(2) For each eigenvalue $\lambda$, find a basis for $\mathrm{Ker}(A - \lambda I)$ using, for example, row-reduction (this gives you linearly independent eigenvectors for each eigenvalue).

(3) Collect together all these eigenvectors. If you have $n$ of them, put them as columns in a matrix $P$, and the corresponding eigenvalues as the diagonal entries in a matrix $B$. Then $B = P^{-1}AP$ and you have diagonalised $A$. If you have fewer than $n$ eigenvectors you cannot diagonalise $A$ (over $\mathbb{R}$).
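A computational sketch of Algorithm 2.16, not part of the original notes: numpy's eig routine stands in for steps (1) and (2) (it finds eigenvalues and eigenvectors numerically rather than by factorising $\chi_A$ and row-reducing), and step (3) assembles $P$ and checks it is invertible. The function name and tolerance are my own choices.

```python
import numpy as np

def diagonalise(A, tol=1e-10):
    """Numerical analogue of steps (1)-(3) of Algorithm 2.16.

    Returns (P, B) with B = P^{-1} A P diagonal, or None if A is not
    diagonalisable over R by this test.
    """
    eigenvalues, eigenvectors = np.linalg.eig(A)   # stand-in for steps (1) and (2)
    if np.any(np.abs(eigenvalues.imag) > tol):
        return None                                # complex eigenvalues: not diagonalisable over R
    P = eigenvectors.real                          # step (3): eigenvectors as columns
    if np.linalg.matrix_rank(P) < A.shape[0]:
        return None                                # fewer than n independent eigenvectors
    B = np.linalg.inv(P) @ A @ P
    return P, B

A = np.array([[0.0, -2.0], [1.0, 3.0]])            # the matrix of Example 2.17 below
P, B = diagonalise(A)
print(np.round(B, 10))                              # diagonal with entries 1 and 2 (in some order)
```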

Note that the collection of eigenvectors found here must be linearly independent: this follows from an easy extension of the argument in the proof of Theorem 2.11.

Example 2.17. Let $V = \mathbb{R}^2$ (column vectors) and $T : V \to V$ be given by the matrix
$$A = \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix}.$$
Then $\det(A - xI_2) = (x - 1)(x - 2)$.

$\lambda = 1$: $A - \lambda I_2 = \begin{pmatrix} -1 & -2 \\ 1 & 2 \end{pmatrix}$, so $\mathrm{Ker}(A - I_2) = \left\langle \begin{pmatrix} 2 \\ -1 \end{pmatrix} \right\rangle$.

$\lambda = 2$: $A - \lambda I_2 = \begin{pmatrix} -2 & -2 \\ 1 & 1 \end{pmatrix}$, so $\mathrm{Ker}(A - 2I_2) = \left\langle \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\rangle$.

Letting
$$P := \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}$$
we find
$$AP = P \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \quad \text{i.e.,} \quad P^{-1}AP = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$
Note that $P$ is invertible here because the columns are eigenvectors for distinct eigenvalues and so are linearly independent.

2.3 Geometric and algebraic multiplicity

As before, let $T : V \to V$ be a linear transformation.

Definition 2.18. Let $\lambda$ be an eigenvalue for $T$. Then
$$E_\lambda := \mathrm{Ker}(T - \lambda I) = \{ v \in V : Tv = \lambda v \}$$
is called the eigenspace for $\lambda$. (This is just the set of all eigenvectors of $T$ with eigenvalue $\lambda$, along with the zero vector.)

Note that $E_\lambda$ is a subspace of $V$ since it is the kernel of the map $T - \lambda I$.

Definition 2.19. Let $\lambda$ be an eigenvalue of $T$. The dimension of $E_\lambda$ is called the geometric multiplicity of $\lambda$. The multiplicity of $\lambda$ as a root of the characteristic polynomial $\chi_T(x)$ is called the algebraic multiplicity of $\lambda$.
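To illustrate Definitions 2.18 and 2.19 (and to anticipate Proposition 2.20 below), here is a small numerical example with a matrix of my own choosing, not from the notes, whose geometric multiplicity is strictly smaller than its algebraic multiplicity.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 2.0]])   # chi_A(x) = (x - 2)^2, so the algebraic multiplicity of 2 is 2

lam = 2.0
n = A.shape[0]
geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n))   # dim Ker(A - 2I), by rank-nullity
algebraic = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

print(geometric, algebraic)   # 1 2: g_lambda < a_lambda, so A is not diagonalisable
```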

Let's denote these multiplicities $g_\lambda$ and $a_\lambda$ respectively. So $\chi_T(x) = (x - \lambda)^{a_\lambda} f(x)$ where $f(\lambda) \ne 0$.

Proposition 2.20. Let $\lambda$ be an eigenvalue of $T$. The geometric multiplicity of $\lambda$ is less than or equal to the algebraic multiplicity of $\lambda$.

Proof. Extend a basis for $E_\lambda$ to one for $V$. Then the matrix for $T$ with respect to this basis for $V$ looks like
$$\begin{pmatrix} \lambda I_{g_\lambda} & * \\ 0 & * \end{pmatrix}.$$
Hence the matrix for $T - xI$ looks like
$$\begin{pmatrix} (\lambda - x) I_{g_\lambda} & * \\ 0 & * \end{pmatrix}$$
and so $\det(T - xI) = (\lambda - x)^{g_\lambda} h(x)$ for some $h(x) := \det(*) \in \mathbb{R}[x]$. We must then have $g_\lambda \le a_\lambda$.

By this proposition one sees that in Algorithm 2.16, if at any stage during Step (2) one finds fewer than $a_\lambda$ linearly independent eigenvectors for an eigenvalue $\lambda$, then the matrix cannot be diagonalisable (for one cannot get a surplus of eigenvectors from the other eigenvalues).

The next proposition used to be mentioned in the course synopsis: it is really just a different way of saying something we have stated in a more intuitive way already. (Worth thinking about, but no longer examinable.)

Proposition 2.21. Let $\lambda_1, \dots, \lambda_r$ ($r \le n$) be the distinct eigenvalues of $T$. Then the eigenspaces $E_{\lambda_1}, \dots, E_{\lambda_r}$ form a direct sum $E_{\lambda_1} \oplus \dots \oplus E_{\lambda_r}$.

The point here is to show that each $v \in E_{\lambda_1} + \dots + E_{\lambda_r}$ can be written uniquely as $v = v_1 + \dots + v_r$ for some $v_i \in E_{\lambda_i}$. (Or equivalently, if you prefer, that $E_{\lambda_i} \cap \sum_{j \ne i} E_{\lambda_j} = \{0\}$ for each $1 \le i \le r$.) This is what it means for a finite collection of subspaces of $V$ to form a direct sum. But this is an immediate corollary of Theorem 2.11, since eigenvectors arising from distinct eigenvalues are linearly independent (check this yourself, or come to the lecture).

3 Spectral theorem

We prove the spectral theorem for real symmetric matrices and give an application to finding nice equations for quadrics in $\mathbb{R}^3$.

3.1 Spectral theorem for real symmetric matrices

3.1.1 The Gram-Schmidt procedure

Recall from Linear Algebra I (2015 Definition 76, or 2016 Sections 7, 72) the notion of an inner product $\langle \cdot, \cdot \rangle$ on a finite dimensional real vector space $V$. Recall that we say two vectors $u, v \in V$ are orthogonal if $\langle u, v \rangle = 0$, and we call a basis $v_1, \dots, v_n \in V$ orthonormal if $\langle v_i, v_j \rangle = 0$ ($1 \le i \ne j \le n$) and $\|v_i\| := \sqrt{\langle v_i, v_i \rangle} = 1$ ($1 \le i \le n$); that is, $\langle v_i, v_j \rangle = \delta_{ij}$.

We consider now the case $V = \mathbb{R}^n$ (column vectors) with inner product the usual dot product (on column vectors). By the Gram-Schmidt procedure (2015 Linear Algebra I Remark 720) every finite dimensional real inner product vector space $V$ has an orthonormal basis. Let's look at how this algorithm works for $\mathbb{R}^n$ with the dot product (the discussion is completely analogous for a general inner product space).

Given a basis $\{u_1, \dots, u_n\}$ for $\mathbb{R}^n$ we construct an orthonormal basis $\{v_1, \dots, v_n\}$ for $\mathbb{R}^n$ with the property that
$$\mathrm{Sp}(\{u_1, \dots, u_k\}) = \mathrm{Sp}(\{v_1, \dots, v_k\}) \quad \text{for } k = 1, 2, \dots, n$$
as follows:
$$v_1 := \frac{u_1}{\|u_1\|}, \quad w_2 := u_2 - (u_2 \cdot v_1)v_1, \quad v_2 := \frac{w_2}{\|w_2\|}, \quad \dots, \quad w_n := u_n - \sum_{j=1}^{n-1} (u_n \cdot v_j)v_j, \quad v_n := \frac{w_n}{\|w_n\|}.$$
One proves that $\{v_1, \dots, v_n\}$ has the required properties (and in particular each $w_k \ne 0$) by induction on $k$ for $1 \le k \le n$. In detail, by induction on $k$ we may assume $v_i \cdot v_j = \delta_{ij}$ for $i, j \le k - 1$, and so for each $i < k$ we have
$$w_k \cdot v_i = \Big( u_k - \sum_{j=1}^{k-1} (u_k \cdot v_j)v_j \Big) \cdot v_i = (u_k \cdot v_i) - (u_k \cdot v_i)(v_i \cdot v_i) = 0.$$
Also $w_k \ne 0$, since otherwise $u_k \in \mathrm{Sp}(\{v_1, \dots, v_{k-1}\}) = \mathrm{Sp}(\{u_1, \dots, u_{k-1}\})$, which would be a contradiction. So $v_k := w_k / \|w_k\|$ is indeed a unit vector orthogonal to $v_1, \dots, v_{k-1}$. Next we see
$$\mathrm{Sp}(\{v_1, \dots, v_{k-1}, v_k\}) = \mathrm{Sp}(\{u_1, \dots, u_{k-1}, v_k\}) = \mathrm{Sp}(\{u_1, \dots, u_{k-1}, w_k\}) = \mathrm{Sp}(\{u_1, \dots, u_{k-1}, u_k\}).$$
The first equality here is by induction, and the last a direct application of the Steinitz Exchange Lemma from Linear Algebra I (2015 Lemma 39 or 2016 Theorem 34).

The algorithm is best explained by pictures in $\mathbb{R}^2$ and $\mathbb{R}^3$ (come to the lectures for this, or draw these yourself). Observe that given a vector $v_1 \in \mathbb{R}^n$ with $\|v_1\| = 1$ we can extend this to an orthonormal basis for $\mathbb{R}^n$ using the Gram-Schmidt procedure: that is, extend $\{v_1\}$ to a basis arbitrarily and apply Gram-Schmidt.
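The procedure above translates directly into code. The following Python sketch (my own illustration; the notes contain no code) orthonormalises the columns of a matrix exactly as described.

```python
import numpy as np

def gram_schmidt(U):
    """Orthonormalise the columns of U by the Gram-Schmidt procedure.

    Returns V with orthonormal columns and Sp(v_1..v_k) = Sp(u_1..u_k) for each k.
    """
    U = np.asarray(U, dtype=float)
    V = np.zeros_like(U)
    for k in range(U.shape[1]):
        w = U[:, k].copy()
        for j in range(k):                    # w_k = u_k - sum_j (u_k . v_j) v_j
            w -= (U[:, k] @ V[:, j]) * V[:, j]
        if np.allclose(w, 0):
            raise ValueError("columns are linearly dependent")
        V[:, k] = w / np.linalg.norm(w)       # v_k = w_k / ||w_k||
    return V

U = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # a basis of R^3 as columns
V = gram_schmidt(U)
print(np.allclose(V.T @ V, np.eye(3)))   # True: the columns of V are orthonormal
```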

3.1.2 The spectral theorem

Let $A \in M_n(\mathbb{R})$ be a symmetric matrix, that is $A^T = A$. Now $A$ may be thought of as a linear transformation on $\mathbb{C}^n$ and so in particular has (counting multiplicities) $n$ eigenvalues in $\mathbb{C}$ (since its characteristic polynomial $\chi_A(t)$ has, counting multiplicity, $n$ roots in $\mathbb{C}$). In fact we have that:

Proposition 3.1. The eigenvalues of $A$ all lie in $\mathbb{R}$.

Proof. Let $\lambda \in \mathbb{C}$ be an eigenvalue of $A$ with eigenvector $v \in \mathbb{C}^n$. So $Av = \lambda v$ with $v \ne 0$. Now, using $A^T = A$ and $Av = \lambda v$,
$$\overline{(Av)}^T v = \bar{v}^T A^T v = \bar{v}^T A v = \lambda\, \bar{v}^T v,$$
while on the other hand $\overline{(Av)}^T v = \overline{(\lambda v)}^T v = \bar{\lambda}\, \bar{v}^T v$. Writing $v^T = (v_1, \dots, v_n)$ we see
$$\bar{v}^T v = \bar{v}_1 v_1 + \dots + \bar{v}_n v_n = |v_1|^2 + \dots + |v_n|^2 > 0$$
since $v \ne 0$. Thus we can cancel $\bar{v}^T v$ and one gets $\bar{\lambda} = \lambda$, that is $\lambda \in \mathbb{R}$.

By a similar argument one can show that eigenvectors corresponding to distinct eigenvalues of a real symmetric matrix are orthogonal (Sheet 4). We'll prove instead, though, the following strong diagonalisability result for a real symmetric matrix $A$.

Proposition 3.2. Let $A \in M_n(\mathbb{R})$ be symmetric. Then the space $\mathbb{R}^n$ has an orthonormal basis consisting of eigenvectors of $A$. That is, there exists an orthogonal real matrix $R$ (so $R^T = R^{-1}$) such that $R^{-1}AR$ is diagonal with real entries.

Proof. Let $\lambda \in \mathbb{R}$ be an eigenvalue (Proposition 3.1). Choose an eigenvector $v_1 \in \mathbb{R}^n$ for $\lambda$ and normalise it so that $\|v_1\| = 1$. Extend to a basis $v_1, u_2, \dots, u_n$ in an arbitrary manner, and then apply the Gram-Schmidt procedure to obtain an orthonormal basis $v_1, v_2, \dots, v_n$. Then writing $P = [v_1, \dots, v_n]$ define $B := P^{-1}AP$. Since the columns of $P$ are orthonormal vectors we see that $P^T P = I_n$, that is, $P^{-1} = P^T$. Hence $B = P^T A P$ is a symmetric matrix and so must have the form
$$B = \begin{pmatrix} \lambda & 0 \\ 0 & C \end{pmatrix}$$
for some $C \in M_{n-1}(\mathbb{R})$ which is symmetric. (The zeros down the first column come from $v_1$ being an eigenvector, and along the first row from the symmetry of $B$.) The result now follows by induction on the dimension $n$.

In detail, by induction there exists an orthonormal basis of eigenvectors for $C$; that is, an invertible matrix $Q$ such that $D := Q^{-1}CQ$ is diagonal with real entries and $Q^{-1} = Q^T$. Define
$$R := P \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}.$$
Then
$$R^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}^{-1} P^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & Q^T \end{pmatrix} P^T = R^T$$
and
$$R^{-1}AR = R^T A R = \begin{pmatrix} \lambda & 0 \\ 0 & D \end{pmatrix}$$
is diagonal.

Equivalently we can state this proposition as the

Theorem 3.3 (Spectral theorem for real symmetric matrices). A real symmetric matrix $A \in M_n(\mathbb{R})$ has real eigenvalues and there exists an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors for $A$.

Note that symmetric matrices are characterised by having the property $(Au) \cdot v = u \cdot (Av)$ (i.e. $u^T A^T v = u^T A v$) for all $u, v \in \mathbb{R}^n$. To see why, consider a matrix with this property and let $u, v \in \mathbb{R}^n$ run over the standard basis: this shows the matrix must be symmetric (and the reverse implication is immediate).

Now let $V$ be a real vector space with inner product $\langle \cdot, \cdot \rangle$. We call a linear map $T : V \to V$ self-adjoint (or symmetric) if
$$\langle Tu, v \rangle = \langle u, Tv \rangle \quad \text{for all } u, v \in V.$$
Then similarly to the above we have the

Theorem 3.4 (Spectral theorem for self-adjoint operators on a real inner product space). A self-adjoint map $T$ on a finite dimensional real inner product space $V$ has real eigenvalues and there exists an orthonormal basis for $V$ consisting of eigenvectors of $T$.

Inner products on finite dimensional real vector spaces are just a basis-free way of discussing the dot product. So one can deduce this theorem immediately from Theorem 3.3 by just choosing an orthonormal basis for $V$ to get a real symmetric matrix $A$ for $T$. (By Gram-Schmidt every finite dimensional real inner product space has an orthonormal basis. It is easy to check that the matrix for a self-adjoint operator with respect to an orthonormal basis is symmetric.) Theorem 3.4 can also be proved in a basis-free manner (see Part A Linear Algebra).

Example 3.5.

(1) Of course not all real matrices have real eigenvalues; for example $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ has eigenvalues $\pm i$.

(2) Let
$$A = \frac{1}{2}\begin{pmatrix} 1 & \mu \\ \mu & 1 \end{pmatrix}$$
with $\mu \ne 0$. Then $\chi_A(x) = \big(x - \tfrac{1}{2}(\mu + 1)\big)\big(x - \tfrac{1}{2}(-\mu + 1)\big)$ and we find real eigenvalues $\tfrac{1}{2}(\mu + 1)$ and $\tfrac{1}{2}(-\mu + 1)$, with corresponding eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. These are orthogonal, as predicted by the spectral theorem.

(3) Let
$$A = \begin{pmatrix} 1 & 0 & 2 \\ 1 & 1 & 0 \\ 0 & -1 & 2 \end{pmatrix}.$$
Show without doing any numerical computations that the eigenvalues of $A^T A$ are real and non-negative.

Now $A^T A$ is a real symmetric matrix, so by the spectral theorem we can find an orthonormal basis of eigenvectors $v_1, v_2, v_3$ for $A^T A$ with corresponding real eigenvalues $\lambda_1, \lambda_2, \lambda_3 \in \mathbb{R}$. Note that since $A^T A v_i = \lambda_i v_i$ we find that
$$\lambda_i (v_i \cdot v_i) = (\lambda_i v_i) \cdot v_i = (A^T A v_i) \cdot v_i = (A^T A v_i)^T v_i = v_i^T A^T A v_i = (Av_i) \cdot (Av_i) \ge 0.$$
Since $v_i \cdot v_i > 0$ we can cancel this to deduce $\lambda_i \ge 0$. (So indeed for any $A \in M_n(\mathbb{R})$ the eigenvalues of $A^T A$ are real and non-negative.) Here in fact
$$A^T A = \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & -2 \\ 2 & -2 & 8 \end{pmatrix}$$
has eigenvalues $0, 3, 9$. (See Sheet 4 Question 5 for an application of all this.)
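A numerical check of Theorem 3.3 and of Example 3.5(3), assuming numpy and the matrix $A$ as reconstructed above; eigh is numpy's eigensolver for symmetric matrices.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0], [1.0, 1.0, 0.0], [0.0, -1.0, 2.0]])
S = A.T @ A                                   # real symmetric, as in Example 3.5(3)

# np.linalg.eigh is designed for symmetric matrices: it returns real eigenvalues
# and an orthogonal matrix R of eigenvectors, so R^T S R is diagonal (Theorem 3.3)
eigenvalues, R = np.linalg.eigh(S)
print(np.round(eigenvalues, 10))              # approximately [0, 3, 9]: real and non-negative
print(np.allclose(R.T @ R, np.eye(3)))        # True: the columns of R are orthonormal
print(np.round(R.T @ S @ R, 10))              # diagonal matrix diag(0, 3, 9)
```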

3.2 Quadrics

We finish off with a geometric application of the spectral theorem for real symmetric matrices.

3.2.1 Quadratic forms

First an example.

Example 3.6. We apply our theory to simplify the quadratic form
$$Q(x, y) = x^2 + 4xy + y^2.$$
First rewrite this as
$$Q(x_1, x_2) = x_1^2 + 2x_1x_2 + 2x_1x_2 + x_2^2 = (x_1\ x_2)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Writing $S = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$, we diagonalise this using an orthogonal change of basis:
$$P^{-1}SP = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}, \qquad P := \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \text{rotation by } \tfrac{\pi}{4}.$$
So
$$Q(x_1, x_2) = (x_1\ x_2)\, P \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix} P^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
So defining
$$(y_1\ y_2) := (x_1\ x_2)P, \quad \text{i.e., } y_1 = \tfrac{1}{\sqrt{2}}(x_1 + x_2),\ y_2 = \tfrac{1}{\sqrt{2}}(x_2 - x_1),$$
we have
$$Q(y_1, y_2) = (y_1\ y_2)\begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 3y_1^2 - y_2^2.$$
So by an orthogonal change of variable (in this case a rotation) we have a simpler equation.

The above method works for general quadratic forms.

Definition 3.7. A quadratic form in $n$ variables $x_1, \dots, x_n$ over $\mathbb{R}$ is a homogeneous degree 2 polynomial
$$Q(x_1, \dots, x_n) = \sum_{i,j=1}^n a_{ij} x_i x_j = (x_1 \cdots x_n)\, A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad A = (a_{ij}),$$
with real coefficients. We can and do assume $A$ is symmetric (by adjusting $a_{ij}$ and $a_{ji}$ to be equal).
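The diagonalisation of the quadratic form in Example 3.6 can be checked numerically; the sketch below is my own illustration using numpy.

```python
import numpy as np

S = np.array([[1.0, 2.0], [2.0, 1.0]])        # Q(x1, x2) = x1^2 + 4 x1 x2 + x2^2 = x^T S x

eigenvalues, P = np.linalg.eigh(S)            # orthogonal P with P^T S P diagonal
print(eigenvalues)                            # [-1.  3.]: the coefficients after the change of variable

# sanity check at a random point: Q(x) equals sum_i lambda_i * y_i^2 with y = P^T x
x = np.array([0.7, -1.3])
y = P.T @ x
print(np.isclose(x @ S @ x, np.sum(eigenvalues * y**2)))   # True
```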

By the above method and Theorem 3.3 we can find an orthogonal change of variable $(y_1 \cdots y_n) = (x_1 \cdots x_n)P$, $P^T = P^{-1}$, so that
$$Q(y_1, \dots, y_n) = \lambda_1 y_1^2 + \dots + \lambda_n y_n^2$$
where $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ are the (all real) eigenvalues of the symmetric matrix $A$.

3.2.2 Classification of quadrics

The natural generalisation to $\mathbb{R}^3$ of a conic (solutions in $\mathbb{R}^2$ of an equation of the form $ax^2 + bxy + cy^2 + dx + ey + f = 0$ with $a, b, c$ not all zero) is a quadric.

Definition 3.8. A quadric is the set of points in $\mathbb{R}^3$ satisfying a degree 2 equation
$$f(x_1, x_2, x_3) = \sum_{i,j=1}^3 a_{ij} x_i x_j + \sum_{i=1}^3 b_i x_i + c = 0$$
with $A = (a_{ij}) \in M_3(\mathbb{R})$ symmetric and non-zero, and $b_1, b_2, b_3, c \in \mathbb{R}$.

The techniques in this section apply to both conics and quadrics, but we focus on the latter. Using the method in Section 3.2.1, by an orthogonal change of basis we can simplify our quadric to have the form
$$f(y_1, y_2, y_3) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \lambda_3 y_3^2 + \sum_i B_i y_i + c = 0$$
where $\lambda_i \in \mathbb{R}$. That is, apply this method to the leading form $\sum_{i,j=1}^3 a_{ij} x_i x_j$ of the polynomial, and then make the required substitutions to the linear term. Next, assuming that $A$ has rank 3 (so $\lambda_i \ne 0$ for $i = 1, 2, 3$), we can complete the square by substituting $Y_i = y_i + \frac{B_i}{2\lambda_i}$ to get an equation
$$f(Y_1, Y_2, Y_3) = \lambda_1 Y_1^2 + \lambda_2 Y_2^2 + \lambda_3 Y_3^2 + C = 0. \qquad (4)$$
(So here we have excluded the cases when the rank of $A$ is less than 3.) So when $\det(A) \ne 0$, by an orthogonal change of variable followed by a translation one can put our quadric in this simple form.

Recall from Geometry that both orthogonal transformations and translations are isometric (distance preserving). Indeed, any isometry in $\mathbb{R}^n$ is an orthogonal map followed by a translation (Geometry, Theorem 83), and any orthogonal map in $\mathbb{R}^3$ is either a rotation or a reflection, or a rotation followed by a reflection (from Geometry, Theorem 77). Indeed we can assume here that our orthogonal map is a rotation, by changing the sign of one of the eigenvectors of $A$ (columns of $P$) if necessary so that $\det(P) = 1$, rather than $-1$. (To see this, let $P$ be an orthogonal matrix. If $\det(P) = 1$ then by Theorem 77 the map $P$ is a rotation. So assume $\det(P) \ne 1$, that is $\det(P) = -1$. When $\mathrm{Tr}(P) = 1$ then by Theorem 77 the map $P$ is a reflection. Otherwise, let $Q$ be the (orthogonal) diagonal matrix with entries $-1, 1, 1$, the reflection in the $(y, z)$-plane. Then $\det(QP) = (-1)(-1) = 1$ and so by Theorem 77 the orthogonal matrix $QP$ is a rotation. Hence $P = Q^{-1}(QP)$ is a rotation followed by reflection in the $(y, z)$-plane.)

One now classifies these quadrics according to the signs of the eigenvalues, and whether or not the constant $C$ equals 0. That is, up to an isometric change of variable any quadric with $\det(A) \ne 0$ can be given by an equation of a particular form. Noting also that one can scale the equation (4), one finds when $\det(A) \ne 0$ that the cases are:

Ellipsoid: $\mu_1 Y_1^2 + \mu_2 Y_2^2 + \mu_3 Y_3^2 = 1$
Empty set: $\mu_1 Y_1^2 + \mu_2 Y_2^2 + \mu_3 Y_3^2 = -1$
Point $\{0\}$: $\mu_1 Y_1^2 + \mu_2 Y_2^2 + \mu_3 Y_3^2 = 0$
1-sheet hyperboloid: $\mu_1 Y_1^2 + \mu_2 Y_2^2 - \mu_3 Y_3^2 = 1$
2-sheet hyperboloid: $\mu_1 Y_1^2 + \mu_2 Y_2^2 - \mu_3 Y_3^2 = -1$
Cone: $\mu_1 Y_1^2 + \mu_2 Y_2^2 - \mu_3 Y_3^2 = 0$

Here, after possibly reordering the indices, $\mu_i := |\lambda_i / C| > 0$ when $C \ne 0$ and $\mu_i = |\lambda_i| > 0$ when $C = 0$. Look at the nice pictures of these quadrics at http://en.wikipedia.org/wiki/Quadric (or come to the lectures). One can similarly handle the cases where the rank of $A$ is less than 3, but it gets a little complicated (see below).

Example 3.9. We find the point(s) on the quadric
$$2(x_1^2 + x_2^2 + x_3^2 + x_2x_3 + x_3x_1 + x_1x_2) = 1$$
which is (are) nearest to the origin. The associated symmetric matrix
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = I_3 + B$$
has eigenvalues $1, 1, 4$ (one can just add $1$ to the eigenvalues $0, 0, 3$ of $B$). We can take as corresponding eigenvectors the vectors
$$u_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
By the sentence following Proposition 3.1 we are guaranteed that $u_3 \cdot u_1 = u_3 \cdot u_2 = 0$. However $u_1 \cdot u_2 \ne 0$. Applying Gram-Schmidt to $\{u_1, u_2\}$ we find a better basis for the $1$-eigenspace:
$$v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{6}}\begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix}.$$

Letting $P := [v_1, v_2, \tfrac{1}{\sqrt{3}}u_3]$, we find $P^{-1} = P^T$ and
$$P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}.$$
Setting $(y_1\ y_2\ y_3) = (x_1\ x_2\ x_3)P$, since this change of variable is distance preserving and fixes the origin, we must find the closest point to the origin of
$$y_1^2 + y_2^2 + 4y_3^2 = 1.$$
This is an ellipsoid (a sphere flattened along the $y_3$-axis) and the closest points are $(0, 0, \pm\tfrac{1}{2})$. Hence the closest points for our original ellipsoid are $(x_1, x_2, x_3) = (0, 0, \pm\tfrac{1}{2})P^{-1}$, i.e., $(x_1, x_2, x_3) = (0, 0, \pm\tfrac{1}{2})P^T$, that is $\pm\tfrac{1}{2\sqrt{3}}(1, 1, 1)$. Note that to find these points we did not really need to compute the orthogonal matrix $P$, only its final column, but we wished to illustrate how this can be done when $A$ does not have distinct eigenvalues.

That's all you really need to know. To fill any time at the end we could look at some cases when $\det(A) = 0$:

(1) Two eigenvalues are non-zero with the same sign, say $\lambda_1 > 0$, $\lambda_2 > 0$ and $\lambda_3 = 0$. Then we remove the linear terms in $y_1$ and $y_2$. If the coefficient of $y_3$ is non-zero then, scaling by a positive constant and making a linear substitution $Y_3 = \pm y_3 + \text{const}$, we get the form
$$\mu_1 Y_1^2 + \mu_2 Y_2^2 - Y_3 = 0, \quad \text{where } \mu_1, \mu_2 > 0.$$
This is an elliptic paraboloid. If the coefficient of $y_3$ is zero then we cannot do anything much about the constant term except scale it by a positive constant, so we get
$$\mu_1 Y_1^2 + \mu_2 Y_2^2 = 1, \quad \text{or} \quad \mu_1 Y_1^2 + \mu_2 Y_2^2 = -1, \quad \text{or} \quad \mu_1 Y_1^2 + \mu_2 Y_2^2 = 0$$
with $\mu_1, \mu_2 > 0$. These are an elliptic cylinder, the empty set, and a line.

(2) Two eigenvalues are non-zero but with different sign, say $\lambda_1 > 0$, $\lambda_2 < 0$ and $\lambda_3 = 0$. According to the coefficient of $y_3$ we similarly get the cases:

Hyperbolic paraboloid: $\mu_1 Y_1^2 - \mu_2 Y_2^2 - Y_3 = 0$
Hyperbolic cylinder: $\mu_1 Y_1^2 - \mu_2 Y_2^2 = 1$
Unnamed (a pair of planes): $\sqrt{\mu_1}\, Y_1 = \pm\sqrt{\mu_2}\, Y_2$

with $\mu_1, \mu_2 > 0$. Note that switching $Y_1$ and $Y_2$ allows one to treat the $\pm 1$ constant terms together.
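Finally, a short numerical check of Example 3.9 (my own illustration, assuming numpy): the points of the quadric closest to the origin lie along the eigenvector with the largest eigenvalue, at distance $1/\sqrt{\lambda_{\max}}$.

```python
import numpy as np

# The quadric of Example 3.9: x^T A x = 1 with A = I_3 + (all-ones matrix)
A = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])

eigenvalues, P = np.linalg.eigh(A)            # orthonormal eigenvectors; eigenvalues ascending: 1, 1, 4
# After the orthogonal change of variable the quadric is sum_i lambda_i y_i^2 = 1, so the
# closest points to the origin lie along the eigenvector with the largest eigenvalue,
# at distance 1/sqrt(lambda_max).
lam_max = eigenvalues[-1]
v_max = P[:, -1]
closest = v_max / np.sqrt(lam_max)

print(np.round(closest * np.sign(closest[0]), 6))   # (1,1,1)/(2*sqrt(3)) ~ (0.288675, 0.288675, 0.288675)
print(np.isclose(closest @ A @ closest, 1.0))        # True: the point really lies on the quadric
```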