Linear Algebra MAT 331
Wai Yan Pong
August 7, 8

Contents

1 Linear Equations
  1.1 Systems of Linear Equations
  1.2 Gauss-Jordan Elimination

2 Linear Independence
  2.1 Vector Spaces
  2.2 Linear Dependence
  2.3 Determinant
  2.4 Direct Sums and Quotients

3 Linear Maps
  3.1 Linear Maps
  3.2 Matrix Representation of Linear Maps
  3.3 Equivalence

4 Endomorphisms
  4.1 Similarity
  4.2 Diagonalization
  4.3 Minimal Polynomials
  4.4 Jordan Normal Form

5 Forms
  5.1 Linear Forms
  5.2 Bilinear and Quadratic Forms

6 Inner Products
  6.1 Inner products
  6.2 Orthonormal bases
  6.3 Projections

7 The Spectral Theorem
  7.1 The Spectral Theorem

8 Singular Value Decomposition
  8.1 Singular value decomposition
  8.2 Pseudoinverse of a matrix

A Groups, Rings and Fields
  A.1 Groups

B Axioms of Vector Space

C Matrix
  C.1 Inverse

Chapter 1

Linear Equations

1.1 Systems of Linear Equations

We begin our investigation by looking at the simplest kind of linear equation:

    ax = b.    (1.1)

Whether the equation in (1.1) has a solution depends on what kind of solutions are allowed. For example, 6x = 2 has no solutions if we consider only the integers. However, 1/3 is the solution if the rationals are allowed, and if we are talking about integers modulo 10, then there are two solutions, namely 2 and 7 (mod 10). Hence whether (1.1) is solvable is more about the number system under consideration than the nature of the equation. To simplify the discussion, we assume (1.1) has a solution whenever a ≠ 0. More precisely, we assume our number system is a field (see Appendix A). For now, it is enough to keep in mind that the rational numbers Q, the real numbers R and the complex numbers C are fields with their usual addition and multiplication. Also keep in mind that there are fields with only finitely many elements, like Z/pZ where p is a prime number. We simply use K to denote a field. The key point of having a field for us is that:

Every non-zero element of a field has a multiplicative inverse.

One can also show that if b and c are multiplicative inverses of a then b = c. We denote the unique multiplicative inverse of a by a^{-1} or 1/a. As usual, we can divide by a by multiplying by its inverse.

In general, a system of m linear equations in n unknowns looks like

    a_{11}x_1 + a_{12}x_2 + ... + a_{1n}x_n = b_1
    a_{21}x_1 + a_{22}x_2 + ... + a_{2n}x_n = b_2
        ...
    a_{m1}x_1 + a_{m2}x_2 + ... + a_{mn}x_n = b_m    (1.2)
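A system like (1.2) can be handed to a computer algebra system to find its whole solution set. The notes use Sage in Section 1.2; the sketch below is an illustrative Python/SymPy alternative (SymPy and the particular 2 x 3 system are my own choices, not taken from the notes).

# Illustrative sketch (assumes SymPy is installed): encode a small system
# of the form (1.2) and ask for its entire solution set.
from sympy import Matrix, symbols, linsolve

x1, x2, x3 = symbols('x1 x2 x3')

# Sample coefficients a_ij and constants b_i of a 2 x 3 system:
#    x1 + 2*x2 -   x3 = 1
#   2*x1 -   x2 + 3*x3 = 4
A = Matrix([[1, 2, -1],
            [2, -1, 3]])
b = Matrix([1, 4])

# linsolve returns the whole solution set; free unknowns stay symbolic.
print(linsolve((A, b), x1, x2, x3))
# -> {(9/5 - x3, x3 - 2/5, x3)}, a one-parameter family of solutions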

We call the a_{ij} (1 ≤ i ≤ m, 1 ≤ j ≤ n) the coefficients and the b_i the constants of (1.2). System (1.2) can be expressed as a matrix equation:

    [ a_{11} a_{12} ... a_{1n} ] [ x_1 ]   [ b_1 ]
    [ a_{21} a_{22} ... a_{2n} ] [ x_2 ] = [ b_2 ]    (1.3)
    [  ...    ...        ...  ] [ ... ]   [ ... ]
    [ a_{m1} a_{m2} ... a_{mn} ] [ x_n ]   [ b_m ]

or more succinctly as Ax = b. We call the m x n matrix A = (a_{ij}) the coefficient matrix of System (1.2) and the column vector b = (b_i) the constants of (1.2). A system of linear equations is consistent (or solvable) if it has a solution, i.e. an element of K^n that satisfies the equation. A system is homogeneous if all its constants are 0. A homogeneous system must be consistent since x_1 = ... = x_n = 0 is clearly a solution.

Before getting into an algorithm for solving linear systems, observe that if v_1, v_2 are two solutions of Ax = b, then

    Av_1 - Av_2 = b - b,  i.e.  A(v_1 - v_2) = 0.

That means h := v_1 - v_2 is a solution of

    Ax = 0.    (1.4)

This shows that the difference of any two solutions of (1.2) is a solution of (1.4), the homogeneous system associated to System (1.2). On the other hand, if v is one particular solution of (1.2) and h is a solution of (1.4), then A(v + h) = Av + Ah = b + 0 = b, and so v + h is a solution of (1.2). Putting these together, we have proved

Theorem 1.1.1. The set of solutions of Ax = b is v + H = {v + h : Ah = 0}, where v is any one solution of Ax = b and H is the set of solutions of Ax = 0.

Example 1.1.1. Consider the system of linear equations in the unknowns x_1, x_2, x_3, x_4, written as a matrix equation; that is,

    x_1 + x_2 - x_4 =
        x_3 - x_4 =      (1.5)

6 CHAPTER. LINEAR EQUATIONS In this form, the variables x and x 3 (correspond to the first non-zero entry of the rows) are readily expressed in terms of the other variables, i.e. x and x 4. x = x + x 4 x 3 = + x 4 From this, the solution sets of (.5) can be readily expressed in the form v +H: x + x 4 x + x 4 : x, x 4 K x 4 x + x 4 = + x x 4 : x, x 4 K x 4 = + x + x 4 : x, x 4 K as described in Theorem.... Gauss-Jordan Elimination In this section, we describe the most fundamental algorithm of linear algebra the Gauss-Jordan elimination. A row (or a column) of a matrix is non-zero if it has at least one non-zero entry. The pivot of a non-zero row is the first non-zero entry of the row. The augmented matrix of a linear system is the matrix obtained by appending its constant vector to its coefficient matrix as the last column : a a a n b a a a n b...... a m a m a mn b m For example, the augmented matrix of (.5) is ( ). This matrix has the special form: Definition... A matrix is in reduced row echelon form (or rref in short) if The vertical bar is not part of the matrix but just a typographical trick to remind us that the last column holds the constants.

.. GAUSS-JORDAN ELIMINATION 7. All nonzero rows are above any zero row;. The pivot of a nonzero row is always strictly to the right of the pivot of the row above it; and 3. Every pivot is and is the only nonzero entry in its column. As illustrated in Example.., the solutions set of a system can be easily described if its augmented matrix is in rref. In this sense a system of linear equations is solved if its augmented matrix is in rref. The Gauss-Jordan elimination can be viewed as an algorithm that transforms a given matrix into an equivalent reduced row echolen (rre) matrix, here equivalent means their corresponding linear systems have the same solution sets. Each step in the Gauss-Jordan elimination is an elementary row operations (or ERO in short) which is an operation of one of the following forms: RS R k R l. Swapping the k-th and the l-th row. RM cr k R k. Multiply the k-row by a non-zero scalar c. RR cr l + R k R k (k l). Add c times the l-th row to the k-th row. An elementary matrix is a matrix obtained by applying an ERO to an identity matrix. Suppose ρ is an ERO, denote by E ρ (m) the elementary matrix obtained by applying ρ to I m. We often simplify the notation to just E ρ. For instance, when we write E ρ A, we understood that E ρ = E ρ (m) where m is the number of rows of A. The follow are few key facts about elementary matrices. Proposition.... Every elementary matrix is invertible.. The inverse of an elementary matrix is also an elementary matrix (Exercise.). 3. E ρ A is the matrix obtained by applying the elementary row operation ρ to A. Proof. The first two item correspond to the facts that every ERO is reversible and that the reverse operation is also an ERO. The last item can be verified by direct computation. An important consequence of these observations is: Theorem... EROs preserve solutions of linear systems. Proof. It is plain that every solution of Ax = b is a solution of E ρ Ax = E ρ b. The latter system is, by the second observation, obtained by applying ρ to the first one. By multiplying Eρ, whose existence is asserted by the first fact, we see that the converse holds as well.
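The proposition and theorem above are easy to check on a small example. The following is an illustrative Python/SymPy sketch; the 3 x 3 matrix and the particular ERO are arbitrary choices of mine, and SymPy is not the system used elsewhere in the notes.

# Illustrative sketch: an elementary matrix acts by performing its ERO.
from sympy import Matrix

A = Matrix([[1, 2, -1],
            [2, -1, 3],
            [0, 4, 1]])

# Elementary matrix obtained by applying the ERO
# "add 4 times row 1 to row 2" to the identity matrix I_3.
E = Matrix([[1, 0, 0],
            [4, 1, 0],
            [0, 0, 1]])

print(E * A)                   # left multiplication applies the same ERO to A
print(E.inv())                 # the inverse adds -4 times row 1 to row 2: again elementary
print(E.inv() * E * A == A)    # True: the reverse ERO undoes the operation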

8 CHAPTER. LINEAR EQUATIONS Finally, we are in the position to state the Gauss-Jordan Elimination. Round I: Elimination. If the matrix is a zero matrix (i.e. every entry is ), stop and proceed to the next round.. Pick a row with a pivot in the first non-zero column and make it the st row by an RS. 3. Make the pivot of the first row by an RM. 4. Make every entry underneath the pivot of the st row by applying RR as many times as needed. 5. If the matrix has only one row, stop and proceed to the next round. Otherwise go through the steps again on the matrix obtained by deleting the first row. Round II: Backward Substitution. Take the output of Round. If the matrix is zero, stop, otherwise make all the entries above the pivot of the last non-zero row by applying RR as many times as needed.. Repeat the steps to the submatrix obtained by deleting the last row until the submatrix contains only a single row. Example... Consider the system of linear equations x + x 3 x 4 = 4x + x x 4 = x x + 6x 3 + x 4 = 3 (.6) So the coefficient matrix and the constant vector of the system are 4 6 and respectively. And the augmented matrix of the system is: A = 4 6 3 Now we apply Gauss-Jordan elimination to A. The first pass of Round I, performs the following EROs. 3

.. GAUSS-JORDAN ELIMINATION 9 # R --> R*(-) A = A.with_rescaled_row(,-) # R --> R*(4) + R A = A.with_added_multiple_of_row(,,4) # R3 --> R*(-) + R3 A3 = A.with_added_multiple_of_row(,,-) A3 = 8 5 5 The submatrix obtained by deleting the first row is A4 = A3.submatrix(,) A4 = ( 8 5 5 Applying the steps in Part I to the submatrix yields a submatrix A5 = A4.with_added_multiple_of_row(,,) A5 = ( 8 5 8 Again, the submatrix obtained by deleting the first row is non-zero: A6 = A5.submatrix(,) A6 = ( 8 ) Pass this through the steps in Part I again, we get # R --> R *(-/8) A7 = A6.with_rescaled_row(,-/8) A7 = ( 8 5 4 Thus the output of passing A to Round I is A8 = 8 5 8 5 4 Now use A8 as the input for Round II. A9 = A8.with_added_multiple_of_row(,,8) A = A9.with_added_multiple_of_row(,,) ) ) )

7 4 3 A = 53 8 5 4

The matrix is now in reduced row echelon form, and the next two passes of Round II yield the same matrix. So the algorithm stops, and this is the output of the Gauss-Jordan elimination on the original matrix A.

The observant reader will note that the steps in the Gauss-Jordan elimination are not unique. However,

Theorem 1.2.3. The rref of a matrix is unique.

The simplest proof that I know of was given by Thomas Yuster [3]:

Proof. Let A be an m x n matrix. Suppose B and C are two rre matrices that are row equivalent to A. We will show that B = C by induction on n. The case n = 1 is trivial (B = (1, 0, ..., 0)^T = C). Suppose n > 1. For a matrix M, let M' be the matrix obtained from M by deleting the n-th (i.e. the last) column. We observe that B' and C' are still in rref and both are row equivalent to A': the same sequence of EROs that brings A to B (resp. C) brings A' to B' (resp. C'). Thus, by induction, B' = C', and so B and C can only differ in the n-th column. Assume B ≠ C; then b_{jn} ≠ c_{jn} for some 1 ≤ j ≤ m. Let u be a column vector such that Au = 0. Then Bu = Cu = 0 (since B and C are both row equivalent to A) and so (B - C)u = 0. But only the last column of B - C is nonzero, thus the j-th coordinate of (B - C)u is (b_{jn} - c_{jn})u_n. Since b_{jn} ≠ c_{jn}, we must have u_n = 0. From this we conclude that the last columns of B and C are both pivotal: otherwise, the last variable of the system Ax = 0 would be a free variable, and hence Au = 0 for some u with u_n ≠ 0, a contradiction. Since the first n - 1 columns of B and C are identical, the row in which this leading 1 appears must be the same for both B and C, namely the first zero row of the rref of A'. Because the remaining entries in the n-th columns of B and C must all be zero, B = C, which is a contradiction. This completes the proof.

Theorem 1.2.4. A homogeneous linear system with more unknowns than equations has a non-trivial solution.

Proof. Suppose a homogeneous system Ax = 0 has more unknowns than equations. Then A has more columns than rows, and so its rref will contain a column that is non-pivotal, i.e. the system will have at least one free variable and hence a non-trivial solution (e.g. by setting that free variable to 1).

Exercises

Exercise 1.1. Elementary Matrices

Exercise 1.2. Find the multiplicative inverse of the complex number 3 + i as follows: suppose

    (3 + i)(x + iy) = 1.

.. GAUSS-JORDAN ELIMINATION Then by expanding the left-hand side of the above equation and equating the real part and the imaginary part of both sides one get two linear equations in x, y. Find x, y by solving that system. More generally, if a + bi, find the multiplicative inverse of a + bi using this method. Exercise.3. Show that if two rre matrices of the same shape are row equivalent then they are equal.
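The method of Exercise 1.2 is itself a small linear system, so it can be checked mechanically. Below is an illustrative Python/SymPy sketch; the sample value 3 + 2i is an arbitrary choice of mine (not necessarily the number in the exercise), and SymPy is not part of the notes.

# Illustrative sketch: find 1/(a + bi) by expanding (a + bi)(x + iy) = 1
# and equating real and imaginary parts, as in Exercise 1.2.
from sympy import symbols, I, expand, re, im, linsolve, Rational

x, y = symbols('x y', real=True)
a, b = 3, 2                        # sample value: a + bi = 3 + 2i

prod = expand((a + b*I) * (x + I*y))
eqs = [re(prod) - 1, im(prod)]     # real part = 1, imaginary part = 0
print(linsolve(eqs, x, y))         # {(3/13, -2/13)}

# Sanity check against the usual formula 1/(a+bi) = (a - bi)/(a^2 + b^2):
print(Rational(a, a**2 + b**2), Rational(-b, a**2 + b**2))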

Chapter Linear Independence. Vector Spaces Let K be a field. A vector space over K (or a K-vector space) is a set V together with two operations:. +: V V V (addition). : K V V (scalar multiplication) which satisfy certain rules (see Appendix B). We call elements of V vectors and elements of K scalars. Unless otherwise stated, V always denote a K- vector space. The following are some common examples of vector spaces. More examples are given in Exercise.. Example... The set K n consisting of n-tuples of elements of K equipped with coordinate-wise addition and scalar multiplication: (u,..., u n ) + (v,..., v n ) = (u + v,..., u n + v n ) λ(u,... u n ) = (λu,..., λu n ) Many familiar vector spaces, like R, R 3, as well as n the space of n-bits are this form. Here K = = {, } is the field of two elements. Example... The set M m n (K) of n m-matrices over K (m, n N) with matrix addition: (A+B) ij = A ij +B ij and scalar multiplication: (λa) ij = λa ij. The zero n m matrix serves as the zero vector for this space. Example..3. The set of polynomials over K equipped with the usual addition of polynomials and usual multiplication of polynomial by constants form a vector space over K. We use P (t) (or K[t]) to denote this vector space. Example..4. The set of functions from a set X to K forms a vector space over K with the usual addition functions: (f + g)(x) = f(x) + g(x) and the usual multiplication by constants: (λf)(x) = λ f(x) form a vector space over K. We use K X (or F (X)) to denote this vector space.

.. VECTOR SPACES 3 Definition... A subspace of a vector space V is a non-empty subset of V that is closed under the restrictions of the operations of V. In other words, W is a subspace of V, denoted by W V, if. W V.. For any w, w W, w + w W. 3. For any a K and w W, aw W. Example..5. Fix an n. The polynomials over K of degree at most n (with the usual operations on polynomials) form a subspace of K[t]. We denote this space by K n [t] (or P n (t). It can be identified with K n+, via the map n k= a kt k (a, a,..., a n ). Example..6. Let A M m n (K). The solution set of the homogeneous system Ax =, i.e. {v K n : Av = }, form a subspace of K n. We call it the nullspace (or the kernel) of A. Example..7. Given a vector v K n, the set of matrices that maps it to K m, i.e. {A M m n (K): Av = } is a subspace of M m n (K). Example..8. The diagonal of a square matrix A consists of entries with the same row and column index, i.e. entries of the form a ii. We define tr(a) the trace of A as the sum its diagonal entries. It is easy to check the set is a subspace of M n (K). {A M n (K): tr(a) = } Definition... Let X be a subset of V. A linear combination of vectors in X is an expression of the form a x x (.) x X where a x K (x X) and only finitely many of them are non-zero. The scalar a x is the coefficient of x in the linear combination (.). A linear combination is trivial if all its coefficients are zero. We often write a linear combination of vectors in X as a x + +a m x m where the vector x i X (m ) are distinct. A vector in V is expressible as a linear combination of vectors in X, if it is the sum of a linear combination of vectors in X. A vector may be expressed by more than one linear combinations. The empty linear combination (i.e. when X = ) has no coefficients and is expressing the zero vector of V.

4 CHAPTER. LINEAR INDEPENDENCE Definition..3. The span of X, denoted by X (or span(x)), is the set of vectors in V expressible by some linear combination of vectors in X. It is easy to verify that X is a subspace of V and is the smallest subspace of V containing X, i.e. that every subspace of V that contains X also contains X. Example..9. Let A M m n (K). The span of the rows (resp. the columns) of A is called the row space (resp. the column space) of A. The row space of A is a subspace of K n while the column space of A is a subspace of K m. Proposition... The row space of EA (whenever EA is defined) is a subspace of the row space of A. Proof. The rows of EA are linear combinations of the rows of A. Consequently, if E is invertible than the row space of A and EA are the same. In particular, since elementary matrices are invertible, Proposition... Row equivalence matrices have the same row space. As a result, A and its rref R A have the same row space. This is not true for column space as the following example shows. Example... The matrix A = and its rref R = clearly have different column spaces.. Linear Dependence The key concept of linear algebra is linear dependence. Definition... A subset X of a vector space is linearly dependent if the zero vector can be expressed as a non-trivial linear combination of vectors in X. In other words, X is linearly dependent if and only if there exist a,..., a m K (m ) not all zero and x,..., x m X such that a x +, a m x m =. We say that X is linearly independent if X is not linearly dependent. Note that X is linearly dependent if and only if some finite subset of X is linearly dependent. Proposition... X is linearly dependent if and only if some vector in V can be expressed as two different linear combinations of vectors in X. Proof. Suppose x X a xx and x X b xx are two distinct linear combinations of vectors in X. Then there are still only finitely many x X for which a x b x are non-zero. Since the linear combinations are different so a x b x for at least one x X, thus x b x )x = x X(a a x x b x x x X x X

.. LINEAR DEPENDENCE 5 is a non-trivial linear combination of vectors in X expressing showing that X is linearly dependent. Conversely suppose X is linearly dependent, then can be expressed as a non-trivial linear combination of vectors in X. So that linear combination and the trivial linear combination of vectors in X are distinct and both express. We leave the proof of the following proposition as an exercise. Proposition... X is linearly dependent if and only if some x X can be expressed as a linear combination of vectors in X \ {x}. Example... The empty set is linearly independent. Example... Any set that contains the zero vector is linearly dependent since =. Example..3. The subset X = {(, 5, 3), (, 9, ), (3, 3, )} of R 3 is linearly independent. To see this, note that a(, 5, 3) + b(, 9, ) + c(3, 3, ) = (,, ) exactly means that (a, b, c) is a solution of the homogeneous system 3 x 5 9 3 y =. 3 z So X is linearly independent if and only if the system above has only trivial solution and this condition can be determined by the rref of the coefficient matrix A. In our case, the rref of A is Thus our original system indeed only has trivial solution, therefore X is linearly independent. More generally, Example..4. A finite subset X = {v,..., v m } of K n is linearly independent if and only if the homogeneous system Ax = has only trivial solution where A is the n m-matrix with v i as its i-th column ( i m). This is equivalent to rref(a) having the form I m followed by some, if any, zero rows. Consequently if m > n, then X must be linearly dependent because then the homogeneous system Ax = will have more variables than equations and hence must have a non-trivial solution (the fundamental theorem of linear algebra). Definition... A subset B of a vector space V is a spanning set if B = V. An independent spanning set is called a basis. An ordered basis is a basis together with a well order on it.
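The rref test from the examples above is easy to run in a computer algebra system. Here is an illustrative Python/SymPy sketch; the three vectors are sample values of my own choosing, and SymPy is not the system used elsewhere in the notes.

# Illustrative sketch: test a finite subset of K^3 for linear independence
# by row reducing the matrix that has the vectors as its columns.
from sympy import Matrix

v1, v2, v3 = [1, 5, 3], [2, 9, 0], [3, 3, 1]
A = Matrix([v1, v2, v3]).T         # 3 x 3 matrix with v1, v2, v3 as columns

R, pivots = A.rref()
print(R)
print(len(pivots) == A.cols)       # True: every column is pivotal, so the set is independent
print(A.nullspace())               # []: Ax = 0 has only the trivial solution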

6 CHAPTER. LINEAR INDEPENDENCE Example..5. Let e i be the vector (,...,,..., ) K n where the appears at the i-th component. The ordered set {e,..., e n } is called the standard basis of K n. Example..6. Let E ij M n m be the matrix with e ij = be the only non-zero entry. One checks readily that the set is a basis of M n m (K). {E ij : i n, j m} Example..7. The powers of t,, t, t,..., form a basis of the vector space of polynomials K[t]. The first n + of them, i.e., t..., t n form a basis of P n. Every vector space has a basis. To establish this fact in full generality requires some assumptions on infinite sets. We will prove this under the assumption that the vector space V has a finite spanning set and indicate how that assumption can be removed in the exercises. First we need a lemma: Lemma..3. Let U be an linearly independent subset of V and w V. If w / U, then U {w} is linearly independent as well. Proof. Suppose U {w} is linearly dependent then some non-trivial linear combination u U a uu + a w w =. Hence a w w U and by dividing a w, w U as well. The last step can be justified because if a w were then the linear combination involves only vectors in U but that contradicts the linear independence of U. Theorem..4. Suppose W a finite spanning set of V and U is a linearly independent subset of W. Then there exists a basis B of V such that U B W. Proof. If W U then V = W U = U. Thus U itself is a spanning set and hence a basis of V. Otherwise, some w W is not in U and by Lemma..3, U {w} W is linearly independent. The same argument shows that either U {w} is a basis of V or else can be extended by an element of W to a linearly independent subset of W. Since W is finite, this process must stop eventually resulting a basis B of V such that U B W. Since the empty set is linearly independent, we conclude that if V has a finite spanning set then V has a finite basis. Theorem..5. Suppose U is a linearly independent subset of V and W is a finite spanning set of V then U W. Proof. Suppose W = {w,..., w n } and there are u,..., u n+ distinct vectors in U. Since W is a spanning set, for each j n +, u j = n i= a ijw i. Since the matrix A = (a ij ) has n rows and n + columns, by the fundamental namely Zorn s lemma or equivalently the axiom of choice.

.. LINEAR DEPENDENCE 7 theorem of linear algebra (Theorem..4) Av = for some non-zero vector v. Thus, n+ n+ n n n+ n v j u j = v j a ij w i = a ij v j w i = w i =. j= j= i= But this contradicts the assumption that U is linearly independent. i= It follows from Theorem..5 that if V has a finite spanning set then every basis of V is finite. Moreover any two bases of V have the same size. This justifies the following definition. Definition..3. The dimension of V, denoted by dim(v ), is the common cardinality of its bases. Here are the dimensions of some vector spaces that we have encountered: The dimension of K n is n since the standard basis has size n. The dimension of M m n (K) is mn. The dimension of K[x] is the cardinality of the set of natural numbers. The dimension of P n is n +. The row rank (resp. column rank) of a matrix is defined to be the dimension of its row space (resp. column space). Proposition..6. The row rank and the column rank of a rre matrix are the same. Proof. Deleting the zero rows at the bottom, if any, from a rre matrix clearly affects neither its row rank nor its column rank. So we can assume all rows are nonzero. Then the row rank is simply the number of rows, say r, because the pivots of the rows appear in different columns and they are the only nonzero entry in the corresponding columns. The column rank is also r because the pivot columns form the standard basis of K r. As we have seen in Example.., elementary row operations can change the column space of a matrix but they cannot change their column ranks. Lemma..7. The column rank of EA where E is an elementary matrix is at most the column rank of A. Proof. By Theorem..4, some columns of EA form a basis of its column space. Let B be the submatrix of EA consisting of those columns and A be the submatrix of A consisting of the corresponding columns. Since EA is obtained from A by performing a row operation, therefore B = EA. If the columns of A are linearly dependent, then there exists some non-zero vector v such that A v = and so Bv = EA v = E = contradicting the fact that columns of B are linear independent. This shows that the column rank of A is at least the column rank of EA. j= i=
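The rank statements above can be sanity-checked numerically. The sketch below is illustrative only (Python/SymPy, with an arbitrary sample matrix of my own choosing): an elementary row operation may change the column space but not the column rank, and the rank of a matrix agrees with the rank of its transpose.

# Illustrative sketch: row operations preserve rank; row rank = column rank.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])

E = Matrix([[1, 0, 0],             # elementary matrix: add -2 times row 1 to row 2
            [-2, 1, 0],
            [0, 0, 1]])

print(A.rank(), (E * A).rank())    # 2 2  -- the rank is unchanged
print(A.columnspace())             # a basis of the column space of A ...
print((E * A).columnspace())       # ... which is generally a different subspace for E*A
print(A.rank() == A.T.rank())      # True: row rank equals column rank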

8 CHAPTER. LINEAR INDEPENDENCE Since the inverse of an elementary matrix is elementary, therefore Proposition..8. The column rank of two row equivalent matrices are the same. Finally, we conclude that Theorem..9. The row rank and the column rank of a matrix are the same. Proof. Since A and R A its rref are row equivalent, by Proposition.., their row ranks are the same. By Proposition..8, their column ranks are the same. Finally, since R A is a rre matrix, by Proposition..6, the row rank of the column rank of R A are the same. Therefore, the row rank and the column rank of A are the same..3 Determinant There is a more geometric way of thinking about linearly dependence in R, namely two vectors in R are linearly independent if and only if they are the adjacent sides of a parallelogram with positive area. Likewise, three vectors in R 3 are linearly independent if and only if the parallelepiped that they span has positive volume. Is there a natural generalization of this criteria of linear independence to an arbitrary finite dimensional vector space? The first thing to realize is that we have to be flexible on the idea of volume since an element of an arbitrary field needs not be a number and whether an element of a field is positive needs not make sense in general. However, we can still require our volume to be a function from the set of n-tuples of vectors from K n, or equivalently from M n (K) to K such that its value on A is non-zero if and only if the rows of A are linearly independent. So what other properties should this function possess? In the case of R, if we relax the requirement so that area can take any real number as value then it is not hard to see that for any u, v, w R and λ R,. area(λu, v) = λ area(u, v) = area(u, λv);. area(u + v, w) = area(u, w) + area(v, w); area(w, u + v) = area(w, u) + area(w, v); and 3. area(u, u) =. The last property is clear: a collapsed parallelogram should have no area. The first property is also clear if we allow area to take arbitrary real number as value. The second property also follows from this relaxation and the observation that a parallelogram and a rectangle with the same base and height have the same area (so we can assume that u, v are perpendicular to w). In general, a one can also identify an n-tuple of vectors in K n with the columns of an n n-matrix over K

.3. DETERMINANT 9 function from a finite Cartesian product of vector spaces over K to K respect the vector space operations at each of is argument is called a multilinear form. We will discuss multilinear forms again in Chapter 5. We say that a multilinear form f : V n K is alternating if it vanishes whenever two of its arguments are the same. This implies the value of the form is multiplied by whenever two of its arguments are swapped, i.e. f(, u,, v, ) = f(, v,, u, ). Moreover, these two conditions are equivalent if in K (Exercise). Finally it is reasonable to require our generalized volume to take the value on the standard basis of K n (i.e. the identity matrix I n ). Interesting, there is only one alternating multilinear form that fits the bill. More precisely, Theorem.3.. There is a unique alternating multilinear form on M n (K), regarding the rows of a matrix as the arguments of the form, which takes value on the identity matrix I n. Proof. By multilinearity, any such form is determined by its values on matrices whose rows are the e i ( i n). Being an alternating form, its value at such a matrix will be if the matrix has some rows repeated, or else the matrix is simply obtained from I n by permuting its rows, say s times 3, and so the value of the form at that matrix is ( ) s. We denote this form 4 by det and call its value at A the determinant of A. It is clear that everything that has been said about determinant works just as well with the word row replaced by column so we conclude that Proposition.3.. det(a) = det(a T ) for every square matrix A. Example.3.. It is instructive to compute the determinant of a matrix. ( ) a b det = det(ae c d + be, ce + de ) = a det(e, ce + de ) + b det(e, ce + de ) = ac det(e, e ) + ad det(e, e ) + bc det(e, e ) + bd det(e, e ) = (ad bc) det(e, e ) = ad bc. With some book-keeping, it is not hard to generalize the previous computation to an arbitrary n n matrix A and obtained the Leibniz formula for 3 The number s is certainly not uniquely determined by such matrix but its parity is. 4 strictly speaking, one determinant form for each n.

CHAPTER. LINEAR INDEPENDENCE determinant: det(a) = det j a j e j,..., j a nj e j = σ a σ()... a nσ(n) det ( e σ(),..., e σ(n) ) = σ sgn(σ)a σ() a nσ(n) where σ runs through the permutations of {,..., n} and sgn(σ) is either or depending on whether σ is an even or odd permutation 5. Proposition.3.3. Let E, M M n (K) and E is elementary, then. det(e).. det(em) = det(e) det(m). We give a proof for the case when E = E ρ where ρ is the operation that adds λ times of the i-th row to the j-row (wlog i < j).... m i m i m i det(e ρ M) = det. = λ det. + det. = det M. λm i + m j m i m j. By taking M = I n, we get det(e ρ ) = and so det(e ρ M) = det(m) = det(e ρ ) det(m). We leave the proofs for the other two cases as an exercise to the reader. Proposition.3.4. For any A, B M n (K), det(ab) = det(a) det(b). Proof. Let R A be the rref of A, we have A = E E k R A for some elementary matrices E i ( i k). By Proposition.3.3 (), det(a) = det(e ) det(e n ) det(r A ). Since det(e i ) (Proposition.3.3 ()). Thus det(a) = if and only if det(r A ) =. Since R A is a row reduced echelon square matrix, if the rows of R A are non-zero then R A = I n which has determinant. Thus det(r A ) = implies its last row must be zero. But then the last row of R A B is zero as well and so det(r A B) =. Thus if det(a) = then we conclude from Proposition.3.3 () that det(a) det(b) = = det(e ) det(e k ) det(r A B) = det(ab) 5 Every permutation is a product of transpositions. It is a fact that the parity of the number of transpositions needed depends only on the permutation...

.4. DIRECT SUMS AND QUOTIENTS Now if det(a), then by the same argument R A must be I n and so A = E E k. Therefore, it follows from Proposition.3.3 () again that det(ab) = det(e E k B) = det(e ) det(e k ) det(b) = det(e E k ) det(b) = det(a) det(b). Theorem.3.5. For any A M n (K), The following conditions are equivalent:. det(a). The rows of A are linearly independent. 3. The columns of A are linearly independent. 4. Ax = has no non-trivial solution. Proof. From the proof of Proposition.3.4, we know that det(a) if and only if the n rows of R A span K n. Since the rows of R A are linear combination of the rows of A and vice versa, the last condition is the same as the n rows of A span K n which is equivalent to the linearly independence of the rows of A. This shows the equivalence of () and (). Since det(a) = det(a T ), Condition () is also equivalent to det(a T ) which is equivalent to the rows of A T, i.e. the columns of A, are linearly independent and this last condition is equivalent to the system Ax = has no nontrivial solution..4 Direct Sums and Quotients Definition.4.. Given vector spaces V and W over the same field K. The direct sum of V and W, denoted by V W, is the vector space on the Cartesian product V W equipped with the operations: (v, w) + (v, w ) = (v + v, w + w ) α(v, w) = (αv, αw) We call the two maps ι : v (v, W ) and ι : w ( V, w) the canonical embedding of V and W, respectively. Clearly ι (V ) is a copy V and ι (W ) is a copy of W sitting inside V W. Note that the intersection of these two subspaces is the zero space. It is clear that if {v,..., v n } is a basis of V and {w,..., w m } is a basis of W, then {ι (v ),..., ι (v n ), ι (w ),... ι (w m )} is a basis of V W. In particular, Proposition.4.. For finite dimensional spaces V and W, dim(v W ) = dim V + dim W.
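As a brief computational aside on the previous section, the Leibniz formula is short enough to implement directly. The sketch below is illustrative only (Python/SymPy, with an arbitrary sample matrix of my own choosing); it sums over all permutations, obtains each sign by counting inversions, and compares the result with a built-in determinant.

# Illustrative sketch: det(A) = sum over permutations s of
#   sgn(s) * a_{1,s(1)} * ... * a_{n,s(n)}   (the Leibniz formula)
from itertools import permutations
from sympy import Matrix

def leibniz_det(A):
    n = A.rows
    total = 0
    for p in permutations(range(n)):
        # sgn(p) = (-1)^(number of inversions of p)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i, p[i]]
        total += term
    return total

A = Matrix([[2, 1, 0],
            [1, 3, 4],
            [0, 5, 6]])
print(leibniz_det(A), A.det())     # both give -10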

CHAPTER. LINEAR INDEPENDENCE Example.4.. K n K m is isomorphic to K n+m. Let U be a subspace of V. We say that v, v V are U-equivalent if their difference v v is in U. One checks easily this is indeed an equivalent relation and that the equivalent classes are of the form v + U = {v + u: u U}. Definition.4.. Let U be a subspace of V. The quotient space of V by U is the set V/U = {v + U : v V } of U-equivalent classes equipped with the operations: (v + U) + (v + U) = (v + v ) + U. α(v + U) = (αv) + U Note that these operations on the equivalent classes are defined through the operations on the representatives and so one needs to check the independence of such choices. We refer the map π U : V V/U, v π U v + U. as the canonical surjection (associated to U). Clearly, π U is a surjection and preserves vector space operations, i.e. We will show that π U (v + v ) = π U (v) + π U (v ) π U (αv) = απ U (v) Proposition.4.. Suppose U is a subspace of a finite dimensional space V. Then dim V/U = dim V dim U. Example.4.. In general, it is hard to see a quotient. However, in some cases, e.g. V = R 3 and U = xy-plane, one can visualize the quotient V/U as a subspace of V consisting of one representative for each class v + U. However, the choice of such subspace is still far from unique. In the case mentioned above, one can identify the quotient V/U with the z-axis. But in fact, any line l passing through the origin and is not contained in the xy-plane will do. Exercises Exercise.. Show that each of the following sets forms a K-vector space with the given operations.. The set K[x] of polynomials over K with polynomial addition and scalar multiplication.

.4. DIRECT SUMS AND QUOTIENTS 3. The set K N of sequences over K with addition and scalar multiplication defined by (x k ) + (y k ) = (x k + y k ) and; a(x k ) = (ax k ) 3. The set K K of functions from K to K with function addition and scalar multiplication, i.e. (f + g)(x) = f(x) + g(x); (af)(x) = af(x). 4. Field extension: Let L/K be a field extension, i.e. K is a subfield of L. Then L has a natural K-vector space structure, namely the addition and multiplication in L. For example, L = Q( ) and K = Q. Then for a, b, r, s Q, (a + b ) + (r + s ) = (a + r) + (b + s) a(r + s ) = ar + as Exercise.. Let α be a element in some extension of K. Show that the set of polynomials over K vanish at α: is a subspace of K[x]. {p(x) K[x]: p(α) = } Exercise.3. Let c be the set of convergence real sequences. Show that c form a subspace of the space of real sequences. Exercise.4. Show that the set of differentiable functions from R to R form a subspace of the space of all real functions of one variable. Exercise.5. Show that the intersection of two subspaces of V is a subspace of V. More generally, show that the intersection of any family of subspaces of V is a subspace of V. Deduce that X is the intersection of the subspaces of V that contains X. Exercise.6. Give an example that the union of two subspaces is not a subspace. Exercise.7. Show that a subset X of a vector space is a subspace if and only if X = X. Deduced that for any subset X of a vector space, X = X. Exercise.8. Show that a singleton {v} is linearly independent unless v is. Exercise.9. Show that if the char K, then the subset X = {(, ), (, ), (, )} of K is linearly dependent. What if char K =? Exercise.. Prove Proposition...

4 CHAPTER. LINEAR INDEPENDENCE Exercise.. A set C of subsets of a given set is a chain if C is totally ordered by set inclusion. The union of a chain is defined to be the union of its elements. Show that. the union of a chain of subspaces is a subspace.. the union of a chain of linearly independent subsets is a linearly independent subset. Exercise.. Show that a maximal linearly independent subset of a vector space is also a spanning set (hence a basis). Here a linearly independent subset is maximal if it not properly contained in any linearly independent subset of the same vector space. Exercise.3. Use Zorn s lemma, Exercise., Exercise. and Lemma..3 then the assuming of existence of finite spanning set can be removed. Exercise.4. Check that U-equivalence is an equivalence relation. Exercise.5. Let U, W be subspaces of a vector space V. We define their sum to be U + W = {u + w : u U, w W }. Show that U + W is a subspace of V.. Let X, Y be subsets of V show that X + Y = X Y. Exercise.6. Suppose dim V = n and U V. Let D = {u,..., u m } be a basis of U. And let B = D {v,..., v n m } be a basis of V extending D. Show that C = {v j + U : j n m} is a basis of the quotient space V/U. Hence dim V/U = n m = dim V dim U.

Chapter 3 Linear Maps 3. Linear Maps An interesting map between two vector spaces should respect the vector space structures of its domain and codomain otherwise it will be just a map between sets. Definition 3... A map T : V W between two K-vector spaces is K-linear (or simply linear) if For any v, v V, T (v + v ) = T (v ) + T (v ). For any α K and v V, T (αv) = αt (v). So linear map (a.k.a linear transform or homomorphism) is a map between vector spaces that preserves the vector space operations between its domain and codomain. By a monomorphism we mean an injective linear map and a epimorphism we mean a surjective linear map. We say that a linear map T : V W is an isomorphism if it is invertible, i.e. there exist a linear map S : W V such that S T = id V and T S = id W. We will see that a linear map is an isomorphism if and only if it is both a monomorphism and an epimorphism (i.e. if and only if it is bijective as a map between sets). Two vector spaces are isomorphic if there is an isomorphism between them. Example 3... The map sending every vector in V to the zero vector of W is linear. We call it the zero map from V to W. As a result, Hom K (V, W ) (or simply Hom(V, W )), the set of linear maps from V to W is always non-empty. One checks directly that sum of two linear maps is linear and a scalar multiple of a linear map is linear (Exercise 3.). Moreover, Hom(V, W ) equipped with these two operations is a vector space over K with the zero map as its zero vector. If V = W, we write End(V ) instead of Hom(V, V ) and call its elements endomorphisms of V. 5

6 CHAPTER 3. LINEAR MAPS Example 3... The identity map id V of V. from V to itself is an endomorphism One checks directly that a composition of linear maps is linear and that (End(V ),, id V ) is a ring with function addition and composition (Exercise 3.3). The group of units of this ring, i.e. the group of isomorphisms from V to itself is called the general linear group of V and is denoted by GL(V ). The identity element of GL(V ) is id V. Example 3..3. Let V be a vector space. The inclusion map of a subspace of V into V is a monomorphism. In fact, a subset U of V is a subspace if and only if the inclusion of U into V is linear. Example 3..4. The canonical embeddings of V and W into V W are monomorphisms. Example 3..5. The canonical projection π U : V V/U is an epimorphism. Since linear maps preserve vector space operations, it is clear that Proposition 3... Let T : V W be linear then T ( X ) = T (X) for any X V. Consequently, Proposition 3... The image (pre-image) of a subspace under a linear map is a subspace. Proof. Suppose T : V W is linear and V is a subspace of V. Since V is a subspace, V = V. So according to Proposition 3.. T (V ) = T ( V ) = T (V ). Therefore, T (V ) is a subspace of W. To see that the pre-image of a subspace is also a subspace, let W be a subspace of W. So W = W and by Proposition 3.. T ( T (W ) ) = T T (W ) W = W Thus T (W ) T (W ) and since the reverse inclusion always holds, we have T (W ) = T (W ). Thus T (W ) is a subspace of V. Definition 3... The kernel of a linear map T : V W is the set ker T = {v V : T (v) = }. The image of T, im T, is the set T (V ) = {T (v): v V }. It follows from Proposition 3.. that ker T is a subspace of V and im T is a subspace of W. Moreover, T ({ V }) is a subspace of W and is a singleton hence must be { W }. This shows that T ( V ) = W. That means a linear map must send the zero of its domain to the zero of its codomain.
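For a concrete instance of these definitions, take the map v -> Av determined by a matrix A (maps of this form are discussed at the end of this section); its kernel is the nullspace of A and its image is the column space of A. The following is an illustrative Python/SymPy sketch with an arbitrary sample matrix of my own choosing.

# Illustrative sketch: kernel and image of the linear map T(v) = A*v.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])            # T : Q^3 -> Q^2

print(A.nullspace())               # a basis of ker T (two vectors: T is not injective)
print(A.columnspace())             # a basis of im T (one vector: T is not surjective)
print(A * Matrix([0, 0, 0]))       # the zero vector maps to the zero vector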

3.. LINEAR MAPS 7 Proposition 3..3. A linear map T : V W is injective if and only if ker T = { V }. Proof. If T is injective, then ker T = T ( W ) is a singleton. Since it is also a subspace of V, so it must be { V }. Conversely, suppose ker T = { V } and that T (v ) = T (v ). Then v v ker T because T (v v ) = T (v ) T (v ) = W. So v v = V, i.e. v = v. Therefore, T is injective. Example 3..6. Let X be a linearly independent subset of V. Then any map T from the set X to a vector space W extends uniquely to a linear map T from X to W : for v X, T (v) = c x T (x) x X where x X c xx is the unique linear combination of vectors in X expressing v. It is straight-forward to check that T, defined as above, is linear and extends T to X. For uniqueness, suppose T is another linear extension of T to X, then ) T (v) = T ( x X = x X c x x = x X c x T (x) = x X c x T (x) c x T (x) = T (v) We call T the extension of T by linearity. Since this extension is canonical, from now on we use the same letter to denote a map and its linear extension to the span of its domain. In particular, every map from a basis B of V to a vector space W extends uniquely to a linear map from V to W. Consequently, a linear map T is completely determined by its action on a basis of its domain. Next we examine the behaviour of linear dependence under linear maps. Proposition 3..4. The pre-image of an linearly independent set under a linear map is linearly independent. Proof. Let T : V W be a linear map and Y be a linearly independent subset of W. Suppose c x + + c n x n is a linear combination of some x,..., x n T (Y ) expressing V. Then W = T ( V ) = T (c x + + c n x n ) = c T (x ) + + c n T (x n ). Since T (x i ) Y ( i n) and Y is linearly independent so c =... = c n =. This shows that T (Y ) is linearly independent. The image of an linear independent set under a linear map, however, needs not be linearly independent: for example, take T : V W to be the zero map and X to be a basis of V. However, injective linear maps do preserve linear independence.

8 CHAPTER 3. LINEAR MAPS Proposition 3..5. The image of a linearly independent set under an injective linear map is linearly independent. Proof. Let T : V W be an injective linear map and X be a linearly indepedent subset of V. Suppose some linear combination c y + + c n y n of elements of T (X) expresses W. We need to show that the c i s are all zeros. Since y i T (X), so y i = T (x i ) for some x i X ( i n). Thus ( n n n ) W = c i y i = c i T (x i ) = T c i x i. i= i= Since T is injective, by Proposition 3..3, c x + + c n x n = V. And that implies the c i s are all zeros because X is linearly independent. Proposition 3..6. A bijective linear map sends a basis to a basis Proof. Suppose T : V W is a bijective linear map and B is a basis of V. Since T is injective and B is linearly independent, by Proposition 3..5 so is T (B). Since T is surjective and B spans V, W = T (V ) = T ( B ). By Proposition 3.. T ( B ) = T (B) so T (B) spans W as well. We end this section by showing that dimension is the only isomorphic invariant of vector spaces. Theorem 3..7. Two vector spaces are isomorphic if and only if they have the same dimension. Proof. Suppose T : V W is an isomorphism. Since T maps basis to basis (Proposition 3..6) and is bijective, the bases of V and W must have the same cardinality, i.e. dim V = dim W. Conversely, suppose V and W have the same dimension. Let B be a basis of V and D be a basis of W. Since B and D have the same cardinality, there is a bijection mapping B onto D. Let T and S be the unique linear maps extending this bijection and its inverse to V and to W, respectively. Since S T restricts to the identity map on B and T S restricts to the identity map on D, by uniqueness of linear extension we conclude that T and S themselves are inverse of each other hence V and W are isomorphic. Example 3..7. Let V be an n-dimensional vector space over K. An order on a basis B of V can be viewed as a bijection from B to the standard basis of K n and vice versa (the i-th element of B maps to e i ( i n)). Thus an ordered basis B of V determines, according to Theorem 3..7, a unique isomorphism Φ B from V to K n. The image of v V under Φ B, denoted by (v) B, is called the coordinate vector (or simply the coordinates) of v with respects to the ordered basis B. Example 3..8. To compute the coordinates of a vector with respect to a given basis amongs to writing the given vector as a linear combination of the vectors i=

3.. MATRIX REPRESENTATION OF LINEAR MAPS 9 in the given basis. As an example, let us compute the coordinates of 3x P with respect to the (ordered) basis D = {, + x, x + x }. We need to find a, a, a 3 K such that because then 3x = a () + a ( + x) + a 3 (x + x ) (3.) (3x ) D = Φ D (3x ) = a Φ D () + a Φ D ( + x) + a 3 Φ D (x + x ) a = a e + a e + a 3 e 3 = a a 3. To find the a i s, rewrite Equation (3.) as 3x = (a +a )()+(a +a 3 )x+a 3 x. Then by comparing coefficients on both side, we arrive to the following system of equations: a + a = a + a 3 = (3.) a 3 = 3 Solving the system yields a = 3, a = 3 and a 3 = 3. So (3x ) D = I leave you, as an exercise, to check that (x) D = Example 3..9. Since matrix multiplication distributes over addition and respects scalar multiplication, any A M m n (K) determines a linear map m A from K n to K m defined by m A (v) = Av. In fact, every linear map from K n to K m is a matrix multiplication: suppose T is a linear map from K n to K m, then for any c = (c,..., c n ) K n, ( ) T (c) = T ci e i = c T (e ) +... + c n T (e n ). So T = m (T ) where (T ) is the m n matrix with T (e i ) K m ( i n) as its ith column. In fact, one checks readily that T m (T ) is an isomorphism between Hom(K n, K m ) and M m n (K). 3. Matrix Representation of Linear Maps In this section, we will show that every linear map T : V W between finite dimensional vector spaces, after choosing ordered bases for V and W, is of the form m A for some matrix A. From now on when we talk about bases of vector spaces we always assumed they are ordered bases and we simply refer them as bases.. 3 3 3.

3 CHAPTER 3. LINEAR MAPS Suppose B = {v,..., v n } and D = {w,..., w m } are bases of V and W, respectively. Given v V, let (v) B = (c,... c n ) K n be its coordinates with respect to B. By linearity of T, we have n n T (v) = T c j v j = c j T (v j ). (3.3) j= For each j n, T (v j ) can be expressed uniquely as a linear combination m i= a ijw i of vectors in D, (note that (T (v j )) D = (a j,..., a mj )), so n n m m n T (v) = c j T (v j ) = c j a ij w i = a ij c j w i. (3.4) j= j= i= By identifying the vectors with their coordinates with respect to these bases, we see that T is the map n j= c a jc j a a n.. =... c.. (3.5) n c n j= a mjc j a m a mn c n The matrix (a ij ) is called the representation of T with respect to B and D and denote it by (T ) D B. It follows from Equation (3.3) to (3.5) that [T ]D B is the matrix whose j-th column is (T (v j )) D the coordinates of T (v j ) with respects to D where v j is the j-th element of the basis B. A more conceptual description of (T ) D B is that it is the unique matrix that makes the follow diagram commutes: j= i= j= V T W (3.6) Φ B Φ D K n m (T ) D B K m Example 3... Let us compute the matrix representation of the derivation operator T from R 3 [x] to R [x] with respect to the basis B = {, x, x, x 3 } of R 3 [x] and the basis D = {, + x, x + x } of R [x]. According to the discussion above, (T ) D B is the 3 4 matrix whose columns are the coordinate vectors of T () =, T (x) =, T (x ) = x and T (x 3 ) = 3x with respect to D. Since (see Example 3..8), () D =, () D =, (x) D = the matrix representation of T with respect to B and D is 3 (T ) D B = 3 3, (3x ) D = 3 3 3,
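The columns of (T)^D_B in the example above can also be produced mechanically: write each T(b_j) in monomial coordinates and then convert to D-coordinates. The following is an illustrative Python/SymPy sketch that redoes the computation from the bases B and D as stated in the example (SymPy itself is not part of the notes).

# Illustrative sketch: matrix of the derivative map d/dx from R_3[x] to R_2[x]
# with respect to B = {1, x, x^2, x^3} and D = {1, 1 + x, x + x^2}.
from sympy import Matrix

# Columns of P are the D-basis vectors written in monomial coordinates (1, x, x^2).
P = Matrix([[1, 1, 0],
            [0, 1, 1],
            [0, 0, 1]])

# d/dx sends 1, x, x^2, x^3 to 0, 1, 2x, 3x^2; their monomial coordinates
# are the columns of:
TB = Matrix([[0, 1, 0, 0],
             [0, 0, 2, 0],
             [0, 0, 0, 3]])

# Converting each column from monomial coordinates to D-coordinates gives (T)^D_B.
M = P.inv() * TB
print(M)   # [[0, 1, -2, 3], [0, 0, 2, -3], [0, 0, 0, 3]]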

3.. MATRIX REPRESENTATION OF LINEAR MAPS 3 Definition 3... Let B and B be two bases of V. The matrix (id V ) B B is called the change of basis matrix from B to B. (see P.36 of []). Note that the change of basis matrix (id V ) B B simply sends (by left multiplication) the coordinates of a vector with respect to B to its coordinates with respect to B, i.e. (v) B = (id V ) B B (v) B The following facts about change of basis matrices are clear: Proposition 3... Let V be an n-dimensional vector space, B, B, B are bases of V.. (id V ) B B = I n.. (id V ) B B (id V ) B B = (id V ) B B. Example 3... Consider the bases B = {, + x, + x + x } and D = {, x, x x} of R [x]. Let us find the change of basis matrix from B to D. Since Therefore, = () + ()(x ) + ()(x x) + x = () + ()(x ) + x + x = (3) + ()(x ) + (x x) () D = (,, ), ( + x) D = (,, ), ( + x + x ) D = (3,, ) and so the change of basis matrix from B to D is (id) D B = 3 Let us verify that (id) D B sends (v)b to (v) D. For example, suppose (v) B =. That means v is the vector And so (v) D should be ( )() + ()( + x) + ()( + x + x ) = + 3x + x. (id) D B (v) B = 3 And this is indeed the case because = 7 5. 7() + 5(x ) + (x x) = + 3x + x = v.
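The rank-nullity relation above (nullity T + rank T = dim V) is easy to verify for a map of the form m_A. Here is an illustrative Python/SymPy sketch with an arbitrary sample matrix of my own choosing.

# Illustrative sketch: rank-nullity for T = m_A : Q^4 -> Q^3.
from sympy import Matrix

A = Matrix([[1, 0, 2, -1],
            [2, 1, 0, 3],
            [3, 1, 2, 2]])

rank = A.rank()                    # dim im T  (the column rank of A)
nullity = len(A.nullspace())       # dim ker T (the number of free variables)
print(rank, nullity, A.cols)       # 2 2 4
print(rank + nullity == A.cols)    # True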

3 CHAPTER 3. LINEAR MAPS Example 3..3. From the parallelogram law of vector addition it is easy to see that R α rotation by an angle α about the origin is a linear map from R to itself. To find the matrix representing R α with respect to the standard basis E = {(, ), (, )} of R, we compute the coordinates of the images of (, ) and (, ) with respect to E. R α ( ) = ( cos (α) sin (α) ) R α ( ) = ( sin (α) cos (α) ) And so ( (R α ) E cos (α) sin (α) E = sin (α) cos (α) ) The map ρ D B sending T to (T )D B is in fact an isomorphism from Hom K(V, W ) to M m n (K). Consequently, for any finite dimensional vector spaces V and W, dim Hom(V, W ) = dim V dim W. Furthermore, an isomorphism of the form ρ D B sets up a dictionary between linear maps (abstract) and matrices (concrete). For example, image of a linear map corresponds to the column-space of a matrix, kernel corresponds to nullspace. As a result, we have Theorem 3... Let T : V W be a linear map between finite dimensional vector spaces. Then nullity T + rank T = dim V. Proof. Suppose dim V = n and dim W = m. Let A M m n (K) represent T (with respect to some bases). Then rank T = dim im T = dim column space of A = rank A. On the other hand, nullity T = dim ker T = dim null space of A which equals to the number of free columns in the rref of A. And so it is the number of columns (i.e. n) minus the number of pivot columns, i.e. n rank A. Putting these together, we get nullity T = n rank T. Since composition of linear maps corresponds to multiplication of matrices (see Exercise 3.8), when V = W and B = D then ρ B B (or simply ρ B) is a K-algebra isomorphism from End(V ) to M n (K). And GL(V ), the invertible elements of End(V ) maps to the invertible elements of M n (K). This leads to the following definition: Definition 3... A square matrix A M n (K) is invertible if there exists a matrix B M n (K) such that AB = I n = BA. Note that the inverse of A, if exists, is unique. It is because if B and B are inverses of A, then B = BI n = B(AB ) = (BA)B = I n B = B.