Prelims Linear Algebra I                                        Michaelmas Term 2014

1 Systems of linear equations and matrices

Let m, n be positive integers. An m × n matrix is a rectangular array of mn numbers, arranged in m rows and n columns. For example
\[
\begin{pmatrix} 1 & 5 & 0 \\ 3 & 0 & 2 \end{pmatrix}
\]
is a 2 × 3 matrix. We allow the possibility of having just one column, or just one row, such as
\[
\begin{pmatrix} 3/2 \\ 9 \\ 0.5 \\ 17 \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} 19 & 25 \end{pmatrix}.
\]
In general we write an m × n matrix X as
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1n} \\
x_{21} & x_{22} & \dots & x_{2n} \\
x_{31} & x_{32} & \dots & x_{3n} \\
\vdots & \vdots &       & \vdots \\
x_{m1} & x_{m2} & \dots & x_{mn}
\end{pmatrix}
\]
where x_{ij} is the (i,j)-th entry of X and appears in the i-th row and in the j-th column. We often abbreviate this and write X = [x_{ij}]_{i=1,j=1}^{m,n} or [x_{ij}]_{m×n} or just [x_{ij}] if it is clear (or not important) what m and n are.

If A = [a_{ij}]_{m×n} and B = [b_{ij}]_{p×q} then A and B are equal (and we write A = B) if and only if (1) m = p and n = q; and (2) a_{ij} = b_{ij} for all i ∈ {1,...,m} and j ∈ {1,...,n}.

1.1 Addition and scalar multiplication of matrices

Definition 1.1 Suppose A and B are m × n matrices whose entries are real numbers. Define the sum A + B to be the m × n matrix whose (i,j)-th entry is a_{ij} + b_{ij}.

Note that A and B must have the same size, and then A + B also has the same size with entries given by adding the corresponding entries of A and B. For example
\[
\begin{pmatrix} 1 & -4 \\ 3 & 0 \end{pmatrix}
+ \begin{pmatrix} -1 & 2 \\ 3 & 4 \end{pmatrix}
= \begin{pmatrix} 0 & -2 \\ 6 & 4 \end{pmatrix}.
\]
If we write A + B then we assume implicitly that A and B have the same size.

Definition 1.2 The m × n matrix with all entries equal to zero is called the zero matrix, and written as 0_{m×n} or just as 0.

Remark 1.3 We can define A + B in exactly the same way if the entries of A and B are complex numbers, or indeed if the entries of A and B belong to any field F (see Remark 2.2 later).

Theorem 1.4 (1) Addition of matrices is commutative; that is, A + B = B + A for all m × n matrices A and B.
(2) Addition of matrices is associative; that is, A + (B + C) = (A + B) + C for all m × n matrices A, B and C.
(3) We have A + 0 = A = 0 + A for every matrix A.
(4) For every m × n matrix A there is a unique m × n matrix B with A + B = 0.

Proof (1) If A and B are of size m × n then A + B and B + A are also of size m × n. The (i,j) entry of A + B is a_{ij} + b_{ij}. The (i,j) entry of B + A is b_{ij} + a_{ij}. Now, a_{ij} and b_{ij} are real numbers and addition of real numbers is commutative, so a_{ij} + b_{ij} = b_{ij} + a_{ij}.
(2) is left as an exercise.
(3) We have A + 0 = [a_{ij} + 0_{ij}] = [a_{ij} + 0] = [a_{ij}] = A, and similarly 0 + A = A.
(4) Given A = [a_{ij}], take B to be the matrix [−a_{ij}] whose (i,j)-th entry is −a_{ij}. Then A + B = [a_{ij} + (−a_{ij})]_{m×n} = 0_{m×n}. If also A + C = 0 then for all i, j we have a_{ij} + c_{ij} = 0 and hence c_{ij} = −a_{ij}, so C = B.

The matrix B in Theorem 1.4 (4) is called the additive inverse of A, and we write B = −A. Given matrices A, C of the same size we also write A − C as shorthand for A + (−C).

Exercise 1.5 Prove that if A and B have the same size then −(A + B) = −A − B.

Definition 1.6 Given a matrix A and a number λ, define the product of A by λ to be the matrix, denoted by λA, obtained from A by multiplying every entry of A by λ. That is, if A = [a_{ij}]_{m×n} then λA = [λa_{ij}]_{m×n}.

This is traditionally called scalar multiplication, where the word scalar means number (real or complex, later others). The main properties are:

Theorem 1.7 Suppose A and B are m × n matrices; then for any scalars λ, µ
(1) λ(A + B) = λA + λB;
(2) (λ + µ)A = λA + µA;
(3) (λµ)A = λ(µA);
(4) 1·A = A.

The proof is to be completed.

Exercise 1.8 Prove that for an m × n matrix A, taking scalars −1 and 0 gives:
(a) (−1)·A = −A and (b) 0·A = 0_{m×n}.
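For readers who like to experiment, here is a small illustrative sketch (not part of the original notes), assuming Python with NumPy is available. It checks the worked addition example above and two of the properties from Theorems 1.4 and 1.7 numerically for one particular choice of matrices and scalar.

```python
import numpy as np

# The matrices from the worked addition example above.
A = np.array([[1, -4], [3, 0]])
B = np.array([[-1, 2], [3, 4]])

print(A + B)                                   # [[ 0 -2] [ 6  4]]
print(np.array_equal(A + B, B + A))            # True: commutativity, Theorem 1.4(1)

lam = 3
print(np.array_equal(lam * (A + B), lam * A + lam * B))  # True: Theorem 1.7(1)
```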

1.2 Matrix multiplication

Definition 1.9 Let A = [a_{ij}]_{m×n} and B = [b_{ij}]_{n×p} (note the sizes!). Then one defines the product AB to be the m × p matrix whose (i,j)-th entry is
\[
[AB]_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{in}b_{nj}.
\]
That is, we multiply the elements in the i-th row with the elements of the j-th column and take the sum. This is usually abbreviated as
\[
[AB]_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}.
\]
For example, if
\[
A = \begin{pmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 4 & -1 \\ 0 & 2 \end{pmatrix},
\]
then
\[
AB = \begin{pmatrix}
3\cdot 4 + 0\cdot 0 & 3\cdot(-1) + 0\cdot 2 \\
(-1)\cdot 4 + 2\cdot 0 & (-1)\cdot(-1) + 2\cdot 2 \\
1\cdot 4 + 1\cdot 0 & 1\cdot(-1) + 1\cdot 2
\end{pmatrix}
= \begin{pmatrix} 12 & -3 \\ -4 & 5 \\ 4 & 1 \end{pmatrix}.
\]
A matrix is square if it is of size n × n for some n; that is, it has the same number of rows and columns. Consider the n × n matrix
\[
I_n = \begin{pmatrix}
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
0 & 0 & 1 & \dots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \dots & 1
\end{pmatrix}
\qquad\text{whose (i,j)-th entry is } [I_n]_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\]
The matrix I_n is called the identity matrix of size n × n. This is an example of a diagonal matrix: a square matrix D = [d_{ij}]_{n×n} is said to be diagonal if d_{ij} = 0 whenever i ≠ j. That is, all its entries are zero off the leading diagonal.

Theorem 1.10 Assume A, B, C are matrices, and λ is a scalar. Whenever the sums and products are defined,
(1) A(BC) = (AB)C;
(2) A(B + C) = AB + AC and (B + C)A = BA + CA;
(3) λ(AB) = (λA)B = A(λB);
(4) AI_n = A = I_m A for every m × n matrix A.

Proof (1) For A(BC) to be defined we need the sizes to be m × n, n × p and p × q. These are precisely the conditions one needs for (AB)C to be defined. Assume these; then we calculate the (i,j)-th entry of A(BC) to be
\[
[A(BC)]_{ij} = \sum_{k=1}^{n} a_{ik}[BC]_{kj}
= \sum_{k=1}^{n} a_{ik}\Big(\sum_{t=1}^{p} b_{kt}c_{tj}\Big)
= \sum_{k=1}^{n}\sum_{t=1}^{p} a_{ik}b_{kt}c_{tj}.
\]

We calculate the (i,j)-th entry of (AB)C to be
\[
[(AB)C]_{ij} = \sum_{t=1}^{p} [AB]_{it}c_{tj}
= \sum_{t=1}^{p}\Big(\sum_{k=1}^{n} a_{ik}b_{kt}\Big)c_{tj}
= \sum_{t=1}^{p}\sum_{k=1}^{n} a_{ik}b_{kt}c_{tj}.
\]
These are the same, for arbitrary i, j, so (AB)C = A(BC). The other parts are to be completed.

Property (1) in this theorem is called associativity for matrix multiplication, and property (2) is known as the distributive law for matrices. Because of the associativity of matrix multiplication we usually write just ABC instead of A(BC) or (AB)C. In addition, for every positive integer n we write A^n for the product AA⋯A (with n terms).

Example 1.11 The matrices
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\]
satisfy AB = 0 and BA = A. This shows that in general matrix multiplication is not commutative. We also see that it is possible to have matrices A and B with AB = 0 but A and B both non-zero.

Definition 1.12 Suppose A is an n × n matrix. Then A is invertible if there is some n × n matrix X such that AX = I_n and XA = I_n.

If A is invertible then this matrix X is unique. To prove this, suppose that X′ is another matrix such that AX′ = I_n = X′A. Then we must show that X = X′. But
\[
X = XI_n = X(AX′) = (XA)X′ = I_n X′ = X′
\]
as required. Therefore it makes sense to call X the inverse of A and write X = A^{-1}.

Lemma 1.13 Suppose that A and B are invertible n × n matrices. Then AB is invertible, with inverse B^{-1}A^{-1}.

Proof We have
\[
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n
\]
and similarly one calculates (B^{-1}A^{-1})(AB) = I_n.

This lemma says that if A and B are invertible of the same size then AB is invertible and (AB)^{-1} = B^{-1}A^{-1}.
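As an illustrative aside (not part of the original notes), the following Python sketch, assuming NumPy is available, reproduces the worked product above, checks the non-commutativity of Example 1.11, and tests the inverse formula of Lemma 1.13 on one arbitrarily chosen pair of invertible matrices.

```python
import numpy as np

A = np.array([[3, 0], [-1, 2], [1, 1]])
B = np.array([[4, -1], [0, 2]])
print(A @ B)          # [[12 -3] [-4  5] [ 4  1]], matching the worked example

# Example 1.11: matrix multiplication is not commutative.
A1 = np.array([[0, 1], [0, 0]])
B1 = np.array([[1, 0], [0, 0]])
print(A1 @ B1)        # the zero matrix
print(B1 @ A1)        # equals A1

# Lemma 1.13: (AB)^{-1} = B^{-1} A^{-1} for invertible square matrices
# (C and D here are arbitrary invertible examples, not from the notes).
C = np.array([[1.0, 2.0], [3.0, 4.0]])
D = np.array([[0.0, 1.0], [1.0, 1.0]])
print(np.allclose(np.linalg.inv(C @ D), np.linalg.inv(D) @ np.linalg.inv(C)))  # True
```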

1.3 Systems of linear equations

One important application of matrix technology is to solve linear equations. In this section we will start on this, and later we will review it when we can use more advanced technology.

Definition 1.14 A system of linear equations in n variables is a list of linear equations
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m
\end{aligned}
\]
where the a_{ij} and the b_i are numbers (in R or C, or any other field F). We write this in matrix form as Ax = b, where A is the m × n matrix A = [a_{ij}]_{m×n}, and where x and b are column vectors; that is
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
\]
The system of linear equations is called homogeneous if b = 0.

Example 1.15 Consider the system of linear equations
\[
\begin{aligned}
2x_2 + 2x_3 - 4x_4 &= 2 \qquad &(1)\\
x_1 + 2x_2 + 3x_3 &= 5 \qquad &(2)\\
5x_1 + 8x_2 + 13x_3 + 4x_4 &= 23 \qquad &(3).
\end{aligned}
\]
This becomes Ax = b where
\[
A = \begin{pmatrix} 0 & 2 & 2 & -4 \\ 1 & 2 & 3 & 0 \\ 5 & 8 & 13 & 4 \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad
b = \begin{pmatrix} 2 \\ 5 \\ 23 \end{pmatrix}.
\]
To solve the equations, one might start by interchanging equations (1) and (2), to get the x_1 to the top left place. Then one can eliminate x_1, via
\[
5x_1 + 8x_2 + 13x_3 + 4x_4 - 5(x_1 + 2x_2 + 3x_3) = 23 - 25,
\]
which gives −2x_2 − 2x_3 + 4x_4 = −2. Then we have two equations with fewer variables, and we can repeat the process.

We translate this process into matrix notation. Write down the matrix B obtained by concatenating matrix A and the vector b; this gives
\[
B = [A \mid b] = \begin{pmatrix} 0 & 2 & 2 & -4 & 2 \\ 1 & 2 & 3 & 0 & 5 \\ 5 & 8 & 13 & 4 & 23 \end{pmatrix}.
\]

The matrix B is called an augmented matrix. Write R_i for the i-th row of B. We have first interchanged R_1 and R_2; we write this as
\[
B = [A \mid b] \;\xrightarrow{\;R_1 \leftrightarrow R_2\;}\;
\begin{pmatrix} 1 & 2 & 3 & 0 & 5 \\ 0 & 2 & 2 & -4 & 2 \\ 5 & 8 & 13 & 4 & 23 \end{pmatrix}.
\]
Then we replaced R_3 by R_3 − 5R_1. We write this as
\[
\xrightarrow{\;R_3 \to R_3 - 5R_1\;}\;
\begin{pmatrix} 1 & 2 & 3 & 0 & 5 \\ 0 & 2 & 2 & -4 & 2 \\ 0 & -2 & -2 & 4 & -2 \end{pmatrix}.
\]
Next, we can replace the last row by a row of zeros by adding row 2 to row 3:
\[
\xrightarrow{\;R_3 \to R_2 + R_3\;}\;
\begin{pmatrix} 1 & 2 & 3 & 0 & 5 \\ 0 & 2 & 2 & -4 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
Then we can replace R_1 by R_1 − R_2, and finally we divide R_2 by 2. This gives
\[
\xrightarrow{\;R_1 \to R_1 - R_2,\; R_2 \to (1/2)R_2\;}\;
\begin{pmatrix} 1 & 0 & 1 & 4 & 3 \\ 0 & 1 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{1}
\]
The corresponding equations
\[
\begin{aligned}
x_1 + x_3 + 4x_4 &= 3 \\
x_2 + x_3 - 2x_4 &= 1
\end{aligned}
\]
have exactly the same solutions as the original equations, and now it is easy to describe all the solutions. We can assign arbitrary values to x_3 and x_4, say x_3 = α and x_4 = β. Then the values of x_2 and x_1 are uniquely determined by the equations in terms of α and β; that is
\[
x_1 = -\alpha - 4\beta + 3, \qquad x_2 = -\alpha + 2\beta + 1.
\]
The general solution to the system of linear equations can thus be written as
\[
x_1 = -\alpha - 4\beta + 3, \quad x_2 = -\alpha + 2\beta + 1, \quad x_3 = \alpha, \quad x_4 = \beta,
\]
or equivalently
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} -\alpha - 4\beta + 3 \\ -\alpha + 2\beta + 1 \\ \alpha \\ \beta \end{pmatrix}
= \alpha \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \end{pmatrix}
+ \beta \begin{pmatrix} -4 \\ 2 \\ 0 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\]
for arbitrary α and β.

What we did in this example can be generalised to give a method for finding the general solution to any system of linear equations. The strategy is to transform the augmented matrix B by reversible steps, without changing the solutions to the corresponding systems of linear equations, to a nice form E for which one can easily describe all the solutions. The transformations we will use are called elementary row operations (EROs). The nice form to aim for is known as reduced row echelon form (RRE form); the matrix (1) has this shape.
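As a check on Example 1.15 (an illustration added here, not part of the original notes), the same reduction can be done with SymPy, assuming it is installed; Matrix.rref returns the reduced row echelon form, and linsolve recovers the same general solution, with x3 and x4 playing the roles of α and β.

```python
from sympy import Matrix, symbols, linsolve

# Augmented matrix [A | b] from Example 1.15.
B = Matrix([[0, 2, 2, -4, 2],
            [1, 2, 3,  0, 5],
            [5, 8, 13, 4, 23]])
E, pivots = B.rref()
print(E)   # rows (1,0,1,4,3), (0,1,1,-2,1), (0,0,0,0,0), as in display (1)

A = B[:, :4]
b = B[:, 4]
x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
print(linsolve((A, b), x1, x2, x3, x4))
# prints something like {(-x3 - 4*x4 + 3, -x3 + 2*x4 + 1, x3, x4)},
# i.e. the general solution found above with alpha = x3, beta = x4
```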

1.4 Elementary row operations and reduced row echelon form

Example 1.16 Examples of matrices in reduced row echelon form are
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
More generally, the following matrix is in reduced row echelon form:
\[
\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots \\
\vdots & & & & & & & & & & & & & \vdots \\
0 & \cdots & & & & & & & & & & & \cdots & 0
\end{pmatrix}.
\]
We make a formal definition:

Definition 1.17 The m × n matrix E is in reduced row echelon form (RRE form) if
(i) the zero rows of E lie below the non-zero rows;
(ii) in each row which is not zero, the leading entry (that is, the left-most non-zero entry) is 1;
(iii) if row i and row i+1 are non-zero then the leading entry of row i+1 is strictly to the right of the leading entry of row i;
(iv) if a column contains a leading entry of some row then all other entries in this column are zero.

In order to transform a matrix to reduced row echelon form, one uses elementary row operations:

Definition 1.18 There are three types of elementary row operations (EROs) which can be applied to a matrix B; they are defined as follows. Let R_i be the i-th row of B.
(1) Interchange R_i and R_j.
(2) Replace R_i by R_i′ := cR_i where c is a non-zero scalar.
(3) Replace R_i by R_i′ := R_i + dR_j where d is a scalar, and i ≠ j.

Each of these operations can clearly be reversed by another ERO of the same type, which is the inverse operation. Operation (1) is its own inverse; for the inverse of (2) one replaces R_i′ by (1/c)R_i′ = R_i, and for the inverse of (3) one replaces R_i′ by R_i′ − dR_j.

Applying an ERO to B is the same as premultiplying B by an invertible matrix:

Lemma 1.19 Applying an ERO to an m × n matrix B gives us PB, where P is the result of applying the same ERO to the m × m identity matrix.

Proof: ERO (1) is the same as replacing B by PB where P is the m × m permutation matrix P = [p_{rs}] with
\[
p_{rs} = \begin{cases}
1 & \text{if } r = s \text{ and } r \notin \{i,j\}, \\
1 & \text{if } (r,s) = (i,j) \text{ or } (j,i), \\
0 & \text{otherwise}.
\end{cases}
\]
ERO (2) is the same as replacing B by PB where P is the m × m diagonal matrix with i-th diagonal entry c and all other diagonal entries equal to 1.

ERO (3) is the same as replacing B by PB where P = I_m + dE_{ij}. Here E_{ij} is the m × m matrix which has a 1 in position (i,j) and is 0 otherwise.

Definition 1.20 The matrices P defined by applying EROs to an identity matrix are called elementary matrices. An elementary matrix is invertible; its inverse is the elementary matrix corresponding to the inverse ERO.

Lemma 1.21 Applying an ERO to an augmented matrix B = [A | b] does not alter the set of solutions x to the system of linear equations given by Ax = b.

Proof: By Lemma 1.19 an ERO transforms B = [A | b] to PB = [PA | Pb] where P is the corresponding elementary matrix. If x satisfies Ax = b then it also satisfies PAx = Pb, so every solution to the original system of linear equations is also a solution to the transformed system. Since the ERO can be reversed by another ERO (its inverse), it follows that every solution to the transformed system is also a solution to the original system of linear equations.

The following theorem is sometimes called Gauss elimination. Writing the proof on the board is not illuminating, so this is for private reading, but the theorem is very important.

Theorem 1.22 Suppose B is some m × p matrix. Then B can be transformed to a matrix E in reduced row echelon form by a finite sequence of elementary row operations. Thus there is an invertible matrix P = P_s P_{s-1} ⋯ P_1, a product of elementary matrices P_1, P_2, ..., P_s, such that PB = E where E is a reduced row echelon matrix.

Proof We will show, by induction on m, that B can be transformed to a reduced row echelon matrix E via EROs. This will prove the theorem, since by Lemma 1.19 an ERO is the same as premultiplication by an elementary matrix.

If B is the zero matrix then it is already in RRE form and we have nothing to prove. So we can assume that B ≠ 0. Suppose m = 1; then we use an ERO of type (2), and premultiply by a non-zero scalar to make the leftmost non-zero entry equal to 1. This is then in RRE form. Now assume m > 1.

(i) Reading from the left, the first non-zero column of B has some non-zero entry. By interchanging rows if necessary we can move a non-zero entry of this column to the first row. This gives a matrix B_1 of the form
\[
B_1 = \begin{pmatrix}
0 & \dots & 0 & b_{11} & \dots & b_{1k} \\
0 & \dots & 0 & b_{21} & \dots & b_{2k} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \dots & 0 & b_{m1} & \dots & b_{mk}
\end{pmatrix}
\]
with b_{11} ≠ 0.

(ii) We transform B_1 to B_2 of the form
\[
B_2 = \begin{pmatrix}
0 & \dots & 0 & b_{11} & b_{12} & \dots & b_{1k} \\
0 & \dots & 0 & 0 & c_{22} & \dots & c_{2k} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \dots & 0 & 0 & c_{m2} & \dots & c_{mk}
\end{pmatrix}
\]
using EROs of type (3) to subtract scalar multiples of the first row from the other rows.

(iii) Now consider the submatrix C := [c_{ij}]_{2 \le i \le m,\, 2 \le j \le k} of B_2, of size (m−1) × (k−1). By the induction hypothesis, C can be transformed to a matrix E_1 in reduced row echelon form. The same EROs applied to B_2, and then an ERO of type (2) applied to the first row, transform B_2 to B_3 where (writing it as a block matrix)
\[
B_3 = \begin{pmatrix}
0 & \dots & 0 & 1 & * \\
0 & \dots & 0 & 0 & E_1
\end{pmatrix}.
\]
(iv) By using EROs of type (3) we make each entry in the top row of B_3 zero if it is in a column which contains a leading 1 of E_1. This has then produced the required matrix E in RRE form.

Remark 1.23 One can show that, given B, this E is unique. A proof is given in Blyth-Robertson, Theorem 3.11.

It is easy to keep track as one goes along of the product of elementary matrices used in Gauss elimination, and so to calculate the matrix P in Theorem 1.22. This is done as follows: take the block matrix [B | I_m] which is the concatenation of B and the identity matrix I_m. Then perform the EROs on [B | I_m]. This results in the matrix [E | P] where the block P is the matrix we want, since, taking products of matrices, we have
\[
P[B \mid I_m] = [PB \mid PI_m] = [E \mid P].
\]
It is strongly recommended that you should keep track in this way of the product of elementary matrices, and then at the end calculate PB to make sure that it is equal to E. If it is not, then you have made a mistake somewhere.
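The bookkeeping described above can also be mimicked in SymPy, as a rough illustration (not part of the original notes): row-reduce the block matrix [B | I_3] and read off the two blocks. Note that a computer algebra system may apply further row operations than the hand calculation, so the resulting P need not equal the particular P of Example 1.24, but it still satisfies PB = E.

```python
from sympy import Matrix, eye

B = Matrix([[0, 2, 2, -4, 2],
            [1, 2, 3,  0, 5],
            [5, 8, 13, 4, 23]])

block = Matrix.hstack(B, eye(3))   # the block matrix [B | I_3]
R, _ = block.rref()
E, P = R[:, :5], R[:, 5:]

print(E)             # the RRE form of B
print(P * B == E)    # True: any invertible P obtained this way satisfies PB = E
```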

Example 1.24 Let B = [A | b] be the matrix as in Example 1.15. To keep track of the elementary matrices along the way, we perform the EROs on the block matrix [B | I_3]. Then we end up with a matrix [E | P] such that PB = E. Explicitly:
\[
[B \mid I_3] = \left(\begin{array}{ccccc|ccc}
0 & 2 & 2 & -4 & 2 & 1 & 0 & 0 \\
1 & 2 & 3 & 0 & 5 & 0 & 1 & 0 \\
5 & 8 & 13 & 4 & 23 & 0 & 0 & 1
\end{array}\right)
\longrightarrow
\left(\begin{array}{ccccc|ccc}
1 & 2 & 3 & 0 & 5 & 0 & 1 & 0 \\
0 & 2 & 2 & -4 & 2 & 1 & 0 & 0 \\
5 & 8 & 13 & 4 & 23 & 0 & 0 & 1
\end{array}\right)
\]
\[
\longrightarrow
\left(\begin{array}{ccccc|ccc}
1 & 2 & 3 & 0 & 5 & 0 & 1 & 0 \\
0 & 2 & 2 & -4 & 2 & 1 & 0 & 0 \\
0 & -2 & -2 & 4 & -2 & 0 & -5 & 1
\end{array}\right)
\longrightarrow \;\cdots\; \longrightarrow
\left(\begin{array}{ccccc|ccc}
1 & 0 & 1 & 4 & 3 & -1 & 1 & 0 \\
0 & 1 & 1 & -2 & 1 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -5 & 1
\end{array}\right),
\]
which is [E | P] where P is the matrix
\[
P = \begin{pmatrix} -1 & 1 & 0 \\ 1/2 & 0 & 0 \\ 1 & -5 & 1 \end{pmatrix}.
\]
This does indeed satisfy PB = E.

Exercise 1.25 Transform the matrix M to reduced row echelon form E, where
\[
M = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}.
\]
At the same time calculate the matrix P so that PM is the reduced row echelon matrix E.

1.5 Solving systems of linear equations using EROs

Consider a system of linear equations given by Ax = b, where A = [a_{ij}] is a given m × n matrix and
\[
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
\]
is a given column vector of length m, both with entries in a field F (usually F = R or C or Q). We want to find all the solutions
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]

to the system of equations Ax = b with x_j ∈ F. To do this we consider the augmented matrix B = [A | b]. By Theorem 1.22 we can put B into RRE form [E | d] by applying a finite sequence of EROs. By Lemma 1.21 the solutions to the original system of linear equations Ax = b are exactly the same as the solutions to the system Ex = d.

Note that as [E | d] is in RRE form, so is E. Moreover if E has exactly l nonzero rows then d has either the form
\[
d = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_l \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{(case 1)}
\qquad\text{or}\qquad
d = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_l \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad\text{(case 2)}
\]
for some d_1, ..., d_l.

In case 1 the (l+1)-th equation in the system Ex = d has the form 0x_1 + ⋯ + 0x_n = 1, and so the system of equations has no solutions.

In case 2 we can write down the general solution to the system of equations as we did in Example 1.15. First we assign arbitrary values, say α, β, γ, ..., to the variables x_j corresponding to the columns of E which do not contain the leading entry of any row. Then the nonzero equations in the system Ex = d determine the remaining variables uniquely in terms of these parameters α, β, γ, ... and the entries of E and d.

2 Vector spaces

The concept of a vector space is an abstract notion which unifies the familiar properties of vectors in the plane and in 3-dimensional space, and allows us to study, for example, systems of linear equations.

Definition 2.1 Let F = R or C. A vector space over F is a set V together with two operations, one from V × V to V called addition, the other from F × V to V called scalar multiplication; that is, for all u, v ∈ V there is a unique element u + v ∈ V, and for all v ∈ V and λ ∈ F there is a unique element λv ∈ V. Moreover, these operations must satisfy the following axioms:

(V1) u + v = v + u for all u, v ∈ V.
(V2) (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(V3) there is some element 0_V ∈ V such that v + 0_V = 0_V + v = v for every v ∈ V.
(V4) for every v ∈ V there exists −v ∈ V such that v + (−v) = 0_V.
(V5) λ(u + v) = λu + λv for all u, v ∈ V and all scalars λ ∈ F.

(V6) (λ + µ)v = λv + µv for all v ∈ V and all scalars λ, µ ∈ F.
(V7) (λµ)v = λ(µv) for all v ∈ V and all scalars λ, µ ∈ F.
(V8) 1v = v for all v ∈ V.

Elements of V are called vectors, and elements of F are called scalars. When F = R we say that V is a real vector space, and when F = C we say that V is a complex vector space.

Remark 2.2 (1) Axioms (V1) to (V4) can be summarised by saying that the structure (V, +) is an abelian group. For a given v ∈ V, the element (−v) in (V4) is unique, and it is called the additive inverse of v.

(2) Often one wants to use other scalars. Instead of R or C, the set F can be any field. A field F is defined to be a non-empty set F, together with an addition + and a multiplication · on F, such that the axioms for + and · hold which you have been given for R in Analysis, that is: A1 to A4 and M1 to M4 and D (but not the ordering properties). Recall that the axioms A1 to A4 say that addition in F is associative and commutative, and that there is an additive identity (written 0 or 0_F) and each element λ of F has an additive inverse −λ in F. Recall also that the axioms M1 to M4 say that multiplication in F is associative and commutative, and that there is a multiplicative identity (written 1 or 1_F) and each nonzero element λ of F has a multiplicative inverse λ^{-1} in F ∖ {0}. Finally recall that D is the distributive law: a(b + c) = ab + ac for all a, b, c ∈ F.

For example, you could take F = Q with the usual operations of addition and multiplication of rational numbers to get a field.

Take F to be any field; then a vector space V over F is defined exactly as in Definition 2.1. In this course, we will often take F = R, but most definitions and properties will remain valid if R is replaced by C or Q, or by any other field. The only exception will be when we discuss inner product spaces at the end of the term; there we will need F to be R.

Example 2.3 Let m, n ≥ 1 be integers, and let M_{m×n}(R) be the set of all m × n matrices with real entries. Then M_{m×n}(R) is a real vector space under the usual addition of matrices and multiplication by scalars. Similarly the set M_{m×n}(F) of m × n matrices with entries in any field F is a vector space, with addition and scalar multiplication of matrices defined using the operations of addition and multiplication in F.

The special cases when m = 1 or n = 1 are very important:

Example 2.4 The set R^n of n-tuples of real numbers is a real vector space under componentwise addition and scalar multiplication:
\[
(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n), \qquad
\lambda(x_1, \dots, x_n) = (\lambda x_1, \dots, \lambda x_n).
\]
Geometrically, R^2 represents the Cartesian plane, and R^3 represents three-dimensional space. We can also take n = 1; this says that R is itself a real vector space.

We usually write the elements of R^n as row vectors and thus identify R^n with M_{1×n}(R). Sometimes it is useful to write m-tuples of real numbers as columns. We write (R^m)^t for the set M_{m×1}(R) of column vectors with m real entries (or coefficients); this too is a vector space under componentwise addition and scalar multiplication.

Example If V = R^2 then we can display vectors and sums of vectors geometrically.

Example 2.5 Let X be any non-empty set, and let F be any field. Define V to be the set
\[
V = \mathrm{Map}(X, F) = \{ f : X \to F \}
\]
of functions f from X to F. On V we can define addition and scalar multiplication by using the given addition and multiplication on the field F; that is, if f, g ∈ V then f + g is the function from X to F defined by
\[
(f + g)(x) = f(x) + g(x) \quad \text{for all } x \in X,
\]
and for λ ∈ F, the function λf from X to F is defined by
\[
(\lambda f)(x) = \lambda f(x) \quad \text{for all } x \in X.
\]
One says addition and scalar multiplication are defined pointwise. It is straightforward (but lengthy) to use the field axioms for F to check that V together with these operations is a vector space over F.

This may be familiar when F = R and X = R. E.g. if f, g are the functions from R to R given by f(x) = x^2 and g(x) = cos(x) for all x ∈ R, then you should know how to draw the graph of 2f, or of f + g.

Example 2.6 Let R_n[X] be the set of polynomials in the variable X of degree at most n with real coefficients. We write the general element in this set as
\[
p(X) = a_0 + a_1 X + \dots + a_n X^n
\]
where each a_i ∈ R. The degree of this polynomial (if p(X) is not identically zero) is the largest index j for which a_j ≠ 0. [N.B. The degree of the zero polynomial is sometimes taken to be 0, sometimes −1 and sometimes −∞.]

Define an addition on R_n[X] by
\[
(a_0 + a_1 X + \dots + a_n X^n) + (b_0 + b_1 X + \dots + b_n X^n) = (a_0 + b_0) + (a_1 + b_1)X + \dots + (a_n + b_n)X^n
\]
and a multiplication by scalars by
\[
\lambda(a_0 + a_1 X + \dots + a_n X^n) = \lambda a_0 + \lambda a_1 X + \dots + \lambda a_n X^n.
\]
Then one can check that R_n[X] with these operations is a real vector space. For example, the zero vector is the zero polynomial with a_i = 0 for all i. If F is any field we can define F_n[X] in the same way, replacing R with F, to get a vector space over F whose elements are polynomials in the variable X of degree at most n with coefficients in F.

Example 2.7 (1) Let V = C. This is a vector space over R. It is also a vector space over C.
(2) Let V = {(a_n) : a_n ∈ R}, the set of all real sequences. This is a real vector space, with addition and scalar multiplication componentwise. Similarly, when F is any field the space {(a_n) : a_n ∈ F} of sequences in F is a vector space over F.
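The pointwise operations of Example 2.5 can be written out very directly in code. The following is a small illustrative sketch (not part of the original notes), taking X = F = the real numbers and using plain Python functions.

```python
import math

# Pointwise addition and scalar multiplication on Map(X, F), as in Example 2.5.
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(lam, f):
    return lambda x: lam * f(x)

f = lambda x: x ** 2       # f(x) = x^2
g = math.cos               # g(x) = cos(x)
h = add(scale(2.0, f), g)  # the function 2f + g

print(h(0.0))              # 2*0 + cos(0) = 1.0
```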

The following theorem is also true when R is replaced with any field F.

Theorem 2.8 Assume that V is a vector space over R. Then for all λ ∈ R and for all v ∈ V,
(1) λ0_V = 0_V;
(2) 0_R v = 0_V;
(3) (−λ)v = −(λv) = λ(−v);
(4) if λv = 0_V then either λ = 0_R or v = 0_V.

Proof (1) By (V3) and (V5) we have λ0_V = λ(0_V + 0_V) = λ0_V + λ0_V. Add −(λ0_V) to both sides. For (2), use a similar idea, noting that 0_R = 0_R + 0_R.

(3) We have (using (V5) and (V4) and part (1) of this theorem)
\[
\lambda v + \lambda(-v) = \lambda(v + (-v)) = \lambda 0_V = 0_V.
\]
Adding −(λv) to both sides (or by the uniqueness of the additive inverse) we must have that −(λv) = λ(−v). Similarly (using the axioms, and part (2) of this theorem)
\[
\lambda v + (-\lambda)v = (\lambda + (-\lambda))v = 0_R v = 0_V
\]
and as before it follows that (−λ)v = −(λv).

(4) Suppose λ ≠ 0_R; then multiply by (1/λ). We get v = (1/λ)λv = (1/λ)0_V = 0_V.

2.1 Subspaces

Definition 2.9 Let V be a vector space over R. A non-empty subset W of V is a subspace provided it is a vector space over R, with addition and scalar multiplication as in V. In particular W must be closed under addition and scalar multiplication in the sense that u + v ∈ W and λv ∈ W whenever u, v ∈ W and λ ∈ R, so that addition and scalar multiplication give well-defined operations W × W → W and R × W → W.

We write W ≤ V to mean that W is a subspace of V. As usual this definition is also valid when R is replaced with any field F, as are the lemmas below.

In order to verify that some subset of V is a subspace it is not necessary to go through the whole list of axioms for a vector space. Instead, we can use the following test:

Lemma 2.10 (Subspace test) Assume that V is a vector space over R. A subset W of V is a subspace if and only if
(i) 0_V belongs to W;
(ii) if x, y ∈ W then x + y ∈ W;
(iii) if x ∈ W and λ ∈ R then λx ∈ W.

Proof Suppose W is a subspace. Then (ii) and (iii) hold, by closure. To see (i): by definition W ≠ ∅, so pick some w ∈ W. Then also (−1)w ∈ W (property (iii)). Hence w + (−1)w ∈ W, and this is equal to w + (−w) = 0_V.

Assume properties (i), (ii) and (iii) hold. By (i), W is not empty. By (ii) and (iii) it is closed under addition and scalar multiplication. One needs to check properties (V1) to (V8): (V1), (V2), and (V5) to (V8) are inherited from V, as is (V3) by property (i). It remains to consider property (V4). Let w ∈ W. By (V4) for V there exists −w ∈ V such that w + (−w) = 0_V. We want that (−w) is actually in W. But −w = (−1)w ∈ W by property (iii).

There is a variation of the subspace test:

Lemma 2.11 A non-empty subset W of a vector space V is a subspace if and only if for any λ_1, λ_2 ∈ R and w_1, w_2 ∈ W one has λ_1 w_1 + λ_2 w_2 ∈ W.

The proof is to be completed.

Example 2.12 (1) {0_V} is a subspace of V. Also V is a subspace of V. A subspace W with {0_V} ⊊ W ⊊ V is called a proper subspace of V.

(2) R is a subspace of the real vector space C. So is iR.

(3) Let V = R^2, the usual plane regarded as a vector space over R. Then any line through the origin is a subspace of V.

Proof Let L = {r(a,b) : r ∈ R} where (a,b) ≠ 0_V. Then 0_V = (0,0) = 0(a,b) ∈ L. Next, if x = r(a,b) and y = s(a,b) for any r, s ∈ R then x + y = (r + s)(a,b) ∈ L. If λ ∈ R then λx = (λr)(a,b) ∈ L.

(4) Let V = R^3, the usual 3-dimensional space. Then any line through the origin is a subspace of V. Also any plane in R^3 which contains the origin is a subspace of V. N.B. A line or plane in R^3 must go through 0_V = (0,0,0) in order to be a subspace of R^3.

(5) Let V = M_{2×2}(R), the vector space consisting of 2 × 2 matrices over R. Consider the set
\[
W = \left\{ \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} : a, b, c \in R \right\}
\]
of upper triangular matrices in V. Then W is a subspace of V.

The proofs of (4) and (5) are to be completed; they are similar to that of (3).

Example Let V = Map(R,R) and let
\[
U = \{ f \in V : f \text{ is differentiable} \}.
\]

Then it follows from results from analysis that U is a subspace of V. This subspace U has many subspaces of its own which are relevant to the study of differential equations. For example, if µ ∈ R then
\[
W = \{ f \in V : f \text{ is twice differentiable and } f'' + \mu f = 0 \}
\]
is a subspace of V consisting of all the solutions f : R → R of the homogeneous differential equation f'' + µf = 0. Note that f'' + µf = 0 if and only if g' + µf = 0 where g = f', so we can also identify W with the subspace of U × U given by the pairs (f, g) ∈ U × U satisfying the homogeneous system of differential equations f' − g = 0 = g' + µf.

There is a very important type of subspace of the vector space (R^n)^t of column vectors, determined by a fixed m × n real matrix A.

Lemma 2.13 Suppose A is a real m × n matrix. Let
\[
\Sigma_A := \{ x \in (R^n)^t : Ax = 0 \}.
\]
Then Σ_A is a subspace of the column space (R^n)^t, called the solution space for the system of homogeneous linear equations Ax = 0.

Proof A0_V = 0_V (directly), so 0_V ∈ Σ_A. If x, y ∈ Σ_A and λ ∈ R then
\[
A(x + y) = Ax + Ay = 0_V + 0_V = 0_V \qquad\text{and}\qquad A(\lambda x) = \lambda(Ax) = \lambda 0_V = 0_V.
\]
It is very easy to construct subspaces of a vector space V. One way is to start with any set of vectors in V and find the smallest subspace that contains this set.

Definition 2.14 Let V be a vector space and let S = {v_1, ..., v_n} ⊆ V be a finite nonempty subset of V. A linear combination of the elements v_1, ..., v_n of S is a vector in V of the form
\[
v = \lambda_1 v_1 + \dots + \lambda_n v_n
\]
for some scalars λ_1, ..., λ_n. The span of S is the set of all linear combinations of the elements of S, written
\[
\mathrm{Sp}(S) = \{ v \in V : v = \lambda_1 v_1 + \dots + \lambda_n v_n \text{ for some } \lambda_i \in R \}
= \{ \lambda_1 v_1 + \dots + \lambda_n v_n : \lambda_1, \dots, \lambda_n \in R \}.
\]
More generally, if S is any nonempty subset (possibly infinite) of V then the span of S is
\[
\mathrm{Sp}(S) = \{ \lambda_1 v_1 + \dots + \lambda_n v_n : v_1, \dots, v_n \in S \text{ and } \lambda_1, \dots, \lambda_n \in R \}.
\]
By convention the span of the empty set is the zero subspace: Sp(∅) = {0}.
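As an added illustration of Lemma 2.13 (not part of the original notes, and assuming SymPy is available): for the coefficient matrix A of Example 1.15, SymPy's nullspace method returns a finite set of column vectors whose span is the solution space Σ_A, matching the homogeneous part of the general solution found earlier.

```python
from sympy import Matrix

# The coefficient matrix A from Example 1.15; Sigma_A is the set of solutions of Ax = 0.
A = Matrix([[0, 2, 2, -4],
            [1, 2, 3,  0],
            [5, 8, 13, 4]])

spanning_vectors = A.nullspace()
print(spanning_vectors)
# Two column vectors, (-1, -1, 1, 0)^T and (-4, 2, 0, 1)^T: every solution of Ax = 0
# is a linear combination of them, so Sigma_A is the span of these two vectors.

for v in spanning_vectors:
    assert A * v == Matrix([0, 0, 0])   # each spanning vector really solves Ax = 0
```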

Lemma 2.15 The span Sp(S) is a subspace of V. It is contained in any subspace of V that contains the set S.

Proof to be completed.

Example 2.16 Assume A is an m × n matrix with entries in R; then each row of A is a vector in R^n. Let S ⊆ R^n be the set of these row vectors; then Sp(S) is a subspace of R^n. This is known as the row space of A. Similarly, the column space of A is the subspace of (R^m)^t spanned by the columns of A.

2.2 Subspaces of R^2

Lemma 2.17 (a) Assume (a,b) and (c,d) are vectors in V = R^2 which do not lie on the same line through the origin. Then
(i) ad − bc ≠ 0;
(ii) the span Sp{(a,b), (c,d)} is equal to V.
(b) The subspaces of V are precisely the lines through the origin, together with {0_V} and V.

Proof (a) (i) If ad = bc then (c,d) = (d/b)(a,b) (if b ≠ 0), or (c,d) = (c/a)(a,b) (if a ≠ 0).

(ii) The span of the vectors is contained in R^2. To get equality, we must show that every (u_1, u_2) ∈ R^2 can be written as a linear combination of (a,b) and (c,d). That is, we must find λ_1, λ_2 such that λ_1(a,b) + λ_2(c,d) = (u_1, u_2). Equivalently we must be able to solve
\[
\lambda_1 a + \lambda_2 c = u_1 \quad (1) \qquad\qquad \lambda_1 b + \lambda_2 d = u_2 \quad (2)
\]
Taking a times (2) minus b times (1), we get λ_2(ad − bc) = au_2 − bu_1. Since ad − bc ≠ 0 we find λ_2 = (au_2 − bu_1)/(ad − bc). Similarly λ_1 is (du_1 − cu_2)/(ad − bc), and these values of λ_1 and λ_2 do satisfy equations (1) and (2) as required.

(b) Let W be a non-zero subspace of V, and pick some (a,b) ∈ W with (a,b) ≠ 0. Let L be the line through (a,b) and the origin; that is, L = {r(a,b) : r ∈ R}. Then L ⊆ W by (iii) of the subspace test. If L = W then W is a line through 0. Suppose L ≠ W. Then there is (c,d) ∈ W with (c,d) ∉ L. Then the span Sp{(a,b), (c,d)} is contained in W. By (a)(ii) we know that this span is equal to R^2, so W = R^2.

2.3 Subspaces of R^3

Lemma 2.18 Let V = R^3. The proper subspaces of V are precisely
(i) the lines through the origin;
(ii) the planes which contain the origin.

Proof to be completed (similar to Lemma 2.17).

3 Linear independence, linear dependence and bases

Let V be a vector space over a field F (as usual we are primarily interested in F = R or C).

Definition 3.1 Assume that S is a subset of V. We say that S is linearly independent if whenever v_1, ..., v_n are distinct elements of S and λ_1, ..., λ_n are elements of F satisfying
\[
\lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_n v_n = 0_V
\]
then λ_1 = λ_2 = ... = λ_n = 0. Otherwise S is linearly dependent; that is, S is linearly dependent if it contains distinct vectors v_1, ..., v_n for some n ≥ 1 such that λ_1 v_1 + ... + λ_n v_n = 0 for some scalars λ_1, ..., λ_n which are not all zero.

We sometimes say "the vectors v_1, ..., v_n are linearly independent" to mean that the vectors v_1, ..., v_n are distinct and the set S = {v_1, ..., v_n} is linearly independent, although this is sloppy terminology: linear (in)dependence is a property of the set {v_1, ..., v_n}, not a property of the individual vectors.

Example 3.2 Let V = R^3 and v_1 = (−2,1,0), v_2 = (1,0,0), v_3 = (0,1,0). Then S = {v_1, v_2, v_3} is linearly dependent, since, for example, v_1 + 2v_2 − v_3 = 0.

Example 3.3 Let V = R^2 and v_1 = (1,0), v_2 = (0,1), v_3 = (2,3), v_4 = (−6,−9). Then S = {v_1, v_2, v_3, v_4} is linearly dependent but its subset {v_1, v_2} is linearly independent. In fact all the subsets of S with at most two elements except for {v_3, v_4} are linearly independent; however {v_3, v_4} and all the subsets of S with at least three elements are linearly dependent.

Example 3.4 We want to know whether or not the following three vectors in R^3 are linearly independent:
\[
v_1 = (1,3,0), \quad v_2 = (2,-3,4), \quad v_3 = (3,0,4).
\]
Let
\[
\lambda_1(1,3,0) + \lambda_2(2,-3,4) + \lambda_3(3,0,4) = (0,0,0). \tag{$*$}
\]
So we have three equations
\[
\begin{aligned}
\lambda_1 + 2\lambda_2 + 3\lambda_3 &= 0 \\
3\lambda_1 + (-3)\lambda_2 &= 0 \\
4\lambda_2 + 4\lambda_3 &= 0
\end{aligned}
\]
From the second and third equations we get λ_1 = λ_2 = −λ_3, and these two equations imply all the others. So we can take, for example, λ_1 = 1, λ_2 = 1, λ_3 = −1 and then (*) holds. Hence S = {v_1, v_2, v_3} is linearly dependent.

Example 3.5 We want to know whether or not the following three vectors in R^3 are linearly independent:
\[
v_1 = (1,2,-1), \quad v_2 = (1,4,1), \quad v_3 = (1,8,-1).
\]
Let
\[
\lambda_1(1,2,-1) + \lambda_2(1,4,1) + \lambda_3(1,8,-1) = (0,0,0). \tag{$*$}
\]
That is, we have three equations
\[
\begin{aligned}
\lambda_1 + \lambda_2 + \lambda_3 &= 0 \\
2\lambda_1 + 4\lambda_2 + 8\lambda_3 &= 0 \\
-\lambda_1 + \lambda_2 - \lambda_3 &= 0
\end{aligned}
\]
Add equations 1 and 3 to get λ_2 = 0. Subtract twice equation 1 from equation 2 to get 6λ_3 = 0 and so λ_3 = 0. Then also λ_1 = 0. Thus we have proved that S = {v_1, v_2, v_3} is linearly independent.

The advantage of the axiomatic vector space definition is that one need not know what exactly the vectors are, but one can still work with them using the vector space axioms to obtain results which are valid in many different circumstances.

Example 3.6 Suppose V is any vector space over a field F, and u, v, w ∈ V are linearly independent. Then the vectors u + v, u − 2v + w, u − w are linearly independent.

To prove this, let λ_1(u + v) + λ_2(u − 2v + w) + λ_3(u − w) = 0 with λ_i ∈ F. Rearrange to get:
\[
(\lambda_1 + \lambda_2 + \lambda_3)u + (\lambda_1 - 2\lambda_2)v + (\lambda_2 - \lambda_3)w = 0.
\]
Since u, v, w are linearly independent, it follows that
\[
\begin{aligned}
\lambda_1 + \lambda_2 + \lambda_3 &= 0 \\
\lambda_1 - 2\lambda_2 &= 0 \\
\lambda_2 - \lambda_3 &= 0
\end{aligned}
\]
Substitute λ_1 = 2λ_2 and λ_3 = λ_2 into the first equation to get 4λ_2 = 0, and thus λ_1, λ_2, λ_3 are all zero.
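The calculations of Examples 3.4 and 3.5 amount to deciding whether the homogeneous system with the given vectors as columns has only the zero solution. Here is a small illustrative check (not part of the original notes), assuming SymPy is available, which uses exactly that criterion.

```python
from sympy import Matrix

def is_linearly_independent(vectors):
    """The vectors (given as lists) are linearly independent iff the only solution of
    lam_1 v_1 + ... + lam_k v_k = 0 is the zero solution, i.e. iff the matrix with
    the v_i as columns has trivial nullspace."""
    M = Matrix(vectors).T           # columns are the given vectors
    return len(M.nullspace()) == 0

# Example 3.4: dependent, since v1 + v2 - v3 = 0.
print(is_linearly_independent([[1, 3, 0], [2, -3, 4], [3, 0, 4]]))   # False
# Example 3.5: independent.
print(is_linearly_independent([[1, 2, -1], [1, 4, 1], [1, 8, -1]]))  # True
```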

For example, consider V = {(a n ) : a n R}, the set of all real sequences. This is a vector space over R, with addition and scalar multiplication componentwise. Let S = {(1,0,0,...),(0,1,0,...),(0,0,1,0,...),...} be the infinite set consisting of all sequences which have a 1 in one place, and zero elsewhere. Then S is linearly independent. In this course, we will usually focus on finite linearly independent sets. Example 3.8 (1) Let V = R n [X] be the vector space of polynomials of degree at most n over R. The subset {1,X,X 2,...,X n } of V is linearly independent. So is the subset {1,(X +1), (X +1) 2,...,(X +1) n } of V. (2) Let V = Map(R,R) be the vector space over R of functions from R to R. Let S = {f,g,h} where f : R R, g : R R and h : R R are given by f(x) = e 2x, g(x) = x 2, h(x) = x for all x R. Then S is linearly independent. For suppose that λ 1,λ 2,λ 3 R and that λ 1 f +λ 2 g +λ 3 h = 0. This means that λ 1 e 2x +λ 2 x 2 +λ 3 x = 0 for every x R. Take x = 0; then λ 1 1+0+0 = 0 so λ 1 = 0. Then take, for example, x = 1 and x = 1 to get λ 2 = 0 and λ 3 = 0. Properties 3.9 (1) If one of v 1,...,v n is zero then the vectors are linearly dependent. For suppose that v 1 = 0; then we can write, for example, 5v 1 +0v 2 +0v 3 +...+0v n = 0. (2) If S = {v 1,...,v n } is linearly independent then any subset of S is also linearly independent. Proof to be completed. (3) It follows from (2) that if {v 1,v 2,...v k } is linearly dependent for some k < n then {v 1,...,v k,v k+1,...,v n } is linearly dependent. (4) Suppose {v 1,...,v m } is linearly independent but {v 1,...,v m,w} is linearly dependent. Then w must be a linear combination of v 1,...,v m. Proof: Since {v 1,...,v m,w} is linearly dependent there are scalars λ 1,...,λ m and λ, not all zero, such that λ 1 v 1 +...+λ m v m +λw = 0 V. Since {v 1,...,v m } is linearly independent we must have λ 0, and so w = ( λ 1 /λ)v 1 +...+( λ m /λ)v m. 20

3.1 Linear independence and linear dependence in R^2 and R^3

Let V = R^2.

(1) Suppose that (a,b) and (c,d) do not lie on the same line through the origin. We claim that they are linearly independent. Let λ_1(a,b) + λ_2(c,d) = (0,0). Assume (for a contradiction) that λ_1 ≠ 0. Then (a,b) = (−λ_2/λ_1)(c,d), which means that (a,b) and (c,d) lie on the same line through the origin, a contradiction. So λ_1 = 0 and similarly λ_2 = 0.

(2) If v_1, v_2 are on the same line through the origin then they are linearly dependent: let v_1 = (a,b) and v_2 = (c,d); then (c,d) = λ(a,b) and then
\[
\lambda(a,b) + (-1)(c,d) = (0,0)
\]
where the scalar coefficient −1 is not zero.

(3) Any set {v_1, v_2, v_3} of three vectors in R^2 is linearly dependent:

Case 1: {v_1, v_2} is linearly dependent. Then there are λ_1, λ_2 not both zero with λ_1 v_1 + λ_2 v_2 = 0. Then also
\[
\lambda_1 v_1 + \lambda_2 v_2 + 0 \cdot v_3 = 0
\]
and not all coefficients are zero.

Case 2: {v_1, v_2} is linearly independent. Then if v_1 = (a,b) and v_2 = (c,d), they cannot lie on the same line through the origin and therefore ad − bc ≠ 0. From Lemma 2.17 we know that every vector (u_1, u_2) is in the span of v_1, v_2. So in particular v_3 = λ_1(a,b) + λ_2(c,d) for some λ_1, λ_2. Then
\[
\lambda_1 v_1 + \lambda_2 v_2 + (-1)v_3 = 0
\]
and the coefficient of v_3 is non-zero. Hence {v_1, v_2, v_3} is linearly dependent.

Example 3.9 Let V = R^3.
(a) Two vectors in R^3 are linearly independent if and only if they do not lie on the same line through the origin.
(b) Three vectors in R^3 are linearly independent if and only if they do not all lie in the same plane through the origin.
(c) Four vectors in R^3 are always linearly dependent.
[Those who are doing the geometry course can prove this now, using vector products. We will be able to prove this later in this course, from general results.]

3.2 Bases

Lemma 3.10 The vectors v_1, v_2, ..., v_n are linearly dependent if and only if, for some j, the vector v_j lies in the span Sp{v_1, ..., v_{j−1}, v_{j+1}, ..., v_n}.

Proof to be completed.

Definition 3.11 Let V be a vector space over a field F. A subset B of V is a basis of V if
(i) it is linearly independent; and
(ii) it spans V, that is Sp(B) = V.

If V = Sp(B), one also says B is a spanning set for V. Thus a basis is a linearly independent spanning set for V.

A basis can be infinite; for example the infinite set
\[
S = \{(1,0,0,\dots), (0,1,0,\dots), (0,0,1,0,\dots), \dots\}
\]
consisting of all real sequences which have a 1 in one place, and zero elsewhere, is linearly independent, and so it is a basis for its span Sp(S), which is a subspace of the vector space V of all real sequences. [N.B. Sp(S) is not the whole of V: it consists of those sequences with only finitely many nonzero terms.] We will focus on spaces which have finite bases.

Example 3.12 (1) Let V = R^n, and let e_i = (0,0,...,1,0,...,0) be the vector with 1 in place i and 0 otherwise. Then E := {e_1, ..., e_n} is a basis of V. We call this the standard basis of R^n.

(2) R^n has many other bases. For example, take V = R^2. We have seen that any two non-zero vectors (a,b) and (c,d) which are not on the same line span V, and that they are linearly independent. So {(a,b), (c,d)} is a basis of V.

(3) Let V = M_{m×n}(R). Then V has a basis
\[
E = \{ E^{(k,l)} : 1 \le k \le m \text{ and } 1 \le l \le n \}
\]
where E^{(k,l)} is the matrix with a 1 in the (k,l)-th place and zero elsewhere. That is, E^{(k,l)} = [e^{(k,l)}_{ij}] ∈ M_{m×n}(R) where e^{(k,l)}_{ij} = 1 if i = k and j = l, and e^{(k,l)}_{ij} = 0 otherwise. We call this the standard basis of M_{m×n}(R).

(4) Let V = R_n[X]; this has a basis {1, X, X^2, ..., X^n}.
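As an added illustration of Example 3.12(2) (not part of the original notes, and assuming SymPy is available): the particular vectors (1,2) and (3,1) are hypothetical choices that satisfy ad − bc ≠ 0, so they form a basis of R^2, and the coefficients expressing any given vector in this basis are obtained by solving a 2 × 2 linear system.

```python
from sympy import Matrix

# (a,b) = (1,2) and (c,d) = (3,1): ad - bc = 1*1 - 2*3 = -5 != 0,
# so {(1,2), (3,1)} is a basis of R^2 (Example 3.12(2) and Lemma 2.17).
M = Matrix([[1, 3],
            [2, 1]])          # columns are the two basis vectors
u = Matrix([5, 5])            # an arbitrary vector to express in this basis

coords = M.solve(u)           # unique lam1, lam2 with lam1*(1,2) + lam2*(3,1) = u
print(coords)                 # Matrix([[2], [1]]): u = 2*(1,2) + 1*(3,1)
```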

Theorem 3.13 Let V be a vector space over a field F, and assume S ⊆ V. Then S is a basis for V if and only if every vector v ∈ V can be expressed uniquely as a linear combination of elements of S. Thus if S = {v_1, ..., v_n} has size n then S is a basis for V if and only if the map F^n → V given by
\[
(a_1, \dots, a_n) \mapsto a_1 v_1 + \dots + a_n v_n
\]
is bijective.

Proof to be completed.

Definition 3.14 Suppose F = {v_1, ..., v_n} is a basis for V. Express v ∈ V in terms of the basis F as v = a_1 v_1 + a_2 v_2 + ... + a_n v_n, where a_1, ..., a_n ∈ F are uniquely determined by v (as in the last theorem). Then the elements a_1, ..., a_n of F are called the coordinates of v with respect to the basis F. The column vector
\[
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\]
is called the coordinate vector of v with respect to the basis F, and we write
\[
v = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.
\]
This allows us to identify elements of V with vectors in (R^n)^t, once we have chosen a basis F = {v_1, ..., v_n} for V.

Remark 3.15 Strictly speaking, to make this identification we need an ordered basis F for V; if we change the order of the elements v_1, ..., v_n of F we will not change the set {v_1, ..., v_n} but we may change the coordinate vector of v.

Example 3.16 (a) Let V = M_{2×2}(R). This has a basis F = {B_1, B_2, B_3, B_4} where
\[
B_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
B_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad
B_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
B_4 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
The coordinate vector of
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}
\]
with respect to the basis F (ordered as above) is
\[
\begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \end{pmatrix}.
\]
The coordinate vector of A with respect to the standard basis E is
\[
\begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix}.
\]

(b) Let V = (R^3)^t and let F = {v_1, v_2, v_3} where
\[
v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
\]
The vector
\[
v = \begin{pmatrix} 4 \\ -3 \\ 2 \end{pmatrix}
\]
has coordinate vector with respect to F equal to
\[
\begin{pmatrix} 2 \\ -5 \\ 7 \end{pmatrix}.
\]

3.3 The dimension of a vector space

Let V be any vector space over a field F. Recall that a subset B of V is a basis for V if
(i) it is linearly independent; and
(ii) it spans V; that is, Sp(B) = V.

Suppose V is any vector space which has a finite spanning set. We want to show that then V has a finite basis, and furthermore that any two bases have the same number of elements. Then we will be able to define the dimension of V to be the number of elements in any basis.

We will assume throughout this chapter that V is a vector space over a field F which has a finite spanning set. If S is any finite spanning set of V, then we will prove first that it contains a basis for V.

Theorem 3.17 Suppose S is a finite spanning set for V. Then there is a subset B of S which is a basis for V.

The strategy is to pick a subset of S which is linearly independent, and is maximal with this property. Then one shows that this set is a basis. To do so, one needs to exploit the maximality, and this will use the following lemma.

Lemma 3.18 Suppose that T = {v_1, ..., v_t} ⊆ V is a linearly independent subset of the vector space V over F. If v ∈ V is not in Sp(T) then T ∪ {v} is linearly independent.

Proof: This follows immediately from Property 3.9(4).

Proof of Theorem 3.17 Take a subset B ⊆ S which is linearly independent, and maximal with this property (that is, if B ⊆ B′ ⊆ S and B′ is linearly independent then B′ = B); we can find such a subset B of S since S is finite and it has at least one linearly independent subset, namely the empty set. After relabelling if necessary we can assume B = {v_1, ..., v_t}.

If j > t and v_j ∉ Sp(B), then the set B ∪ {v_j} is linearly independent by Lemma 3.18, which contradicts the choice of B. So v_j ∈ Sp(B) for all j > t.

To prove that B is a basis, we must show that V = Sp(B). Since B ⊆ V we have Sp(B) ⊆ V by the definition of Sp(B). Let v ∈ V; since S spans V, it can be written as
\[
v = \sum_{i=1}^{n} \lambda_i v_i \qquad (\lambda_i \in R). \tag{$*$}
\]

Now if j > t then v_j ∈ Sp(B), so we can write v_j = \sum_{i=1}^{t} c_{ij} v_i. Substituting these into (*) gives us an expression for v as a linear combination of the elements v_1, ..., v_t of B.

We have seen in Theorem 3.17 that any finite spanning set S for V contains a basis B. Therefore the size of B is at most the size of S. We now show something stronger: if S is any finite spanning set for V and I is any linearly independent subset of V then the size of I is at most the size of S. Here I need not have any relationship with S. The main step in the proof of this is the following lemma.

Lemma 3.19 (Steinitz Exchange Lemma) Suppose v_1, ..., v_n ∈ V and y ∈ V. Suppose y = λ_1 v_1 + ... + λ_n v_n with λ_j ∈ R. If λ_k ≠ 0 for some k then
\[
\mathrm{Sp}\{v_1, \dots, v_n\} = \mathrm{Sp}\{v_1, \dots, v_{k-1}, y, v_{k+1}, \dots, v_n\}.
\]
Proof: Clearly y ∈ Sp{v_1, ..., v_n}. Also we can write
\[
v_k = \lambda_k^{-1} y - \lambda_k^{-1}\lambda_1 v_1 - \dots - \lambda_k^{-1}\lambda_{k-1} v_{k-1} - \lambda_k^{-1}\lambda_{k+1} v_{k+1} - \dots - \lambda_k^{-1}\lambda_n v_n,
\]
so v_k is in the span of {v_1, ..., v_{k-1}, y, v_{k+1}, ..., v_n}. It follows that
\[
\mathrm{Sp}\{v_1, \dots, v_n\} \subseteq \mathrm{Sp}\{v_1, \dots, v_{k-1}, y, v_{k+1}, \dots, v_n\} \subseteq \mathrm{Sp}\{v_1, \dots, v_n\}
\]
and we get equality.

Theorem 3.20 Assume that V has a finite spanning set S of size n. If I = {w_1, w_2, ..., w_m} is any linearly independent subset of V of size m, then m ≤ n.

Proof Let S = {v_1, v_2, ..., v_n}, so that V = Sp(S). The strategy is now to replace elements of S by elements of I until one runs out of elements.

Consider w_1 ∈ I. Since V = Sp(S), there are scalars λ_1, ..., λ_n such that w_1 = λ_1 v_1 + ... + λ_n v_n. At least one λ_i is non-zero (otherwise w_1 would be zero, but then I would not be linearly independent). We may assume i = 1 (if not, we relabel the v_i). Then by the Steinitz Exchange Lemma
\[
V = \mathrm{Sp}\{v_1, \dots, v_n\} = \mathrm{Sp}\{w_1, v_2, \dots, v_n\}.
\]
Now we can write w_2 as a linear combination of w_1, v_2, ..., v_n:
\[
w_2 = \mu_1 w_1 + \mu_2 v_2 + \dots + \mu_n v_n.
\]
At least one of µ_2, µ_3, ..., µ_n must be non-zero (otherwise we would have w_2 = µ_1 w_1 and I would not be linearly independent). By relabelling we can assume that µ_2 ≠ 0. Again, by the Steinitz Exchange Lemma, V = Sp{w_1, w_2, v_3, ..., v_n}.

We can continue in this way. Assume (for a contradiction) that m > n. Then after n steps we have replaced all the v's by w's and we have V = Sp{w_1, ..., w_n}. Moreover there is another element w_{n+1} in I, and since w_{n+1} ∈ V = Sp{w_1, ..., w_n} it is a linear combination of w_1, ..., w_n. This means that I is not linearly independent, which contradicts the hypotheses of the theorem. Therefore m ≤ n and the theorem is proved.

Corollary 3.21 Suppose V has a finite spanning set. Then V has a finite basis B, and every basis of V is finite and has the same number of elements as B.

Proof: By Theorem 3.17, if V has a finite spanning set S then S contains a basis B for V, which is finite. Suppose A is any basis for V, and note that by Property 3.9(2) every subset of A is linearly independent. Therefore by Theorem 3.20 every finite subset of A has size at most the size |B| of B, so A is finite and |A| ≤ |B|. Interchanging the roles of A and B in 3.20 gives |B| ≤ |A|, and hence |B| = |A|.

This corollary means that the following definition makes sense; that is, the dimension of V is well defined.

Definition 3.22 We call a vector space V finite-dimensional if it has a finite spanning set. Then the number of elements in any basis of V is called the dimension of V, written dim V. If a vector space V is not finite-dimensional then we call it infinite-dimensional and we write dim V = ∞.

Example 3.23 (1) The vector space R^n has the standard basis E = {e_1, ..., e_n} with n elements (see Example 3.12(1)), so it has dimension n.

(2) The vector space V = M_{m×n}(R) has dimension mn. It has a basis {E^{(k,l)} : 1 ≤ k ≤ m and 1 ≤ l ≤ n} (see Example 3.12).

(3) The vector space R_n[X] of polynomials in X of degree at most n with real coefficients has a basis {1, X, ..., X^n} and hence it has dimension n + 1.

Suppose that V is a finite-dimensional vector space, and that we are given a linearly independent subset of V (e.g. just one non-zero vector). Often one would like to have a basis that contains this subset. The next theorem shows that such a basis can always be found. [N.B. It also shows that usually there are many different bases for the same vector space V, since for any non-zero vector v ∈ V there is a basis for V containing v.]

Theorem 3.24 Let V be a finite-dimensional vector space. If I is a linearly independent subset of V, then there is a basis of V containing I.

This theorem is very important. We say that any linearly independent subset of V can be extended to a basis. Very often we start with a basis of some subspace W of V, and then this theorem allows us to say that a basis of W can be extended to a basis of V.

Proof: Let U = Sp(I); then U is a subspace of V. If U = V then I is already a basis of V, and we are done. Otherwise, take some v_1 ∈ V ∖ U, and let I_1 = I ∪ {v_1}. By Lemma 3.18, the set I_1 is linearly independent.

Let U_1 = Sp(I_1). If U_1 = V then I_1 is a basis for V which contains I, and so we are done. Otherwise choose v_2 ∈ V ∖ U_1, and by Lemma 3.18 again I_1 ∪ {v_2} is linearly independent.

This argument can be repeated, and after finitely many steps it must stop, since any linearly independent subset of V has size at most dim V. When it stops we have a linearly independent subset I_k of V with I ⊆ I_k and V = Sp(I_k), so that I_k is a basis with the required properties.

Corollary 3.25 Assume V is a vector space and dim V = n. Then
(a) Any linearly independent subset S of size n is a basis.
(b) Any subset J of V with more than n elements is linearly dependent.

Proof (a) By the last theorem S can be extended to a basis of V. But any basis has size n, so S must already have been one.
(b) Assume for a contradiction that the set J is linearly independent; then by Theorem 3.20 we have |J| ≤ n, a contradiction.

So for example any subset of R^n with n + 1 elements is linearly dependent.

Lemma 3.26 Let V be a vector space over a field F, and suppose that S is a subset of V; then the following statements are equivalent:
(1) S is a basis.
(2) S is a maximal linearly independent subset.
(3) S is a minimal spanning set.

Proof If S is a linearly independent subset then S is maximal iff S ∪ {v} is linearly dependent for all v ∈ V ∖ S, iff v ∈ Sp(S) for all v ∈ V, iff S is a basis.

If S is a spanning set for V then S is minimal iff Sp(S ∖ {v}) ≠ V = Sp(S) for all v ∈ S, iff v ∉ Sp(S ∖ {v}) for all v ∈ S, iff S is linearly independent, iff S is a basis.

The following result is often useful.

Theorem 3.27 Assume V is an n-dimensional vector space and that W is a subspace of V. Then
(a) W is finite-dimensional and dim W ≤ dim V;
(b) dim W = dim V ⟺ W = V.

Proof (a) Take a maximal linearly independent subset B of W. Then by Lemma 3.26, B is a basis for W. Moreover B is a linearly independent subset of V, so by Theorem 3.24 it can be extended to a basis B_1, say, of V, so B ⊆ B_1. Now dim W = |B| and dim V = |B_1|, and (a) is proved.

(b) Let B be a basis of W. By Theorem 3.24, B can be extended to a basis B_1, say, of V. Since B ⊆ B_1, we have
\[
\dim W = \dim V \implies |B| = |B_1| \implies B = B_1,
\]
which implies that W = Sp(B) = Sp(B_1) = V.
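To close the chapter with a computational illustration (not part of the original notes, and assuming SymPy is available): Theorem 3.17 says a finite spanning set contains a basis, and the pivot columns of the reduced row echelon form pick out such a subset, from which the dimension of the spanned subspace can be read off. The four spanning vectors below are a hypothetical choice made only for this example.

```python
from sympy import Matrix

# A finite spanning set for a subspace W of R^4 (an illustrative choice).
vectors = [[1, 1, 1, 0], [2, 2, 2, 0], [1, 0, 1, 1], [0, 1, 0, -1]]
M = Matrix(vectors).T                 # columns are the spanning vectors

_, pivot_cols = M.rref()
basis = [vectors[j] for j in pivot_cols]

print(basis)        # a subset of the spanning set that is a basis of W (Theorem 3.17)
print(len(basis))   # dim W = 2 here, consistent with dim W <= dim R^4 = 4 (Theorem 3.27)
```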