Linear Algebra March 16, 2019


Contents

0.1 Notation
1 Systems of linear equations, and matrices: Systems of linear equations; Augmented matrices; Matrix operations; Properties of matrix arithmetic, and inverses; Elementary matrices and finding inverses; More on systems and inverses; Diagonal, triangular and symmetric matrices; Linear transformations; Matrices and Markov models
2 Determinants: Defining the determinant; Elementary operations and determinants; The Adjugate and Cramer's Rule
3 Vector spaces and bases: Real Vector Spaces; Subspaces; Linear independence; The Wronskian; Bases; Dimension; Change of Basis; Row space, column space, null space
4 Orthogonality: Norm and dot product; The fundamental matrix spaces; The Gram-Schmidt Process; More on the dot product
5 Eigenvectors and diagonalization: Eigenvalues and eigenvectors; Similar matrices and diagonalization; Orthogonal diagonalization
6 Applications: Best approximations; Curve fitting using least squares; Eigenvalues and Markov models
0.1 Notation

We write $\exists$ to mean "there exists" and $\forall$ to mean "for all". Note that the first of these is a backwards E (for "exists") and the second is an upside-down A (for "all"). Given sets $S$ and $T$, and an object $x$, we write $x \in S$ to mean that $x$ is in $S$, and $S \subseteq T$ to mean that every member of $S$ is in $T$ (i.e., that $S$ is a subset of $T$). The intersection of $S$ and $T$ ($S \cap T$) is the set of things which are in both $S$ and $T$. The union of $S$ and $T$ ($S \cup T$) is the set of things that are in at least one of $S$ and $T$. We write $S \setminus T$ for $\{x \in S : x \notin T\}$.
Chapter 1 Systems of linear equations, and matrices

1.1 Systems of linear equations

A linear equation is an equation of the form
$$a_1x_1 + \cdots + a_nx_n = b$$
where $a_1, \ldots, a_n$ and $b$ are constants, and $x_1, \ldots, x_n$ are variables. The following are examples of linear equations:
$$2x + 3y = 4, \qquad \frac{x}{2} - 5y + 3z = 0.$$
The following are not linear equations (in the first case, because two variables are multiplied together, and in the second case because we have a square root of a variable):
$$xy = 1, \qquad 2\sqrt{x} + y = 6.$$
A system of linear equations is a finite set of linear equations. For instance,
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ -x + 2y + z &= 0 \\ 4x + 8z &= 12 \end{aligned}$$
Our first goal is to be able to find the set of solutions to a given system of linear equations. First let's consider the set of solutions to a single linear equation.
- $2x = 4$. The set of solutions is the single value $x = 2$.
- $x + y = 3$. The set of solutions is the line $y = 3 - x$.
- $x + y + z = 1$. The set of solutions is a plane.

Solution spaces in higher dimensions are harder to visualize, but in general the set of solutions to a linear equation with $n$ variables is an $(n-1)$-dimensional space (because you can freely choose the values of all but one of the variables).

Note that the set of solutions to a linear equation falls into one of three cases: empty, infinite, or a single point. Moreover, given two solutions, everything on the line that contains them is also a solution. To see this, suppose that we have two solutions $\vec{c} = (c_1, \ldots, c_n)$ and $\vec{d} = (d_1, \ldots, d_n)$ to an equation $a_1x_1 + \cdots + a_nx_n = b$, and suppose that $s$ and $t$ are any two numbers such that $s + t = 1$. Then
$$s\vec{c} + t\vec{d} = (sc_1 + td_1, \ldots, sc_n + td_n)$$
is a solution, since
$$\begin{aligned} a_1(sc_1 + td_1) + \cdots + a_n(sc_n + td_n) &= (a_1sc_1 + \cdots + a_nsc_n) + (a_1td_1 + \cdots + a_ntd_n) \\ &= s(a_1c_1 + \cdots + a_nc_n) + t(a_1d_1 + \cdots + a_nd_n) \\ &= sb + tb = b(s + t) = b. \end{aligned}$$

The solution space of a system of linear equations is just the intersection of the solution spaces of the individual equations. For systems with two variables, this solution space could be a line, a point or nothing (think of examples for all three cases; given two equations in two variables, you can tell which case you are in without doing any work). For systems with three variables, the solution space could be a plane, a line, a point or nothing (visualize examples). In general, the dimension of the solution space for a system of linear equations with $n$ variables (total, each appearing at least once with a nonzero coefficient) can be any number from $0$ to $n - 1$ (or nonexistent, if the solution space is empty). As with single equations, the set of solutions to a system of linear equations falls into the same three cases: empty, infinite, or a single point.
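The line argument above can be checked numerically. The following is a minimal sketch, assuming the equation $x + y + z = 1$ and two hand-picked solutions (both are illustrative choices, not taken from the notes); the helper names `lhs` and `affine_combination` are hypothetical.

```python
from fractions import Fraction

def lhs(coeffs, point):
    """Evaluate a_1*x_1 + ... + a_n*x_n at the given point."""
    return sum(a * x for a, x in zip(coeffs, point))

def affine_combination(s, c, d):
    """Return s*c + t*d with t = 1 - s, computed coordinate by coordinate."""
    t = 1 - s
    return [s * ci + t * di for ci, di in zip(c, d)]

# Assumed example: the plane x + y + z = 1 and two of its points.
coeffs, b = [1, 1, 1], 1
c = [1, 0, 0]
d = [0, Fraction(1, 2), Fraction(1, 2)]

# Every choice of s (with t = 1 - s) should give another solution.
points = [affine_combination(Fraction(s), c, d) for s in (-2, 0, Fraction(1, 3), 5)]
on_plane = [lhs(coeffs, p) == b for p in points]
print(on_plane)
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.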
There are three (elementary) operations that we can perform on a system of linear equations without changing the solution space. They are:
- Multiplying an equation by a nonzero number.
- Switching the places of two equations in the list of equations.
- Adding a multiple of one equation to another.
It should be clear that the first two operations do not change the set of solutions. For the third, let us consider an example. Consider the system
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ -x + 2y + z &= 0 \\ 4x + 8z &= 12 \end{aligned}$$
If we add two times the first equation to the second equation, we get
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ 3x + 8y - 11z &= 8 \\ 4x + 8z &= 12 \end{aligned}$$
After adding a multiple of one equation to another, every solution of the original system is a solution to the new system. However, this operation is reversible, so every new solution is also an old solution. To reverse the example above, add $-2$ times the first equation to the (new) second equation. It is not hard to see that the other two operations are also reversible, a fact that we will come back to.

In many cases we can find a solution to a system of linear equations by using these elementary operations to eliminate one variable at a time. For the system above, one could do the following.
1. Add half of the first equation to the second equation, and $-2$ times the first equation to the third equation, to get
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ (7/2)y - 2z &= 2 \\ -6y + 20z &= 4 \end{aligned}$$
2. Add $12/7$ times the new second equation to the new third equation to get
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ (7/2)y - 2z &= 2 \\ (116/7)z &= 52/7 \end{aligned}$$
3. Solving the third equation, we get that $z = 13/29$.
4. Plugging $z = 13/29$ into the second equation and solving, we get that $y = 24/29$.
5. Plugging $z = 13/29$ and $y = 24/29$ into the first equation, and solving, we get that $x = 61/29$.

The method above is not optimal (for several reasons, one being that typically we are interested in the set of all solutions, and there may be more than one). A better method involves representing systems of linear equations as matrices.
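The elimination steps above can be replayed with exact arithmetic. This is a sketch only: it assumes the second equation reads $-x + 2y + z = 0$ (the sign reading that agrees with the stated results), and the helper `add_multiple` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction as F

# Each equation is stored as [coefficient of x, of y, of z, constant term].
# Assumption: the second equation is -x + 2y + z = 0.
system = [
    [F(2), F(3), F(-6), F(4)],
    [F(-1), F(2), F(1), F(0)],
    [F(4), F(0), F(8), F(12)],
]

def add_multiple(eqs, k, src, dst):
    """Return a new system in which k times equation src is added to equation dst."""
    out = [row[:] for row in eqs]
    out[dst] = [k * a + b for a, b in zip(eqs[src], eqs[dst])]
    return out

step1 = add_multiple(system, F(1, 2), 0, 1)   # eliminate x from the second equation
step1 = add_multiple(step1, F(-2), 0, 2)      # eliminate x from the third equation
step2 = add_multiple(step1, F(12, 7), 1, 2)   # eliminate y from the third equation

# Back-substitution, from the last equation up.
z = step2[2][3] / step2[2][2]
y = (step2[1][3] - step2[1][2] * z) / step2[1][1]
x = (step2[0][3] - step2[0][1] * y - step2[0][2] * z) / step2[0][0]
print(x, y, z)  # 61/29 24/29 13/29
```

Substituting the three values back into the original equations is a quick way to confirm that no step changed the solution set.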
1.2 Augmented matrices

When trying to find the solution space for a system of linear equations, it is convenient to represent the system as an augmented matrix. A matrix is a rectangular array of numbers. An augmented matrix is a matrix with an extra column which is separated from the rest. A system of the form
$$\begin{aligned} 2x + 3y - 6z &= 4 \\ -x + 2y + z &= 0 \\ 4x + 8z &= 12 \end{aligned}$$
is written as follows:
$$\left[\begin{array}{rrr|r} 2 & 3 & -6 & 4 \\ -1 & 2 & 1 & 0 \\ 4 & 0 & 8 & 12 \end{array}\right]$$
Notice that each equation in the system corresponds to a row in the matrix, and each variable corresponds to a column. It does not matter which variable corresponds to which column, or which equation corresponds to which row. We use a vertical line to separate the coefficients of the variables from the constant terms.

Translated to augmented matrices, our operations become the elementary row operations:
- Multiplying a row by a nonzero number.
- Switching the places of two rows in the matrix.
- Adding a multiple of one row to another.

We are going to use these operations to convert a given augmented matrix (or, the part of the matrix to the left of the vertical line) into one in what is called reduced row echelon form. Before defining reduced row echelon form, we make a couple of preliminary definitions. First, if a row of a matrix has a nonzero entry, the leftmost nonzero entry is called a leading entry or a pivot.

Definition. We say that a matrix is in row echelon form if it satisfies the following conditions.
- For any two rows with leading entries, the leading entry of the lower row is to the right of the leading entry in the higher row.
- If a row consists entirely of zeros, then every row below it consists entirely of zeros.
A matrix is in reduced row echelon form if it is in row echelon form and the following conditions are satisfied.
- Each leading entry has value 1 (such an entry is called a leading 1).
- Each leading 1 is the only nonzero entry in its column.

Example. The matrix [matrix missing] is in reduced row echelon form. The matrix [matrix missing] is in row echelon form, but not reduced row echelon form, since the leading 1 in the second row is not the only nonzero entry in its column. The matrix [matrix missing] is not in row echelon form (so also not in reduced row echelon form), since the leading entry in the third row is below (so, not to the right of) the leading entry in the second row.

Before we see how to convert matrices into reduced row echelon form using elementary row operations, let us see how to find the solution space of a system of linear equations represented by an augmented matrix whose left side is in reduced row echelon form. Suppose that we have the matrix
$$\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 2 & 1 \\ 0 & 0 & 1 & 3 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
This represents the following system of equations (the all-zero row can be ignored, since it corresponds to an equation satisfied by all choices of $x_1, \ldots, x_n$).
$$\begin{aligned} x_1 + 2x_4 &= 1 \\ x_3 + 3x_4 &= 2 \end{aligned}$$
For each equation, solve for the variable corresponding to the leading 1 (i.e., the leftmost variable). For each variable $x_i$ not corresponding to a leading 1, add the equation $x_i = x_i$ (this corresponds to the fact that these variables can be chosen freely, and the values of the variables corresponding to leading 1s depend on these choices). We get the following system.
$$\begin{aligned} x_1 &= 1 - 2x_4 \\ x_2 &= x_2 \\ x_3 &= 2 - 3x_4 \\ x_4 &= x_4 \end{aligned}$$
which we can rewrite as
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -3 \\ 1 \end{bmatrix}.$$
We call this the vector form of the solution space. Often we replace the two variables on the right with new variables, and write the equation as
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix} + s \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ 0 \\ -3 \\ 1 \end{bmatrix}.$$
The point is that letting $x_2$ and $x_4$ (or $s$ and $t$) range over all possible values, we generate all the solutions to our system. We call these the free variables. The number of free variables is the dimension of the solution space.

Some notes:
- The reduced row echelon form of a matrix is unique. That is, you can't get from one matrix in reduced row echelon form to a different one using elementary row operations. (This is not true for row echelon form.)
- If an augmented matrix (whether in reduced row echelon form or not) contains a row whose leading entry is in the rightmost column, then this row corresponds to an unsolvable equation. If the matrix is in reduced row echelon form and there is no such row, then the system has at least one solution.
- If a matrix in reduced row echelon form represents a solvable system, the dimension of the solution space is the number of columns from the left side of the matrix which do not have leading entries. This is the same as the number of columns on the left side of the augmented matrix minus the number of rows which are not all zeros.
- A matrix where every column on the left side has a leading 1 corresponds to a system with a unique solution (or no solution).
- A system with more variables than equations has infinitely many solutions if it has any solutions.

Here is an algorithm (called Gauss-Jordan elimination) for converting a matrix into reduced row echelon form using elementary row operations. The algorithm consists of repeating the following steps; we will call the $i$th repetition Round $i$ (starting with Round 1). We let $n$ be the number of rows of the matrix.
1. In Round $i$, if all the entries from row $i$ to row $n$ (inclusive) have value 0, then stop.
Otherwise, find the leftmost column with a nonzero entry in rows i to n, and switch rows so that row i has a nonzero entry in this column.
2. Multiply the new row $i$ by a constant to make its leading entry a 1.
3. Add a multiple of row $i$ to each other row, in order to make the leading 1 in row $i$ the only nonzero entry in its column.
4. If $i = n$ then stop, otherwise continue to Round $i + 1$.

A few observations about this algorithm:
- You don't have to confine yourself to these steps. If you see a way to simplify the matrix with an elementary operation, go ahead. You can't make an irreversible mistake.
- You can add one row to several other rows at the same time, but other combinations of operations can cause problems. For instance, don't add two rows to each other at the same time.

Example. Suppose that we want to find the set of solutions for the following system.
$$\begin{aligned} x_2 + 2x_3 &= 4 \\ 2x_1 + x_2 &= 6 \\ 2x_1 + 2x_2 + 2x_3 &= 10. \end{aligned}$$
We represent this as follows.
$$\left[\begin{array}{rrr|r} 0 & 1 & 2 & 4 \\ 2 & 1 & 0 & 6 \\ 2 & 2 & 2 & 10 \end{array}\right]$$
Switch the first two rows to get a nonzero entry in the first column of the first row, and then divide that row by 2. We get:
$$\left[\begin{array}{rrr|r} 1 & 1/2 & 0 & 3 \\ 0 & 1 & 2 & 4 \\ 2 & 2 & 2 & 10 \end{array}\right]$$
Add $-2$ times the first row to the third row to make the leading 1 in the first row the only nonzero entry in its column. We get:
$$\left[\begin{array}{rrr|r} 1 & 1/2 & 0 & 3 \\ 0 & 1 & 2 & 4 \\ 0 & 1 & 2 & 4 \end{array}\right]$$
The leftmost nonzero entry in the rest of the matrix (below the first row) is already in the second row, and it is already 1. So we add $-1/2$ times the second row to the first row, and $-1$ times the second row to the third row, to get
$$\left[\begin{array}{rrr|r} 1 & 0 & -1 & 1 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
The matrix is now in reduced row echelon form. The corresponding solution space is:
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$
We can check our answer by plugging in values for $s$.

If the rightmost column of an augmented matrix is all zeros (i.e., if the constants in the system of linear equations it represents are all 0), then the corresponding system is said to be homogeneous. A homogeneous system has at least one solution: let all the variables be 0. This is called the trivial solution. The solution set for a homogeneous system is then either infinite, or only the trivial solution. If there are more variables than equations in a homogeneous system, then there are infinitely many solutions.

Note that the first term on the right side of the equation above (the constant term) is a solution to the given system, and that the other term (the vector being multiplied by $s$) is a solution to the corresponding homogeneous system. This will always be the case: when the solution space to a system of linear equations is written in vector form, the terms multiplied by the free variables describe the solution space to the corresponding homogeneous system.

1.3 Matrix operations

A matrix with $n$ rows and $m$ columns is said to be an $n \times m$ matrix. Here is a $2 \times 3$ matrix: [matrix missing]. A $1 \times m$ matrix is called a row matrix. An $n \times 1$ matrix is called a column matrix. An $n \times n$ matrix is called a square matrix. Given a matrix $A$, we write $(A)_{ij}$ for the entry in the $i$th row and the $j$th column. Letting $A$ be the matrix above, we have $(A)_{13} = 1$. Sometimes we write $a_{ij}$ instead of $(A)_{ij}$. If $A$ is an $n \times n$ matrix, the diagonal of $A$ is the set of entries of the form $(A)_{ii}$ for $i = 1, \ldots, n$. The trace of a square matrix is the sum of the terms along the diagonal.

If $A$ and $B$ are $n \times m$ matrices (i.e., with the same dimensions), we can add them (if not, we can't). The matrix $A + B$ is the $n \times m$ matrix such that $(A + B)_{ij} = (A)_{ij} + (B)_{ij}$.
For example, [matrix sum missing]. Given a matrix $A$ and a number $c$ (called a scalar in this context), the matrix $cA$ has the same dimensions as $A$, and $(cA)_{ij} = c(A)_{ij}$ for all relevant $i, j$. For
example, $(-2)$ [matrix missing] $=$ [matrix missing].

Finally, we can multiply matrices. As with addition, we can't multiply just any two matrices. For the matrix product $AB$ to be defined, the number of columns of $A$ has to be the same as the number of rows of $B$. That is, for some $n, m, p$, $A$ has to be $n \times m$ and $B$ has to be $m \times p$. The resulting matrix $AB$ is $n \times p$, and
$$(AB)_{ik} = \sum_{j=1}^{m} (A)_{ij}(B)_{jk}.$$
Here is an example: [matrix product missing].

Remark. The main problem in the previous section, finding the solution set for a system of linear equations, can be rephrased as finding the set of solutions to an equation of the form $AX = B$, where $A$ is a matrix ($n \times m$, say), $X$ is an $m \times 1$ column vector whose entries are variables, and $B$ is an $n \times 1$ column vector. Suppose instead that $B$ has more than one column (say, $p$ many). We write $B = [\,\vec{b}_1 \; \cdots \; \vec{b}_p\,]$ to indicate that $\vec{b}_i$ is the $i$th column of $B$. An $m \times p$ matrix $X$ is a solution to the equation $AX = B$ if and only if, for each $i$, the $i$th column of $X$ is a solution to the equation $A\vec{x} = \vec{b}_i$. So we can find the set of such $X$ one column at a time. In fact we can solve all $p$ of these problems at the same time by writing the augmented matrix $[\,A \mid B\,]$ with $A$ on the left and $B$ on the right. We then convert the left side of the matrix to reduced row echelon form via elementary row operations as before.

Order matters for matrix multiplication. If a matrix product $AB$ is defined, then the product $BA$ may not even be defined. Even if it is defined, it may not be the same (although it can be). Consider: [two matrix products, taken in opposite orders, missing].
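The product formula and the remark that order matters can be tried out directly. The sketch below represents a matrix as a list of rows; the function name `matmul` and the matrices `A`, `B` and `C` are illustrative assumptions, not the examples from the notes.

```python
def matmul(A, B):
    """Multiply an n x m matrix A by an m x p matrix B (each a list of rows)."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # (AB)_ik is the sum over j of (A)_ij * (B)_jk.
    return [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(p)]
            for i in range(n)]

# Assumed 2 x 2 matrices, chosen only to show that AB and BA can differ.
A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[7, 2], [3, 1]]
print(BA)  # [[1, 2], [3, 7]]

# A 1 x 2 matrix C: the product CA is defined, but AC is not.
C = [[1, 1]]
print(matmul(C, A))  # [[1, 3]]
```

Calling `matmul(A, C)` raises `ValueError`, which matches the requirement that the number of columns of the left factor equal the number of rows of the right factor.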
Let $A_1, \ldots, A_k$ be $n \times m$ matrices. A linear combination of $A_1, \ldots, A_k$ is a matrix of the form $c_1A_1 + \cdots + c_kA_k$, where $c_1, \ldots, c_k$ are scalars.

The following observation (and generalizations of it) will be useful for deriving theoretical results later. Suppose that $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix. Let $C_1, \ldots, C_m$ be the columns of $A$ and let $R_1, \ldots, R_m$ be the rows of $B$. We can write this as
$$A = [\,C_1 \; \cdots \; C_m\,] \quad\text{and}\quad B = \begin{bmatrix} R_1 \\ \vdots \\ R_m \end{bmatrix}.$$
For each $i \in \{1, \ldots, p\}$, the $i$th column of $AB$ is the linear combination
$$(B)_{1i}C_1 + \cdots + (B)_{mi}C_m.$$
Similarly, for each $i = 1, \ldots, n$, the $i$th row of $AB$ is the linear combination
$$(A)_{i1}R_1 + \cdots + (A)_{im}R_m.$$
In addition $AB$ is equal to
$$C_1R_1 + \cdots + C_mR_m.$$
This way of writing the product is called its column-row expansion.

Example. Let $A$ and $B$ be the matrices [matrix missing] and [matrix missing] respectively. Then
- $AB =$ [matrix missing];
- the first column of $AB$ is equal to [linear combination missing];
- the second row of $AB$ is equal to $0\,$[row missing]$\, + 1\,$[row missing]$\, + 3\,$[row missing];
- the column-row expansion of $AB$ is [sum of three column-row products missing], which is [sum of matrices missing].

The transpose of an $n \times m$ matrix $A$ is the $m \times n$ matrix $A^T$ with the property that $(A^T)_{ji} = (A)_{ij}$. For instance, [matrix and its transpose missing]. Note that if a matrix product $AB$ is defined, then $B^TA^T$ is also defined.

Theorem. Each of the following formulas is valid for matrices of the appropriate dimensions.
- $(A^T)^T = A$
- $(A + B)^T = A^T + B^T$
- $(A - B)^T = A^T - B^T$
- $(kA)^T = kA^T$
- $(AB)^T = B^TA^T$

To see that the last formula of this theorem holds, let $A$ and $B$ be matrices of respective dimensions $m \times n$ and $n \times p$, and let $(i, j)$ be a pair in $\{1, \ldots, m\} \times \{1, \ldots, p\}$. Then
$$((AB)^T)_{ji} = (AB)_{ij} = \sum_{k=1}^{n} (A)_{ik}(B)_{kj} = \sum_{k=1}^{n} (B^T)_{jk}(A^T)_{ki} = (B^TA^T)_{ji}.$$
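Both the column-row expansion and the rule $(AB)^T = B^TA^T$ are easy to check on a small case. The following sketch uses assumed matrices (a $2 \times 3$ and a $3 \times 2$ matrix picked for illustration); all function names are hypothetical helpers.

```python
def matmul(A, B):
    """Entrywise definition: (AB)_ik = sum over j of (A)_ij * (B)_jk."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def add(A, B):
    """Entrywise sum of two matrices of the same dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def column_row_expansion(A, B):
    """Sum over j of (column j of A) times (row j of B); each term is n x p."""
    n, m, p = len(A), len(B), len(B[0])
    total = [[0] * p for _ in range(n)]
    for j in range(m):
        column = [[A[i][j]] for i in range(n)]   # n x 1 matrix
        row = [B[j]]                             # 1 x p matrix
        total = add(total, matmul(column, row))
    return total

# Assumed example matrices.
A = [[1, 2, 0], [0, 1, 3]]
B = [[1, 0], [2, 1], [1, 4]]

product = matmul(A, B)
print(product)                                    # [[5, 2], [5, 13]]
print(column_row_expansion(A, B) == product)      # True
print(transpose(product) == matmul(transpose(B), transpose(A)))  # True
```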
1.4 Properties of matrix arithmetic, and inverses

The following theorem lists some of the basic properties of the operations of matrix addition and multiplication, and scalar multiplication of matrices. These rules are all the same as for ordinary arithmetic with real numbers. Note in particular the associative law for matrix multiplication: $(AB)C = A(BC)$.

Theorem. Each of the following formulas is valid for matrices of the appropriate dimensions (i.e., when the given operations are defined).
- $A + B = B + A$ (matrix addition is commutative)
- $A + (B + C) = (A + B) + C$ (matrix addition is associative)
- $A(BC) = (AB)C$ (matrix multiplication is associative)
- $A(B + C) = AB + AC$ (left distributivity)
- $(B + C)A = BA + CA$ (right distributivity)
- $a(B + C) = aB + aC$
- $(a + b)C = aC + bC$
- $a(bC) = (ab)C$
- $a(BC) = (aB)C = B(aC)$

Example. If $A$ and $B$ are square matrices of the same dimension, then $(A + B)^2 = A^2 + AB + BA + B^2$. Note however, that this is not necessarily equal to $A^2 + 2AB + B^2$.

The following rules follow from the ones in Theorem 1.4.1, using the equation $A - B = A + (-B)$.
- $A(B - C) = AB - AC$
- $(B - C)A = BA - CA$
- $a(B - C) = aB - aC$
- $(a - b)C = aC - bC$

If $a$, $b$ and $c$ are real numbers, with $a \neq 0$, and $ab = ac$, then $b = c$. This is called the cancellation law. This law does not hold for matrix multiplication (on either side). Consider: [matrix product missing]
and [matrix product missing].

A zero matrix is a matrix whose entries are all 0. We write $0_{m,n}$ for the $m \times n$ zero matrix, or $0$ when we want to suppress the dimensions (note the difference with the scalar 0). The following theorem lists properties of matrix arithmetic with zero matrices or with the scalar 0. Again, there are no surprises here.

Theorem. If $c$ is a scalar then the following hold for matrices of the appropriate dimensions:
- $A + 0 = 0 + A = A$
- $A - 0 = A$
- $A - A = A + (-A) = 0$
- $0A = 0$
- If $cA = 0$, then $c = 0$ or $A = 0$.

There is another difference with the arithmetic of real numbers, however. If $a$ and $b$ are two nonzero real numbers, then $ab \neq 0$ (we say: there are no zero divisors). However, two nonzero matrices can multiply to give a zero matrix: [matrix product missing].

These two examples highlight the same issue, since for matrices $A$, $B$ and $C$, $AC = BC$ if and only if $(A - B)C = 0$. They are both related to the possible nonexistence of matrix inverses, as defined below.

Definition. A square matrix whose entries are 1 along the diagonal and 0 everywhere else is called an identity matrix. We let $I_n$ denote the $n \times n$ identity matrix, and write $I$ for an identity matrix of unspecified dimension.

The matrix $I_n$ is a multiplicative identity: if $A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix, then $I_nA = A$ and $BI_n = B$. For the real numbers, the multiplicative identity is the number 1. The real numbers have the following property: for each nonzero real number $a$, there is a real number $b$ ($= 1/a$) such that $ab = 1$. In this case we say that $b$ is the multiplicative inverse of $a$. However, not every matrix can be multiplied by another matrix to get a value of $I_n$. In fact the equation $AB = I_n$ implies that $A$ has at least $n$ columns (as can be seen from later material on spanning sets), so the equation $AB = BA = I$ can hold only for square matrices, and even then not for every nonzero square matrix.
An $n \times n$ matrix $A$ is said to be invertible if there is a matrix $B$ such that $AB = BA = I_n$ (note that such a $B$ would have to also be $n \times n$). If there is
such a $B$, then there is only one (since if $AB = BA = I_n$ and $AC = CA = I_n$ then $C = CI_n = CAB = I_nB = B$), and we call it the inverse of $A$ and write it as $A^{-1}$ (this generalizes notation from the real numbers, but should not be taken to mean $1/A$). If there is no such $B$, then $A$ is said to be singular. The two following matrices are inverses:
$$\begin{bmatrix} 3 & -2 \\ -4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 3 & 2 \\ 4 & 3 \end{bmatrix}.$$

Some notes:
- The cancellation law ($AB = AC \Rightarrow B = C$) holds when $A$ is an invertible matrix.
- If $A$ is invertible and $B$ is not a zero matrix, then $AB$ and $BA$ (if defined) are both not zero matrices (i.e., invertible matrices cannot be zero divisors).
- A matrix with an all-zero row or column must be singular. For instance, if the first column of $A$ is all zeros, and every entry of $B$ which is not in its first row is 0, then $AB$ is a zero matrix (consider the column-row expansion of $AB$).
- A matrix $A$ with two identical rows is singular, since for any $B$, $AB$ will have two identical rows (so will not be the identity matrix). Similarly, a matrix with two identical columns is singular. One can also show in this way that a matrix is not invertible if one row is a constant multiple of another, or if one column is a constant multiple of another.
- If we know the inverse of a matrix $A$, then we can solve the equation $A\vec{x} = \vec{b}$ by multiplying both sides on the left by $A^{-1}$. If $A^{-1}$ exists, then there is a unique solution (the reverse also holds).

The following fundamental fact gives a formula for the inverse of a product of invertible matrices. Note that since matrix multiplication is not commutative, the inverse of $AB$ is not (in general) equal to $A^{-1}B^{-1}$.

Theorem. Let $A$ and $B$ be invertible matrices such that the product $AB$ is defined. Then $AB$ is invertible, and $(AB)^{-1} = B^{-1}A^{-1}$.

Proof. Let $n$ be such that $A$ and $B$ are both $n \times n$. Then $A^{-1}$ and $B^{-1}$ are also both $n \times n$, so the product $B^{-1}A^{-1}$ is defined.
Moreover,
$$(AB)(B^{-1}A^{-1}) = ABB^{-1}A^{-1} = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
and
$$(B^{-1}A^{-1})(AB) = B^{-1}A^{-1}AB = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I,$$
so the matrix $B^{-1}A^{-1}$ satisfies the definition of being the inverse of $AB$.
The following theorem contains some basic facts about how inverses interact with powers of matrices, transposes and scalar multiples. The last of these follows from the formula $(AB)^T = B^TA^T$.

Theorem. If a matrix $A$ is invertible, then
- $(A^{-1})^{-1} = A$;
- $(A^n)^{-1} = (A^{-1})^n$, which we write as $A^{-n}$;
- if $k \neq 0$, then $(kA)^{-1} = k^{-1}A^{-1}$;
- $(A^T)^{-1} = (A^{-1})^T$.

There is a simple formula for the inverse of a $2 \times 2$ matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Note that the expression is defined if and only if $ad - bc \neq 0$, which is exactly when the inverse exists.

Example. The inverse of the matrix
$$A = \begin{bmatrix} -1 & 3 \\ 2 & 4 \end{bmatrix}$$
is
$$(-1/10) \begin{bmatrix} 4 & -3 \\ -2 & -1 \end{bmatrix}.$$
The transpose of $A$ is
$$A^T = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix},$$
and the inverse of $A^T$ is
$$(-1/10) \begin{bmatrix} 4 & -2 \\ -3 & -1 \end{bmatrix},$$
which is the same as the transpose of $A^{-1}$. The matrix
$$\begin{bmatrix} 1 & 2 \\ -2 & -4 \end{bmatrix}$$
is not invertible, since $(1)(-4) - (2)(-2) = 0$.

The remark in Section 1.3 on equations of the form $AX = B$ shows how to find a solution $X$ to the matrix product equation $AX = B$, if a solution exists. Since the inverse of $A$ is a solution to the equation $AX = I$, this shows how to find the inverse of $A$, if it exists. We discuss this method at the end of the next section.
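The $2 \times 2$ formula and the inverse rules from this section can be checked with exact fractions. In this sketch the matrix `A` uses one sign placement that is consistent with the formula (an assumption, since the signs are not certain from the text), `B` is an arbitrary invertible matrix chosen for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Return the inverse of [[a, b], [c, d]], or None when ad - bc = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[-1, 3], [2, 4]]     # assumed signs; ad - bc = -10
B = [[2, 1], [1, 1]]      # assumed second matrix; ad - bc = 1
A_inv, B_inv = inverse_2x2(A), inverse_2x2(B)

print(matmul(A, A_inv) == [[1, 0], [0, 1]])                    # A times its inverse is I
print(inverse_2x2(transpose(A)) == transpose(A_inv))           # (A^T)^-1 = (A^-1)^T
print(inverse_2x2(matmul(A, B)) == matmul(B_inv, A_inv))       # (AB)^-1 = B^-1 A^-1
print(inverse_2x2([[1, 2], [-2, -4]]))                         # None: determinant is 0
```

Returning `None` for the singular case is a design choice for this sketch; raising an exception would be an equally reasonable convention.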
1.5 Elementary matrices and finding inverses

Recall our three elementary row operations (slightly restated):
- Multiplying some row $i$ by a nonzero number $c$.
- Switching the places of two rows $i$ and $i'$.
- Adding $k$ times one row $i$ to another row $i'$.

Recall also that each of these operations can be undone by an operation of the same type:
- Multiplying row $i$ by $1/c$.
- Switching the places of the same two rows $i$ and $i'$.
- Adding $-k$ times row $i$ to row $i'$.

For each elementary row operation and each positive integer $n$ there is a matrix $E$ such that for any matrix $A$ with $n$ rows, the matrix product $EA$ is the result of applying the given operation to $A$. In fact, the matrix $E$ is the result of applying the given row operation to $I_n$ (since $EI_n = E$). The following alternate definitions are more explicit:
- For multiplying row $i$ by $c$, the identity matrix $I_n$ with the 1 on row $i$ replaced by $c$.
- For switching rows $i$ and $i'$, the identity matrix $I_n$ with rows $i$ and $i'$ switched.
- For adding $k$ times row $i$ to row $i'$, the identity matrix $I_n$ with value $k$ inserted in row $i'$ and column $i$.

These are called the elementary matrices. Each of these matrices is invertible, and its inverse is an elementary matrix of the same type.

Example. Here are some elementary matrices and their inverses. For the operation of multiplying the third row of a $4 \times n$ matrix by 2, the elementary matrix is
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The matrix for the inverse of this operation is
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The matrix for switching the second and fifth rows of a $5 \times n$ matrix is
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}.$$
The matrix for the inverse operation is the same. For the operation of adding 2 times the second row of a $3 \times n$ matrix to the first row, the elementary matrix is
$$\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
The matrix for the inverse of this operation is
$$\begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The key fact about elementary matrices is the following.

Remark. If $A$ is a matrix, and $B$ is the result of applying a sequence of elementary operations to $A$ (in particular, if $B$ is the reduced row echelon form of $A$), then $B = E_n \cdots E_1A$, where each $E_i$ is the elementary matrix corresponding to the $i$th operation.

The following theorem lists several properties which are equivalent to invertibility. The implication from (4) to (5) uses the preceding remark. The list will be extended significantly (see, for instance, Theorem 4.2.8).

Theorem. The following statements are equivalent, for each $n \times n$ matrix $A$.
1. $A$ is invertible.
2. There is an $n \times n$ matrix $B$ such that $BA = I$.
3. The only solution to the equation $A\vec{x} = \vec{0}$ is the trivial solution.
4. The reduced row echelon form of $A$ is $I_n$.
5. $A$ is equal to a product of elementary matrices.

Proof. (1) $\Rightarrow$ (2). This follows immediately from the definition of invertibility.
(2) $\Rightarrow$ (3). If $BA = I$ and $A\vec{c} = \vec{0}$, then $BA\vec{c} = B\vec{0}$, which means that $\vec{c} = \vec{0}$.
(3) $\Rightarrow$ (4). Since $A$ is a square matrix, if its reduced row echelon form is not $I_n$, then it has a column without a leading 1, which means that $A\vec{x} = 0_{n,1}$ has infinitely many solutions. (The same analysis shows that (4) $\Rightarrow$ (3).)
(4) $\Rightarrow$ (5). Each step of the process of putting $A$ in reduced row echelon form consists of applying an elementary row operation, which has the same effect as multiplying on the left by an elementary matrix. So if the reduced row echelon form of $A$ is $I_n$, then $I_n = E_kE_{k-1} \cdots E_1A$, where $E_i$ is the elementary matrix corresponding to the $i$th operation applied. The inverse of $E_kE_{k-1} \cdots E_1$ is $E_1^{-1}E_2^{-1} \cdots E_k^{-1}$. Multiplying both sides on the left by this product, we get that $A$ is $E_1^{-1}E_2^{-1} \cdots E_k^{-1}$, which is a product of elementary matrices.
(5) $\Rightarrow$ (1). If $A = E_1 \cdots E_k$, and each $E_i$ is invertible, then $A^{-1} = E_k^{-1} \cdots E_1^{-1}$.

Remark. The equivalence of the first two items in the theorem above shows that (for square matrices) if $AB = I$ then $BA$ is also $I$. To see this, note that $AB = I$ implies that item (2) holds for $B$, which by the equivalence with item (1) implies that $B^{-1}$ exists. Multiplying both sides of the equation $AB = I$ on the right by $B^{-1}$ gives that $ABB^{-1} = IB^{-1}$, i.e., that $A = B^{-1}$.

The remark in Section 1.3 on equations of the form $AX = B$ showed how, given a square matrix $A$, to find a solution to the equation $AX = I$, if one exists. By Remark 1.5.4, such a solution would be the inverse of $A$. The proof of the theorem above shows that this method for finding $A^{-1}$ also finds a decomposition of $A^{-1}$ as a product of elementary matrices. Suppose that $E_1, \ldots, E_k$ is a sequence of elementary matrices such that $E_k \cdots E_1A = I_n$. Then $A^{-1} = E_k \cdots E_1$. This shows that we can find $A^{-1}$ by applying the same operations to $I_n$ that we applied to $A$.

Example. Let $A$ be the matrix
$$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -1 & -2 & 0 \end{bmatrix}.$$
To find $A^{-1}$, write an augmented matrix with $A$ on the left and the identity matrix $I$ on the right.
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ -1 & -2 & 0 & 0 & 0 & 1 \end{array}\right]$$
We use Gauss-Jordan elimination to convert the matrix on the left to reduced row echelon form, applying the same operations to the matrix on the right.
Our first operation is to add row 1 to row 3. This gives
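$$\left[\begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & -2 & 2 & 1 & 0 & 1 \end{array}\right]$$
Note that, letting $E_1$ be the elementary matrix corresponding to the operation of adding row 1 to row 3, the matrix above on the left side is $E_1A$ and the matrix on the right is $E_1$. Now we add 2 times row 2 to row 3 to get
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 4 & 1 & 2 & 1 \end{array}\right]$$
Letting $E_2$ be the matrix corresponding to adding 2 times row 2 to row 3, the matrix above on the left is $E_2E_1A$, and the matrix on the right is $E_2E_1$. We let $E_3$, $E_4$ and $E_5$ be the elementary matrices corresponding to the remaining three steps below. Multiply row 3 by $1/4$ to get
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1/4 & 1/2 & 1/4 \end{array}\right]$$
Now add $-2$ times row 3 to row 1 and $-1$ times row 3 to row 2:
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1/2 & -1 & -1/2 \\ 0 & 1 & 0 & -1/4 & 1/2 & -1/4 \\ 0 & 0 & 1 & 1/4 & 1/2 & 1/4 \end{array}\right]$$
The right side of this augmented matrix is the inverse of our given matrix. We can conveniently rewrite it as
$$(1/4) \begin{bmatrix} 2 & -4 & -2 \\ -1 & 2 & -1 \\ 1 & 2 & 1 \end{bmatrix}.$$
Notice that our inverse is equal to the following product of elementary matrices, corresponding to the operations we applied, i.e., $A^{-1} = E_5E_4E_3E_2E_1$, which is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$
Furthermore, $A$ is the product of the inverses of these matrices, in the reverse order, i.e.,
$$A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
Notice what happens if our given matrix is not invertible: we get an all-0 row on the left side, since in this case not every column on the left side will contain a leading 1.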
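The procedure of this section (row reduce $[\,A \mid I\,]$ and read the inverse off the right side) can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: `rref` follows the Gauss-Jordan rounds described in Section 1.2, `inverse` is a hypothetical helper, and the $3 \times 3$ matrix is an assumption chosen to be consistent with the row operations listed in the example.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # index of the row that will receive the next leading 1
    for c in range(cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]        # switch rows
        A[r] = [x / A[r][c] for x in A[r]]     # scale to get a leading 1
        for i in range(rows):                  # clear the rest of the column
            if i != r and A[i][c] != 0:
                A[i] = [x - A[i][c] * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return A

def inverse(A):
    """Row reduce [A | I]; return the right block, or None if A is singular."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    reduced = rref([list(row) + identity[i] for i, row in enumerate(A)])
    if [row[:n] for row in reduced] != identity:
        return None  # the left block did not reduce to I
    return [row[n:] for row in reduced]

# Assumed 3 x 3 matrix, consistent with the operations in the example.
A = [[1, 0, 2], [0, 1, 1], [-1, -2, 0]]
for row in inverse(A):
    print([str(x) for x in row])

# The augmented matrix of the system x2 + 2x3 = 4, 2x1 + x2 = 6, 2x1 + 2x2 + 2x3 = 10.
print(rref([[0, 1, 2, 4], [2, 1, 0, 6], [2, 2, 2, 10]]))
```

For a singular input such as `[[1, 2], [2, 4]]`, the left block ends with an all-zero row, so `inverse` returns `None`.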
1.6 More on systems and inverses

We restate the equivalence theorem from Section 1.5 with two more equivalent conditions.

Theorem. The following statements are equivalent, for each $n \times n$ matrix $A$.
1. $A$ is invertible.
2. There is an $n \times n$ matrix $B$ such that $BA = I$.
3. The only solution to the equation $A\vec{x} = \vec{0}$ is the trivial solution.
4. The reduced row echelon form of $A$ is $I_n$.
5. $A$ is equal to a product of elementary matrices.
6. For every $n \times 1$ matrix $B$, the equation $A\vec{x} = B$ has a solution (i.e., is consistent).
7. For every $n \times 1$ matrix $B$, the equation $A\vec{x} = B$ has exactly one solution.

Proof. We know from the theorem in Section 1.5 that conditions (1)-(5) are equivalent. Condition (7) clearly implies both conditions (3) and (6). We have seen already that condition (1) implies condition (7): if $A\vec{x} = B$ then $\vec{x} = A^{-1}B$. We can also see directly that (3) implies the uniqueness part of (7): if $C$ and $D$ are distinct solutions to $A\vec{x} = B$, then $C - D \neq 0$ but $A(C - D) = AC - AD = B - B = 0$. Finally, to see that (6) implies (1), let $e_{n,i}$ be the $i$th column of $I_n$, i.e., the $n \times 1$ matrix whose only nonzero entry is a 1 in the $i$th row. Then (6) implies that for each $i$, the equation $A\vec{x} = e_{n,i}$ has a solution; call it $C_i$. Let $B$ be the $n \times n$ matrix whose $i$th column is $C_i$, i.e., $B = [\,C_1 \; \cdots \; C_n\,]$. Then $AB = I$, so $B$ is invertible by the equivalence of (1) and (2), and $B = A^{-1}$ by the remark in Section 1.5 on one-sided inverses.

Corollary. If $A$ and $B$ are square matrices, and $AB$ is invertible, then $A$ and $B$ are both invertible.

Proof. First, note that if $B$ is not invertible, then by condition (2) of Theorem 4.2.8, there is a nonzero column matrix $C$ such that $BC = 0$. Then $ABC = 0$, which would show that $AB$ is not invertible. So $B^{-1}$ exists. Then $ABB^{-1} = A$. Since $AB$ and $B^{-1}$ are invertible, $A$ is a product of invertible matrices, which means that $A$ is invertible (recall that $(CD)^{-1} = D^{-1}C^{-1}$, i.e., a product of invertible matrices is invertible).
Here is a typical problem in the theory of matrices (which we will come back to in Section 3.2): given an $n \times m$ matrix $A$, find the set of $n \times 1$ matrices $B$ for which there exists an $m \times 1$ matrix $X$ such that $AX = B$ (we say: for which $B$ is the equation $AX = B$ consistent?). This problem can be equivalently formulated in terms of systems of equations. Looking ahead, the matrix $A$ induces a function $X \mapsto AX$, and this problem asks for the range of this function. We can answer this question by writing a partitioned matrix with $A$ on the left and a column of variables $b_1, \ldots, b_n$ on the right, and putting the left side of the matrix into reduced row echelon form.

Example. Determine the conditions on the variables $b_i$ that make the following linear system consistent.

\[
\begin{aligned}
x_1 - 2x_2 + 5x_3 &= b_1 \\
4x_1 - 5x_2 + 8x_3 &= b_2 \\
-3x_1 + 3x_2 - 3x_3 &= b_3
\end{aligned}
\]

To do this, we write the following augmented matrix.

\[
\left[\begin{array}{rrr|l}
1 & -2 & 5 & b_1 \\
4 & -5 & 8 & b_2 \\
-3 & 3 & -3 & b_3
\end{array}\right]
\]

Putting the left side of this matrix into reduced row echelon form as usual, we get the following.

\[
\left[\begin{array}{rrr|l}
1 & 0 & -3 & (-5/3)b_1 + (2/3)b_2 \\
0 & 1 & -4 & (-4/3)b_1 + (1/3)b_2 \\
0 & 0 & 0 & -b_1 + b_2 + b_3
\end{array}\right]
\]

The given system is then solvable if and only if $-b_1 + b_2 + b_3 = 0$ (recall that a system of linear equations has a solution if and only if, when the corresponding augmented matrix is put into reduced row echelon form, there is no all-0 row on the left paired with a nonzero value on the right). Note that this corresponds to the fact that, for the left side of our augmented matrix, $-1$ times the first row, plus the second and third rows, gives a row of zeros (i.e., each column of the left side of our augmented matrix satisfies the equation $-b_1 + b_2 + b_3 = 0$).

The following example gives a geometric interpretation to problems of the same type.

Example. Let $A$ be the following matrix:

\[
A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ -1 & 2 \end{bmatrix}.
\]

Let $X$ be a matrix of the form $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, where $x_1$ and $x_2$ are considered as variables. Then $AX$ is equal to

\[
\begin{bmatrix} x_1 - x_2 \\ 2x_1 + 3x_2 \\ -x_1 + 2x_2 \end{bmatrix},
\]
which can also be written as

\[
x_1 \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix},
\]

i.e., $x_1$ times the first column of $A$ plus $x_2$ times the second. This then describes the set of outputs (i.e., the range) of the function $X \mapsto AX$. Another way to describe the same set is to see it as the set of $3 \times 1$ matrices $B$ such that the equation $AX = B$ has a solution. To find this, we think of $B$ as

\[
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\]

and write the augmented matrix

\[
\left[\begin{array}{rr|l}
1 & -1 & b_1 \\
2 & 3 & b_2 \\
-1 & 2 & b_3
\end{array}\right].
\]

Row reducing the left side of this matrix, we obtain

\[
\left[\begin{array}{rr|l}
1 & 0 & 2b_1 + b_3 \\
0 & 1 & b_1 + b_3 \\
0 & 0 & -7b_1 + b_2 - 5b_3
\end{array}\right].
\]

It follows that the equation $AX = B$ has a solution exactly when $-7b_1 + b_2 - 5b_3 = 0$.

1.7 Diagonal, triangular and symmetric matrices

Definition. A square matrix $A$ is
• diagonal if $(A)_{ij} = 0$ whenever $i \neq j$;
• upper triangular if $(A)_{ij} = 0$ whenever $i > j$;
• lower triangular if $(A)_{ij} = 0$ whenever $i < j$;
• triangular if it is either upper triangular or lower triangular;
• symmetric if $(A)_{ij} = (A)_{ji}$ for all $i, j$.

Example. Here are some examples of matrices of the types just introduced.
• The matrix [matrix not recoverable from the source] is diagonal (and upper triangular, lower triangular and symmetric).
• The matrix [matrix not recoverable from the source] is upper triangular, but not diagonal, lower triangular or symmetric.
• The matrix [matrix not recoverable from the source] is lower triangular, but not diagonal, upper triangular or symmetric.
• The matrix [matrix not recoverable from the source] is symmetric, but not diagonal, upper triangular or lower triangular.

Some observations:
• Diagonal matrices are symmetric.
• A matrix is diagonal if and only if it is both upper triangular and lower triangular.
• If a matrix $A$ is upper triangular, then $A^T$ is lower triangular.
• If a matrix $A$ is lower triangular, then $A^T$ is upper triangular.
• A matrix $A$ is symmetric if and only if $A = A^T$.

With one exception, the properties of being diagonal, upper triangular, lower triangular and symmetric are all preserved under sums, products and scalar multiples (the one exception being: a product of two symmetric matrices need not be symmetric). That is, if $A$ and $B$ are diagonal (upper triangular, etc.), then so are $A + B$, $AB$ and $kA$ for any scalar $k$ (again, with this one exception). These facts can all be checked by using the $\Sigma$-notation definitions. For instance, to see that the product of upper triangular matrices is upper triangular, let $A$ and $B$ be upper triangular $n \times n$ matrices, and fix $i > j$. Then

\[
(AB)_{ij} = \sum_{k=1}^{n} (A)_{ik} (B)_{kj}.
\]

Since $i > j$, for each value of $k$, either $i > k$ (so $(A)_{ik} = 0$) or $k > j$ (so $(B)_{kj} = 0$), so all the terms in this sum are 0. Essentially the same argument shows the following.
Theorem. If $A$ and $B$ are $n \times n$ matrices which are both diagonal, or both upper triangular or both lower triangular, then $(AB)_{ii} = (A)_{ii} (B)_{ii}$ for all $i \in \{1, \ldots, n\}$.

Example. Let $A$ = [matrix not recoverable from the source] and let $B$ = [matrix not recoverable from the source]. Then $AB$ = [matrix not recoverable from the source]. Note that the diagonal terms of $AB$ are the products of the diagonal terms of $A$ and $B$ respectively.

Some observations on diagonal matrices.
• A diagonal matrix is invertible if and only if its diagonal entries are all nonzero. Its inverse in this case is the diagonal matrix whose diagonal entries are the reciprocals of the diagonal entries of the original matrix.
• More generally, if $A$ is a diagonal matrix and $k$ is an integer, $A^k$ (if $k \geq 0$ or $A$ is invertible) is the diagonal matrix such that $(A^k)_{ii} = ((A)_{ii})^k$ for all $i$.
• If $A$ is an $n \times n$ diagonal matrix, and $B$ is an $n \times m$ matrix, then $AB$ is the result of multiplying, whenever $1 \leq i \leq n$, the $i$th row of $B$ by $(A)_{ii}$. Similarly, if $B$ is an $m \times n$ matrix, then $BA$ is the result of multiplying, whenever $1 \leq i \leq n$, the $i$th column of $B$ by $(A)_{ii}$.

The following facts about triangular matrices can be verified by considering the process of putting the given matrix in reduced row echelon form (for the first three) or by considering the definition of matrix multiplication (for the last two).
• A triangular matrix is invertible if and only if all of its diagonal entries are nonzero.
• The inverse of an invertible lower triangular matrix is lower triangular.
• The inverse of an invertible upper triangular matrix is upper triangular.
• If $A$ and $B$ are upper triangular, then $(AB)_{ii} = (A)_{ii} (B)_{ii}$ for all $i$.
• If $A$ and $B$ are lower triangular, then $(AB)_{ii} = (A)_{ii} (B)_{ii}$ for all $i$.
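Several of the facts above are easy to spot-check numerically. The following is a small sketch, with all matrices chosen here purely for illustration (they are not the ones from the notes): it checks that a product of upper triangular matrices is upper triangular with the expected diagonal, that left multiplication by a diagonal matrix scales rows, and that a product of symmetric matrices need not be symmetric.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]


def is_upper_triangular(M):
    """True when every entry below the main diagonal is 0."""
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))


def diagonal(M):
    """Return the list of diagonal entries of a square matrix."""
    return [M[i][i] for i in range(len(M))]


# Two upper triangular matrices (illustrative values).
A = [[2, 1, 4], [0, 3, 5], [0, 0, 7]]
B = [[1, 6, 2], [0, 4, 3], [0, 0, 5]]
AB = matmul(A, B)

# A diagonal matrix times a 3 x 2 matrix: row i is scaled by the ith
# diagonal entry.
D = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
C = [[1, 2], [3, 4], [5, 6]]
DC = matmul(D, C)

# Two symmetric matrices whose product is not symmetric.
S = [[1, 2], [2, 3]]
T = [[0, 1], [1, 0]]
ST = matmul(S, T)
```

In this instance `S` and `T` do not commute, which is exactly the situation in which the product of symmetric matrices fails to be symmetric.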
The following four facts about symmetric matrices can be easily verified.
• For any matrix $A$, $AA^T$ and $A^T A$ are symmetric. This follows from the fact that, for any two matrices $A$ and $B$, $(AB)^T = B^T A^T$.
• If $A$ and $B$ are symmetric (i.e., $A^T = A$ and $B^T = B$), then $AB$ is symmetric (i.e., $(AB)^T = AB$) if and only if $A$ and $B$ commute (i.e., $AB = BA$). This follows from the fact that $(AB)^T = B^T A^T$.
• If $A$ is invertible and symmetric, then $A^{-1}$ is symmetric. This follows from the fact that $(A^{-1})^T = (A^T)^{-1}$.
• If $A$ is invertible, then so are $AA^T$ and $A^T A$ (note that these are symmetric). This again follows from the fact that $(A^{-1})^T = (A^T)^{-1}$, which shows that $A^T$ is invertible, together with the fact that a product of invertible matrices is invertible.

1.8 Linear transformations

Letting $n$ be a positive integer, $\mathbb{R}^n$ is the set of $n$-tuples of real numbers. So $\mathbb{R}$ is essentially the real line, $\mathbb{R}^2$ the Cartesian plane and $\mathbb{R}^3$ is three-dimensional space. We write a typical point of $\mathbb{R}^n$ in comma-delimited form as $(a_1, \ldots, a_n)$. We think of the members of $\mathbb{R}^n$ interchangeably as points and as vectors (one might say that the members of $\mathbb{R}^n$ are tuples, which can be used to represent both points and vectors). Sometimes we represent a point in column vector form as

\[
\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}.
\]

A function from $\mathbb{R}^n$ to $\mathbb{R}^m$ is an association of a point in $\mathbb{R}^m$ to each point in $\mathbb{R}^n$ (more formally, it is a set $f$ consisting of pairs $(a, b)$ such that $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, such that for each $a \in \mathbb{R}^n$ there is a unique $b$ in $\mathbb{R}^m$ with $(a, b) \in f$). The set $\mathbb{R}^n$ is called the domain of $f$, and $\mathbb{R}^m$ is called the codomain. The range of $f$ is the set of $b \in \mathbb{R}^m$ for which there is an $a \in \mathbb{R}^n$ such that $f(a) = b$.
Some examples of functions:
• $f : \mathbb{R}^2 \to \mathbb{R}^3$, defined by $f(x, y) = (x^2, xy, x + y)$;
• $f : \mathbb{R}^3 \to \mathbb{R}$, defined by $f(x, y, z) = x^2 + y^2 + z^2$;
• $f : \mathbb{R}^n \to \mathbb{R}^n$, defined by $f(x_1, \ldots, x_n) = (x_1, \ldots, x_n)$ (this is called the identity function on $\mathbb{R}^n$);
• $f : \mathbb{R}^2 \to \mathbb{R}^2$, defined by $f(x, y) = (0, 1)$ (this is an example of a constant function);
• $f : \mathbb{R}^2 \to \mathbb{R}^2$, defined by $f(x, y) = (0, 0)$ (this is an example of a zero transformation, which is a type of constant function);
• $f : \mathbb{R}^3 \to \mathbb{R}^2$, defined by $f(x, y, z) = (2x - y + z, x - 2z)$;
• for any $m \times n$ matrix $A$, the function $T_A : \mathbb{R}^n \to \mathbb{R}^m$ defined by setting $T_A(x_1, \ldots, x_n)$ to be

\[
A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\]

(this is an example of a matrix transformation).

Definition. A linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ satisfying the following rules:
• for all $x_1, \ldots, x_n$ and all scalars $k$, $f(kx_1, \ldots, kx_n) = k f(x_1, \ldots, x_n)$;
• for all $x_1, \ldots, x_n, y_1, \ldots, y_n$, $f(x_1 + y_1, \ldots, x_n + y_n) = f(x_1, \ldots, x_n) + f(y_1, \ldots, y_n)$.

The first, second and fourth functions listed above do not satisfy these rules (find counterexamples!). The other four do, however.

Two observations about linear transformations:
• a linear transformation must send the zero vector in $\mathbb{R}^n$ to the zero vector in $\mathbb{R}^m$ (to see this, apply the first rule above to $k = 0$);
• a function of the form $f(x_1, \ldots, x_n) = (g_1(x_1, \ldots, x_n), \ldots, g_m(x_1, \ldots, x_n))$ is a linear transformation if and only if each $g_i$ is a linear transformation.

Recall that a linear combination of a collection of matrices $A_1, \ldots, A_n$ is a matrix of the form $c_1 A_1 + \cdots + c_n A_n$, where each $c_i$ is a scalar (i.e., a real number). The two conditions in the definition of linear function amount to saying that a linear function is one that preserves linear combinations, in the sense that if $u_1, \ldots, u_n$ are elements of the domain of $f$ (i.e., $\mathbb{R}^n$), and $k_1, \ldots, k_n$ are scalars, then

\[
f(k_1 u_1 + \cdots + k_n u_n) = k_1 f(u_1) + \cdots + k_n f(u_n).
\]

The standard basis vectors for $\mathbb{R}^n$ are the vectors $e_1, \ldots, e_n$ in $\mathbb{R}^n$, where each $e_i$ has value 1 in the $i$th place and 0 everywhere else. Each column vector in $\mathbb{R}^n$ is a linear combination of the standard basis vectors, i.e.,

\[
(x_1, \ldots, x_n) = x_1 e_1 + \cdots + x_n e_n.
\]

Theorem 1.8.3 below lists three key facts about linear transformations and the standard basis vectors.
The first fact, which puts together the two facts we have just stated, says that a linear transformation $f$ is determined by the values it takes on the standard basis vectors, and the assumption that $f$ is linear. The third part of the theorem, which follows from the first two (and the fact that $T_A$ is linear), says that every linear transformation is a matrix transformation via its standard matrix, which we now define.
Definition. The standard matrix of a linear transformation $f : \mathbb{R}^n \to \mathbb{R}^m$ is the $m \times n$ matrix whose $i$th column (for each $i \in \{1, \ldots, n\}$) is $f(e_i)$.

To see the first part of Theorem 1.8.3, recall that $(x_1, \ldots, x_n) = x_1 e_1 + \cdots + x_n e_n$ and use the fact that $f$ is a linear transformation. For the second, note that for any matrix $A$, $Ae_i$ (when $A$ and $e_i$ have the appropriate dimensions for this product to be defined) is the $i$th column of $A$.

Theorem 1.8.3. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, for some positive integers $n$ and $m$, and let $A$ be the standard matrix for $f$. Then the following hold.
• For each $(x_1, \ldots, x_n) \in \mathbb{R}^n$, $f(x_1, \ldots, x_n) = x_1 f(e_1) + \cdots + x_n f(e_n)$.
• For each $e_i$, $f(e_i) = T_A(e_i)$.
• For each $(x_1, \ldots, x_n) \in \mathbb{R}^n$, $f(x_1, \ldots, x_n) = T_A(x_1, \ldots, x_n)$.

One consequence of Theorem 1.8.3 is that a function of the form $f(x_1, \ldots, x_n) = (g_1(x_1, \ldots, x_n), \ldots, g_m(x_1, \ldots, x_n))$ is a linear transformation if and only if each $g_i$ is a function of the form $a_{i1} x_1 + \cdots + a_{in} x_n$, for some constants $a_{i1}, \ldots, a_{in}$ (i.e., each $g_i$ looks like the left side of a linear equation). In this case $[\, a_{i1} \ \cdots \ a_{in} \,]$ is the $i$th row of the standard matrix for $f$.

Example. The standard matrix for the linear transformation $f(x, y, z) = (x - y + z, x - 2z)$ is

\[
\begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & -2 \end{bmatrix}.
\]

Example. Let $T$ be a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$ such that
• $T(2, 4, 2) = (2, 0, 2)$;
• $T(1, 2, 0) = (-1, -1, -1)$;
• $T(0, 2, 3) = (2, -1, 0)$.

Then
• $T(1, 2, 1) = (1/2)\,T(2, 4, 2) = (1, 0, 1)$;
• $T(0, 0, 1) = T(1, 2, 1) - T(1, 2, 0) = (1, 0, 1) - (-1, -1, -1) = (2, 1, 2)$;
• $T(0, 2, 0) = T(0, 2, 3) - 3\,T(0, 0, 1) = (2, -1, 0) - (6, 3, 6) = (-4, -4, -6)$;
• $T(0, 1, 0) = (1/2)\,T(0, 2, 0) = (-2, -2, -3)$;
• $T(1, 0, 0) = T(1, 2, 0) - T(0, 2, 0) = (-1, -1, -1) - (-4, -4, -6) = (3, 3, 5)$.

The standard matrix for $T$ is then

\[
\begin{bmatrix} 3 & -2 & 2 \\ 3 & -2 & 1 \\ 5 & -3 & 2 \end{bmatrix}.
\]

Remark. In the reverse direction, given positive integers $n$ and $m$ there is, for each $n$-tuple of vectors $v_1, \ldots, v_n$ from $\mathbb{R}^m$, a unique linear transformation sending each $e_i$ to the corresponding $v_i$ (namely $T_A$, for $A$ the matrix whose columns are $v_1, \ldots, v_n$).

If $A$ is the $2 \times 2$ matrix

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]

then $T_A$ is the linear function which sends $(1, 0)$ to $(a, c)$ and $(0, 1)$ to $(b, d)$. Recall that this function is invertible if and only if it is injective if and only if it is onto, and that all of these conditions are equivalent to the condition $ad - bc \neq 0$, which is equivalent to the condition that the points $(a, c)$ and $(b, d)$ are not on the same line through the origin. These observations generalize to higher dimensions.

Example. Here are some examples of matrix transformations from $\mathbb{R}^2$ to $\mathbb{R}^2$ (note that the first three correspond to elementary row operations):
• Let $A = \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix}$. Then $T_A(x, y) = (ax, y)$. This function expands $\mathbb{R}^2$ by a factor of $a$ in the $x$ direction. Note that $a$ can be positive or negative. A negative $a$ corresponds to flipping $\mathbb{R}^2$ over the $y$-axis.
• Let $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Then $T_A(x, y) = (y, x)$. This function flips $\mathbb{R}^2$ over the line $y = x$.
• Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then $T_A(x, y) = (x + y, y)$.
• Let $\theta$ be a real number and let $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Then $T_A(x, y) = (x\cos\theta - y\sin\theta, x\sin\theta + y\cos\theta)$. This linear transformation corresponds to a counterclockwise rotation by $\theta$ radians.
• Given a nonzero $(a_0, a_1) \in \mathbb{R}^2$, the function

\[
f(x, y) = \frac{a_0 x + a_1 y}{a_0^2 + a_1^2}\,(a_0, a_1)
\]

is the projection map to the line defined by the equation $a_1 x = a_0 y$.

Two examples of matrix transformations from $\mathbb{R}^3$ to $\mathbb{R}^3$, where $\theta$ is a fixed real number:
• Let

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Then $T_A(x, y, z) = (x\cos\theta - y\sin\theta, x\sin\theta + y\cos\theta, z)$. This linear transformation corresponds to a rotation around the $z$-axis by $\theta$ radians counterclockwise (as viewed from above).
• Let

\[
B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}.
\]

Then $T_B(x, y, z) = (x, y\cos\theta - z\sin\theta, y\sin\theta + z\cos\theta)$. This linear transformation corresponds to a rotation around the $x$-axis by $\theta$ radians counterclockwise (as viewed from the positive $x$-axis).

Given functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$, where the dimension of the codomain of $f$ matches the dimension of the domain of $g$, we can form the composition $g \circ f : \mathbb{R}^n \to \mathbb{R}^p$ by setting $(g \circ f)(u) = g(f(u))$. Then the value of $(g \circ f)(u)$ is the result of first applying $f$ to $u$ and then applying $g$ to the result $f(u)$. If $f$ and $g$ are linear transformations, then there exist matrices $A$ and $B$ such that $A$ is $m \times n$ and $f$ is $T_A$, and $B$ is $p \times m$ and $g$ is $T_B$. Then for all $u \in \mathbb{R}^n$,

\[
(g \circ f)(u) = g(f(u)) = g(Au) = B(Au) = BAu.
\]

That is, $g \circ f$ is a linear transformation and $BA$ (which is a well-defined matrix product by the dimensions of $A$ and $B$) is its standard matrix.
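The identity $(g \circ f)(u) = BAu$ can be checked on a concrete case. The sketch below is only an illustration under assumed inputs: $f$ is taken to be the map $f(x, y, z) = (2x - y + z, x - 2z)$ with standard matrix `A`, $g$ is a counterclockwise rotation of the plane by $\pi/2$ with standard matrix `B`, and the helper names and the test vector `u` are hypothetical. Floating-point results are compared with a tolerance rather than exactly.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def apply(M, v):
    """Compute the matrix transformation T_M(v) = M v for a vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def rotation(theta):
    """Standard matrix of the counterclockwise rotation of R^2 by theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


# f : R^3 -> R^2 with f(x, y, z) = (2x - y + z, x - 2z); A is 2 x 3.
A = [[2, -1, 1], [1, 0, -2]]
# g : R^2 -> R^2, a quarter turn counterclockwise; B is 2 x 2.
B = rotation(math.pi / 2)

BA = matmul(B, A)                   # standard matrix of g o f (2 x 3)
u = [1.0, 2.0, 3.0]
composed = apply(B, apply(A, u))    # g(f(u)): apply f first, then g
direct = apply(BA, u)               # (BA) u in a single step

# Composing two rotations adds their angles.
two_turns = matmul(rotation(0.3), rotation(0.5))
one_turn = rotation(0.8)
```

Note the order: `BA` applies `A` first, matching the convention that $g \circ f$ means "first $f$, then $g$". The product `matmul(A, B)` is not even defined here, since `A` is $2 \times 3$ and `B` is $2 \times 2$.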