A Brief Outline of Math 355

Lecture 1 The geometry of linear equations; elimination with matrices

A system of m linear equations with n unknowns can be thought of geometrically as m hyperplanes intersecting in $\mathbb{R}^n$. Some basic questions in this course are:
i Are there any points in $\mathbb{R}^n$ where all the hyperplanes intersect?
ii How many such points are there?

Geometrically, we saw that in $\mathbb{R}^2$, two lines can intersect in a single point, everywhere (i.e. they are the same line), or nowhere (i.e. they are parallel). We also emphasized the column picture, where we look for a solution as a linear combination of the columns of our matrix. For an m × n matrix, the columns are viewed as vectors in $\mathbb{R}^m$.

In order to (attempt to) solve a system of linear equations, we convert the equations to a matrix, then use row operations to reduce the matrix A to an upper triangular matrix U, from which it is easy to back-substitute and find any solutions. Allowable row operations are:
i Add a multiple of one row to another.
ii Exchange rows.
iii Multiply a row by a nonzero number.

Lecture 2 Multiplication and inverse matrices

Matrix multiplication (i.e. AB = C, with A of size m × n and B of size n × p) can be thought of in four ways:
1 One entry at a time: the entry $c_{i,j}$ is the inner product of the ith row of A with the jth column of B.
2 A row at a time: the ith row of C is a linear combination of the rows of B, with the coefficients of the linear combination being the ith row of A.
3 A column at a time: the ith column of C is a linear combination of the columns of A, with the coefficients of the linear combination being the ith column of B.
4 A whole matrix at a time: multiply the kth column of A by the kth row of B to get an m × p matrix. Add up all n such matrices to get AB.

Given a square, invertible matrix A, take the augmented matrix $[\,A \mid I\,]$ and row reduce A to reduced row echelon form (which will be the identity matrix I) to get $[\,I \mid A^{-1}\,]$.
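The Gauss-Jordan recipe for $A^{-1}$ can be sketched in Python. This is a minimal illustration, not code from the course: the names `inverse` and `matmul` are our own, exact `fractions.Fraction` arithmetic is used to avoid rounding, and a row exchange (row operation ii) is made whenever a zero pivot appears. `matmul` uses the "one entry at a time" view of multiplication to check the result.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row reducing [A | I] to [I | A^-1].

    Uses exact Fraction arithmetic and raises ValueError if A is singular.
    """
    n = len(A)
    # Build the augmented matrix [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a nonzero pivot at or below the diagonal; exchange rows if needed.
        p = next((r for r in range(col, n) if M[r][col] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[col], M[p] = M[p], M[col]
        # Multiply the pivot row by a nonzero number so that the pivot becomes 1.
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        # Subtract multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in M]

def matmul(A, B):
    """Entry (i, j) of AB is the inner product of row i of A with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 1], [5, 3]]
A_inv = inverse(A)
print(A_inv)           # [[Fraction(3, 1), Fraction(-1, 1)], [Fraction(-5, 1), Fraction(2, 1)]]
print(matmul(A, A_inv) == [[1, 0], [0, 1]])  # True
```

A singular input such as [[1, 2], [2, 4]] raises ValueError, because elimination finds no pivot in the second column.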

Lecture 3 Factorization into A = LU; transposes, permutations, spaces

Using the "a row at a time" idea of multiplication, we can translate row operations into matrix algebra. For example, if we were row reducing a 3 × 3 matrix with rows $r_1, r_2, r_3$ and wanted to subtract 2 times row 1 from row 2, the picture would look something like:
$$E_{2,1}A = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 - 2r_1 \\ r_3 \end{bmatrix}.$$
We prefer to reduce a matrix to the form A = LU, since the inverses of the elimination matrices are particularly easy to find (the inverse of the matrix that subtracts 2 times row 1 from row 2 is the one that adds 2 times row 1 to row 2), and the product $L = E_{2,1}^{-1}E_{3,1}^{-1}E_{3,2}^{-1}$ simply records the multipliers from elimination below its diagonal of 1s.

If $A = (a_{i,j})$, then the (i, j)th entry of $A^T$, the transpose of A, is $a_{j,i}$.

The matrix that permutes the rows of another matrix can be found by performing the permutation on the identity matrix. There are n! permutation matrices of size n × n, the inverse of a permutation matrix is its transpose (i.e. $PP^T = PP^{-1} = I$), and the product of two permutation matrices (or the transpose of one) is another permutation matrix.

A vector space is a collection V of objects (which are called vectors) that can be added or multiplied by a (real) number, with the result still in V. A subspace of a vector space V is a subset of V which is itself a vector space. For example, the column space of an m × n matrix is a subspace of $\mathbb{R}^m$.

Lecture 4 $\mathbb{R}^n$; column space and null space; solving Ax = 0: pivot variables, special solutions

Our prime example of a vector space is $\mathbb{R}^n$, that is, we take our vectors to be n-tuples of real numbers, and perform addition and scalar multiplication componentwise.

The column space of a matrix A is the set of all linear combinations of the columns of A. Equivalently, it is the set of all b so that Ax = b has a solution. We denote the column space by C(A).

The null space of a matrix A is the set of all x so that Ax = 0. We denote the null space by N(A). To find the null space of a matrix A:

i Use Gauss-Jordan elimination to convert A to reduced row echelon form R. You will have r pivot variables and $n - r$ free variables.
ii Set the first free variable equal to 1 and the rest equal to 0, then solve for the pivot variables. This is the first special solution.
iii Repeat the previous step with each of the other free variables to find $n - r$ linearly independent special solutions.
iv These special solutions form a basis for N(A): every linear combination of the special solutions is in the null space, and every vector in the null space is such a combination.

Lecture 5 Solving Ax = b: row reduced form R; independence, span, basis and dimension

Algorithm for the complete solution to Ax = b:
i Use row operations to change the augmented matrix $[\,A \mid b\,]$ to $[\,R \mid d\,]$. If a zero row of R sits next to a nonzero entry of d, there is no solution.
ii Set the free variables to zero and solve Rx = d for the pivot variables to find a particular solution $x_p$.
iii Find the null space of A, N(A).
iv The complete solution to Ax = b is $x = x_p + x_n$, where $x_n$ is any vector in N(A).

So a complete solution to such a problem consists of finding some $x_p$ (remember there are infinitely many when N(A) ≠ {0}) and N(A), and writing $x = x_p + x_n$.

Given an m × n matrix A with rank r, there are three special cases:
i r = n < m: Then N(A) = {0}, and Ax = b has either 0 or 1 solution.
ii r = m < n: Then dim C(A) = r = m, so C(A) = $\mathbb{R}^m$, and Ax = b always has a solution. Also, dim N(A) = $n - r > 0$, so there are in fact infinitely many solutions.
iii r = m = n: Then N(A) = {0} and Ax = b always has a solution. Thus there is always a unique solution to Ax = b.

Vectors $v_1, v_2, \dots, v_n$ are linearly independent if $a_1v_1 + a_2v_2 + \cdots + a_nv_n = 0$ implies $a_1 = a_2 = \cdots = a_n = 0$. The algorithm for checking whether vectors are independent is to create a matrix A with the vectors as columns. If N(A) = {0}, then the vectors are independent. Otherwise, they are dependent.
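Steps i-iv for N(A), together with the independence test, can be sketched in Python. This is an illustrative sketch rather than course code: `rref`, `special_solutions` and `independent` are hypothetical helper names, arithmetic is exact via `fractions.Fraction`, and the 3 × 4 example matrix is our own choice.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivot_columns): the reduced row echelon form of A and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots = []
    row = 0
    for col in range(n):
        if row == m:
            break
        # Look for a nonzero entry in this column at or below the current row.
        p = next((r for r in range(row, m) if R[r][col] != 0), None)
        if p is None:
            continue  # no pivot in this column, so its variable is free
        R[row], R[p] = R[p], R[row]
        pivot_value = R[row][col]
        R[row] = [x / pivot_value for x in R[row]]  # make the pivot 1
        for r in range(m):
            if r != row and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [a - factor * b for a, b in zip(R[r], R[row])]  # clear the column
        pivots.append(col)
        row += 1
    return R, pivots

def special_solutions(A):
    """One special solution per free variable; together they are a basis for N(A)."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)  # this free variable is 1, the other free variables are 0
        for i, p in enumerate(pivots):
            # Pivot row i of R reads x_p + (sum of R[i][j] * x_j over free j) = 0.
            x[p] = -R[i][f]
        basis.append(x)
    return basis

def independent(vectors):
    """Vectors are independent iff N(A) = {0} for the matrix A with them as columns."""
    A = [list(row) for row in zip(*vectors)]
    return special_solutions(A) == []

A = [[1, 2, 2, 2], [2, 4, 6, 8], [3, 6, 8, 10]]
R, pivots = rref(A)
print(pivots)                # [0, 2]: rank r = 2, free columns 1 and 3
print(special_solutions(A))  # [-2, 1, 0, 0] and [2, 0, -2, 1], as Fractions
```

Here n = 4 and r = 2, so there are $n - r = 2$ special solutions, one for each free column.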

The span of a set of vectors is the set of all linear combinations of those vectors. We say that $v_1, v_2, \dots, v_n$ span a vector space V if $V = \operatorname{span}\{v_1, v_2, \dots, v_n\}$. For example, the span of the columns of a matrix is the column space.

A set of vectors is a basis for a vector space V if
i The vectors are linearly independent, and
ii The vectors span V.

The dimension of a vector space V is the number of vectors in any basis for V (recall, we showed that every basis for V has the same number of vectors). We also showed that if dim V = n, then any n linearly independent vectors in V form a basis for V.

Lecture 6 The four fundamental subspaces; matrix spaces, polynomial spaces

We took an m × n matrix A, and looked at the column space C(A), the null space N(A), the row space $C(A^T)$, and the left null space $N(A^T)$. The natural questions to ask when looking at subspaces are:
1 What is a basis?
2 What is the dimension?

We answer these questions here. Suppose we have a matrix A. If we take the augmented matrix $[\,A \mid I\,]$ and use row operations to reduce A to reduced row echelon form R, then we call the matrix on the right E, for elimination matrix (note the matrix E's relationship to the elimination matrices from chapter 1). That is, we use row reduction to go from
$$[\,A \mid I\,] \longrightarrow [\,R \mid E\,],$$
so that EA = R. Now we can easily read off the rank r of the matrix A by counting the pivots in R, as well as calculate:
Dimension of C(A): This is just the rank, r.
Basis for C(A): The r columns of A that correspond to the pivot columns of R.
Dimension of $C(A^T)$: This is also the rank, r.
Basis for $C(A^T)$: The first r rows of R (since row operations do not change the row space, and the first r rows are the pivot rows).
Dimension of N(A): This is the number of free variables, which is $n - r$.

Basis for N(A): We find $n - r$ solutions to the system of equations Rx = 0 by setting one free variable equal to 1 at a time, while leaving the rest equal to zero, and solving.
Dimension of $N(A^T)$: Since this is the null space of the n × m matrix $A^T$, which has r pivots and $m - r$ free variables, it has dimension $m - r$.
Basis for $N(A^T)$: Take the bottom $m - r$ rows of E.

The space $M_{m \times n}$ of m × n matrices can also be considered a vector space, even though matrices are not traditionally thought of as vectors. We also inspected the subspaces of upper triangular, symmetric and diagonal matrices. Be able to find bases for these spaces. As an example, a basis for $M_{3 \times 3}$ is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \dots, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
the nine matrices with a single entry equal to 1 and all other entries 0, so $\dim M_{3 \times 3} = 9$.

The space $P_n$ of polynomials of degree at most n is also a vector space, of dimension n + 1, with basis $\{1, x, x^2, \dots, x^n\}$.

Lecture 7 Graphs, networks, incidence matrices

A graph consists of nodes and edges. If these were more serious notes, there'd be an example drawn.

An incidence matrix for an oriented graph with m edges and n nodes is an m × n matrix with entries
$$a_{i,j} = \begin{cases} -1 & \text{if edge } i \text{ leaves node } j, \\ 1 & \text{if edge } i \text{ enters node } j, \\ 0 & \text{otherwise.} \end{cases}$$
Each of the four fundamental subspaces has a physical interpretation, starting with interpreting the vector x as the potential at each node:
The column space: The vector e = Ax represents the possible potential differences across the edges.
The null space: This is the stationary solution, when there is no potential difference (for a connected graph, the constant vectors).
The left null space: The set of y so that $A^Ty = 0$ are those currents which satisfy Kirchhoff's current law, which says that the net flow of current at any node must be 0.
The row space: The edges corresponding to the pivot rows form a spanning tree of the graph (i.e. a subgraph that has no loops, but contains every node).
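The incidence matrix and its subspaces can be explored with a small Python sketch. The graph below (4 nodes, 5 edges, nodes numbered from 0) is a hypothetical example, `incidence_matrix` and `rank` are our own helper names, and the sign convention is the one given above: -1 where an edge leaves a node, +1 where it enters.

```python
from fractions import Fraction

def incidence_matrix(num_nodes, edges):
    """Row i has -1 in the column of the node edge i leaves and +1 where it enters."""
    A = []
    for start, end in edges:
        row = [0] * num_nodes
        row[start] = -1
        row[end] = 1
        A.append(row)
    return A

def rank(A):
    """Number of pivots found by Gaussian elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical example: each edge is written as (node it leaves, node it enters).
edges = [(0, 1), (1, 2), (0, 2), (0, 3), (2, 3)]
A = incidence_matrix(4, edges)
m, n = len(A), len(A[0])
r = rank(A)
print(r, m - r)  # 3 2

# Null space: constant potentials give no potential differences (each row sums to 0).
print([sum(row) for row in A])  # [0, 0, 0, 0, 0]

# Left null space: a current of 1 around the loop 0 -> 1 -> 2 -> 0 (edge (0, 2) run backwards).
y = [1, 1, -1, 0, 0]
net = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
print(net)  # [0, 0, 0, 0], i.e. A^T y = 0
```

For this connected graph the rank is $n - 1 = 3$, and $m - r = 2$ counts the two independent loops (0, 1, 2 and 0, 2, 3).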

An incidence matrix has another interesting interpretation: the dimension of $N(A^T)$ is the number of independent loops in the graph, while (for a connected graph) the rank is the number of nodes minus 1. Hence,
$$\dim N(A^T) = m - r, \quad\text{i.e.}\quad \#\text{loops} = \#\text{edges} - (\#\text{nodes} - 1).$$
This is Euler's formula.

Lecture 8 Orthogonal vectors and subspaces; projections onto subspaces

Two vectors x and y are orthogonal if $x^Ty = 0$. Two subspaces S and T are orthogonal if $s^Tt = 0$ for every vector $s \in S$ and $t \in T$. Two subspaces S and T of $\mathbb{R}^n$ are orthogonal complements if
i S and T are orthogonal, and
ii dim S + dim T = n.

The row space and null space of an m × n matrix are orthogonal complements in $\mathbb{R}^n$. The column space and the left null space of an m × n matrix are orthogonal complements in $\mathbb{R}^m$.

To project a vector b onto the line spanned by a, we use the projection matrix
$$P = \frac{aa^T}{a^Ta}.$$
Then the projection of b is just Pb. We define a projection matrix to be any matrix P such that
i $P^T = P$, and
ii $P^2 = P$.

Lecture 9 Projection matrices and least squares; orthogonal matrices and Gram-Schmidt

We use projections to solve least squares problems. That is, in the event that there is no x so that Ax = b, we find an $\hat{x}$ so that $\|b - A\hat{x}\|^2$ is as small as possible. We solve least squares using the projection matrix, in the sense that $Pb = A\hat{x}$, where
$$P = A(A^TA)^{-1}A^T.$$
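Here is a minimal Python sketch of least squares through the normal equations, fitting a line $b \approx C + Dt$ to three points. The data points and the helper names `dot` and `fit_line` are illustrative assumptions; the 2 × 2 system $A^TA\hat{x} = A^Tb$ is solved with the 2 × 2 Cramer's rule formula, and exact `Fraction` arithmetic makes the orthogonality of the error easy to see.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_line(ts, bs):
    """Least squares fit of b ~ C + D t by solving the normal equations A^T A x = A^T b."""
    A = [[Fraction(1), Fraction(t)] for t in ts]  # columns: all ones, and the t values
    cols = [list(c) for c in zip(*A)]             # the columns of A = the rows of A^T
    ATA = [[dot(ci, cj) for cj in cols] for ci in cols]
    ATb = [dot(ci, bs) for ci in cols]
    # Solve the 2 x 2 system (invertible because the columns of A are independent).
    (p, q), (r, s) = ATA
    det = p * s - q * r
    C = (ATb[0] * s - q * ATb[1]) / det
    D = (p * ATb[1] - r * ATb[0]) / det
    return C, D

ts, bs = [1, 2, 3], [1, 2, 2]          # example data points (t, b)
C, D = fit_line(ts, bs)
print(C, D)                            # 2/3 1/2
# The error e = b - A x_hat is orthogonal to both columns of A.
e = [b - (C + D * t) for t, b in zip(ts, bs)]
print(dot([1, 1, 1], e), dot(ts, e))   # 0 0
```

The best line is $b = \tfrac{2}{3} + \tfrac{1}{2}t$, and the error vector $e = (-\tfrac16, \tfrac13, -\tfrac16)$ lies in the left null space of A.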

In practice, to solve the least squares problem, you solve the normal equations $A^TA\hat{x} = A^Tb$. These always have a solution, and the solution is unique if and only if $A^TA$ is invertible, which is true exactly when A has independent columns.

A set of vectors $q_1, q_2, \dots, q_n$ is orthonormal if
$$q_i^Tq_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
Any matrix (rectangular or square) Q with orthonormal columns has the property $Q^TQ = I$. If Q is also square, then $Q^T = Q^{-1}$. If we have a least squares problem with such a matrix Q (i.e. one with orthonormal columns), then the normal equations simplify to $\hat{x} = Q^Tb$, so in particular, the ith coordinate of $\hat{x}$ is $\hat{x}_i = q_i^Tb$.

The Gram-Schmidt process takes a set of independent vectors a, b, c, ..., z (ok, I don't mean precisely 26 vectors, but I don't want to involve subscripts either so bear with me), and converts them into orthogonal vectors A, B, C, ..., Z, and then into orthonormal vectors $q_1, q_2, q_3, \dots, q_n$, so that all of the different sets of vectors have the same span. Here is the algorithm:
1 We define the orthogonal vectors recursively, subtracting from each vector its projections onto the previous ones:
$$A = a, \quad B = b - \frac{A^Tb}{A^TA}A, \quad C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B, \quad \dots, \quad Z = z - \frac{A^Tz}{A^TA}A - \frac{B^Tz}{B^TB}B - \cdots - \frac{Y^Tz}{Y^TY}Y.$$

2 We normalize the vectors:
$$q_1 = \frac{A}{\|A\|}, \quad q_2 = \frac{B}{\|B\|}, \quad q_3 = \frac{C}{\|C\|}, \quad \dots, \quad q_n = \frac{Z}{\|Z\|}.$$

Lecture 10 Properties of determinants; determinant formulas and cofactors

We deduced that three properties of the determinant completely determine the determinant. We used these three properties to prove that seven more properties hold, and then used these to deduce formulas for the determinant.

The determinant is a function that eats square (real valued) matrices and gives a (real) number. The three defining properties of the determinant are:
1 det I = 1.
2 Exchanging two rows of a matrix changes the sign of the determinant.
3 The determinant is linear in each row. This means:
a Multiplying a row by a number multiplies the determinant by the same number. For example, if
$$A = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{bmatrix} \quad\text{and}\quad A' = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ t\,r_i \\ \vdots \\ r_n \end{bmatrix},$$
then $\det A' = t \det A$.
b Adding a vector to a row of a matrix is additive (this doesn't seem like the right word to use, but I don't think a correct and simple word exists). For

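example, if
$$A_r = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{bmatrix}, \quad A_s = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ s_i \\ \vdots \\ r_n \end{bmatrix}, \quad\text{and}\quad A = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_i + s_i \\ \vdots \\ r_n \end{bmatrix},$$
then $\det A = \det A_r + \det A_s$.

We then used the previous three properties to deduce seven more properties that the determinant must satisfy:
4 If two rows of A are equal, then det A = 0.
5 Subtracting k times row i from row j (i ≠ j) does not change the determinant.
6 If A has a row of zeros, then det A = 0.
7 The determinant of an upper triangular matrix is the product of the pivots:
$$\det U = \begin{vmatrix} d_1 & u_{1,2} & \cdots & u_{1,n} \\ 0 & d_2 & \cdots & u_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{vmatrix} = d_1 d_2 \cdots d_n.$$
8 det A = 0 if and only if A is singular.
9 det(AB) = (det A)(det B).
10 det A = det $A^T$.

We then used the above 10 properties to determine 3 formulas for the determinant of a matrix:
Long formula with n! terms: By expanding a matrix using property 3b, and eliminating the terms whose matrices have a column of zeros (using properties 10 and 6), we got
$$\det A = \sum_{n! \text{ permutations}} \pm\, a_{1,\alpha}\, a_{2,\beta}\, a_{3,\gamma} \cdots a_{n,\omega},$$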

where $(\alpha, \beta, \gamma, \dots, \omega)$ is a permutation of $(1, 2, 3, \dots, n)$, and the sign is $+$ for an even permutation and $-$ for an odd permutation.
Cofactor expansion: The cofactor of $a_{i,j}$, denoted $c_{i,j}$, is
$$c_{i,j} = (-1)^{i+j} \det\big[\,(n-1) \times (n-1) \text{ matrix with row } i \text{ and column } j \text{ removed}\,\big].$$
Then we concluded that
$$\det A = a_{1,1}c_{1,1} + a_{1,2}c_{1,2} + \cdots + a_{1,n}c_{1,n},$$
and referred to this as the cofactor expansion along row 1. A similar formula holds expanding along any row or column.
Row reduction: Using properties 5 and 7, we concluded that if we row reduce a matrix to A = LU, then det A = det U = the product of the pivots (each row exchange, if any are needed, flips the sign). This is the most computationally efficient method in general, though cofactors are also very useful in computing by hand.

Lecture 11 Applications of the determinant: Cramer's rule, inverse matrices, and volume; eigenvalues and eigenvectors

We defined C to be the cofactor matrix of A: that is, $c_{i,j}$ is the cofactor associated with $a_{i,j}$. This allowed us to write
$$A^{-1} = \frac{1}{\det A}\, C^T$$
(note this equation only holds if A is invertible).

Cramer's Rule gives us an explicit way of solving for each coordinate of Ax = b. In particular,
$$x_1 = \frac{\det B_1}{\det A}, \quad x_2 = \frac{\det B_2}{\det A}, \quad \dots, \quad x_n = \frac{\det B_n}{\det A},$$
where $B_i$ is the matrix A with the ith column replaced by the vector b.

We also saw that the volume of an n-dimensional parallelepiped with edges $a_1, a_2, \dots, a_n$ is the absolute value of the determinant of the matrix A with columns $a_1, a_2, \dots, a_n$.
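The determinant formulas and Cramer's rule can be compared in a short Python sketch. The functions `det_cofactor`, `det_elimination` and `cramer` and the example matrix are hypothetical choices for illustration. Cofactor expansion does work proportional to n!, so it only makes sense for small matrices, while the elimination version tracks the sign flip from each row exchange.

```python
from fractions import Fraction

def minor(A, i, j):
    """The matrix A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    """Cofactor expansion along the first row: sum over j of a_{1,j} c_{1,j}."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j)) for j in range(len(A)))

def det_elimination(A):
    """Row reduce to upper triangular U; det A = (+/-) product of the pivots."""
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    sign = 1
    for col in range(n):
        p = next((r for r in range(col, n) if U[r][col] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot in this column: A is singular
        if p != col:
            U[col], U[p] = U[p], U[col]
            sign = -sign        # each row exchange flips the sign
        for r in range(col + 1, n):
            factor = U[r][col] / U[col][col]
            U[r] = [a - factor * b for a, b in zip(U[r], U[col])]
    product = Fraction(1)
    for k in range(n):
        product *= U[k][k]
    return sign * product

def cramer(A, b):
    """x_i = det(B_i) / det(A), where B_i is A with column i replaced by b."""
    d = det_elimination(A)
    return [det_elimination([row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]) / d
            for i in range(len(A))]

A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
print(det_cofactor(A), det_elimination(A))  # -16 -16
print(cramer(A, [5, -2, 9]))                # x = (1, 1, 2), as Fractions
```

For this matrix the pivots are 2, -8 and 1, whose product is -16, agreeing with the cofactor expansion.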

An eigenvalue of a matrix A is a number λ such that there exists a nonzero vector x (called an eigenvector) with Ax = λx. To find the eigenvalues of A, we solve the characteristic equation $\det(A - \lambda I) = 0$, whose left side is an nth degree polynomial in λ (and so has n not-necessarily-distinct, not-necessarily-real roots). To find the eigenvectors, we take the eigenvalues $\lambda_1, \dots, \lambda_n$, and let $x_i$ be a nonzero vector in the null space of $A - \lambda_iI$ (this is a little imprecise, since if two eigenvalues are the same, the null space of $A - \lambda I$ may contain more than one linearly independent vector).

If we have n independent eigenvectors and put them as the columns of the matrix S, then
$$S^{-1}AS = \Lambda \quad\text{and}\quad A = S\Lambda S^{-1},$$
where Λ is the diagonal matrix with the eigenvalues along the diagonal:
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$
You can remember this equation since $Ax_i = x_i\lambda_i$ corresponds to multiplying A on the right by the column $x_i$, giving AS = SΛ. Note that if a matrix can be diagonalized, then $A^k = S\Lambda^kS^{-1}$, where $\Lambda^k$ is easily computed as
$$\Lambda^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$
A matrix is diagonalizable if and only if it has n independent eigenvectors. If the eigenvalues are all different, then the matrix is sure to be diagonalizable. However, if a matrix has repeated eigenvalues, then it may or may not be diagonalizable.
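A 2 × 2 example makes $A^k = S\Lambda^kS^{-1}$ concrete. This Python sketch is illustrative only: `eig_2x2` is a hypothetical helper that assumes a real 2 × 2 matrix with $a_{1,2} \neq 0$ and two distinct real eigenvalues (it reads each eigenvector off the first row of $(A - \lambda I)x = 0$), and the result is compared with plain repeated multiplication.

```python
import math

def eig_2x2(A):
    """Eigenvalues and eigenvectors of a real 2 x 2 matrix [[a, b], [c, d]].

    Assumes b != 0 and two distinct real eigenvalues.
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    # det(A - lambda I) = lambda^2 - trace * lambda + det
    disc = trace * trace - 4 * det
    if disc <= 0 or b == 0:
        raise ValueError("this sketch needs b != 0 and distinct real eigenvalues")
    root = math.sqrt(disc)
    lams = [(trace + root) / 2, (trace - root) / 2]
    # The first row of (A - lambda I) x = 0 reads (a - lambda) x1 + b x2 = 0.
    vecs = [[b, lam - a] for lam in lams]
    return lams, vecs

def matmul(X, Y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Y)] for row in X]

def power_by_diagonalization(A, k):
    """A^k = S Lambda^k S^{-1}, with the eigenvectors as the columns of S."""
    lams, vecs = eig_2x2(A)
    S = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
    det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det_S, -S[0][1] / det_S], [-S[1][0] / det_S, S[0][0] / det_S]]
    Lam_k = [[lams[0] ** k, 0.0], [0.0, lams[1] ** k]]
    return matmul(matmul(S, Lam_k), S_inv)

def power_by_multiplication(A, k):
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result

A = [[4, 1], [2, 3]]
print(eig_2x2(A)[0])                   # [5.0, 2.0]
slow = power_by_multiplication(A, 5)
fast = power_by_diagonalization(A, 5)
print(slow)                            # [[2094, 1031], [2062, 1063]]
print(all(math.isclose(fast[i][j], slow[i][j]) for i in range(2) for j in range(2)))  # True
```

The diagonalized version only needs the scalar powers $5^k$ and $2^k$, which is why $S\Lambda^kS^{-1}$ is the convenient form for large k; the floating-point result is compared with a tolerance.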

We solved the equation $u_{k+1} = Au_k$, given the initial vector $u_0$, by noting that $u_k = A^ku_0$. To actually compute $u_k$:
i Find the eigenvalues $\lambda_1, \dots, \lambda_n$ and eigenvectors $x_1, \dots, x_n$ of A.
ii Write $u_0 = c_1x_1 + c_2x_2 + \cdots + c_nx_n = Sc$, where S is the eigenvector matrix, and $c = [c_1, \dots, c_n]^T$ is the solution vector to $Sc = u_0$.
iii Then $u_k = S\Lambda^kc = c_1\lambda_1^kx_1 + c_2\lambda_2^kx_2 + \cdots + c_n\lambda_n^kx_n$.

Lecture 12 Diagonalization and powers of A; differential equations and $e^{At}$

We solved linear equations of the form
$$\begin{aligned} \frac{du_1}{dt} &= a_{1,1}u_1 + a_{1,2}u_2 + \cdots + a_{1,n}u_n \\ \frac{du_2}{dt} &= a_{2,1}u_1 + a_{2,2}u_2 + \cdots + a_{2,n}u_n \\ &\;\;\vdots \\ \frac{du_n}{dt} &= a_{n,1}u_1 + a_{n,2}u_2 + \cdots + a_{n,n}u_n, \end{aligned}$$
which we wrote in the decidedly more compact form $\frac{du}{dt} = Au$. We are typically also given an initial condition u(0). To solve:
i Find the eigenvalues $\lambda_1, \dots, \lambda_n$ and eigenvectors $x_1, \dots, x_n$ of A.
ii The solution is $u(t) = c_1e^{\lambda_1t}x_1 + c_2e^{\lambda_2t}x_2 + \cdots + c_ne^{\lambda_nt}x_n$, where $c = [c_1, \dots, c_n]^T$ is found by noting that u(0) = Sc.
This can also be written $u(t) = Se^{\Lambda t}S^{-1}u(0)$. We noted that the exponential of a matrix is defined by
$$e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots = Se^{\Lambda t}S^{-1},$$
with the second equality holding only if A is diagonalizable.
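The two expressions for $e^{At}$ can be checked against each other numerically. In this Python sketch the matrix, its eigenvectors, the time t = 0.5 and the helper names are illustrative choices. The series is simply truncated after 30 terms, which is adequate here because $\|At\|$ is small, but it is not a robust general-purpose way to compute a matrix exponential.

```python
import math

def matmul(X, Y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Y)] for row in X]

def expm_series(A, t, terms=30):
    """Truncated series I + At + (At)^2/2! + ... + (At)^(terms-1)/(terms-1)!."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (At)^k / k!, starting from k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, At)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

def expm_diag(S, lams, S_inv, t):
    """S e^{Lambda t} S^{-1}, valid when A = S Lambda S^{-1} is diagonalizable."""
    n = len(lams)
    exp_Lam = [[math.exp(lams[i] * t) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(S, exp_Lam), S_inv)

A = [[-2, 1], [1, -2]]
S = [[1, 1], [1, -1]]              # columns are the eigenvectors (1, 1) and (1, -1)
lams = [-1, -3]                    # A(1, 1) = -1 (1, 1) and A(1, -1) = -3 (1, -1)
S_inv = [[0.5, 0.5], [0.5, -0.5]]
t = 0.5
E_series = expm_series(A, t)
E_diag = expm_diag(S, lams, S_inv, t)
print(all(math.isclose(E_series[i][j], E_diag[i][j]) for i in range(2) for j in range(2)))  # True

# Solve du/dt = Au with u(0) = (1, 0): here c = S^{-1} u(0) = (1/2, 1/2).
u0 = [1.0, 0.0]
u_t = [sum(E_diag[i][j] * u0[j] for j in range(2)) for i in range(2)]
expected = [0.5 * math.exp(-t) + 0.5 * math.exp(-3 * t), 0.5 * math.exp(-t) - 0.5 * math.exp(-3 * t)]
print(all(math.isclose(a, b) for a, b in zip(u_t, expected)))  # True
```

The last check is the eigenvector form of the solution, $u(t) = \tfrac12 e^{-t}(1, 1) + \tfrac12 e^{-3t}(1, -1)$.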

We also saw that
$$e^{\Lambda t} = \begin{bmatrix} e^{\lambda_1t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2t} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_nt} \end{bmatrix}.$$
You can change a single 2nd order equation into a system of 1st order equations by rewriting $y'' + by' + ky = 0$ as
$$u = \begin{bmatrix} y' \\ y \end{bmatrix}, \quad\text{so}\quad u' = \begin{bmatrix} y'' \\ y' \end{bmatrix} = \begin{bmatrix} -b & -k \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y' \\ y \end{bmatrix}.$$
This can also be used to reduce nth order differential equations to a system of n first order equations.

Lecture 13 Markov matrices, Fourier series

A Markov matrix is one where
i All entries are ≥ 0.
ii The entries in each column add to 1.

If A is a Markov matrix, then λ = 1 is an eigenvalue, and $|\lambda_i| \le 1$ for all other eigenvalues. Hence, when all the other eigenvalues satisfy $|\lambda_i| < 1$ (for example, when every entry of A is positive), the steady state will be some multiple of the eigenvector $x_1$ corresponding to $\lambda_1 = 1$.

Given an orthonormal basis $q_1, q_2, \dots, q_n$, we can write any v as $v = x_1q_1 + x_2q_2 + \cdots + x_nq_n$. Since the $q_i$'s are orthonormal, multiplying the equation on the left by $q_i^T$ leaves us with $q_i^Tv = x_i$.

The Fourier series for a function f(x) is the expansion
$$f(x) = a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + b_2\sin 2x + \cdots.$$
We define the inner product for these functions as
$$f^Tg = \int_0^{2\pi} f(x)g(x)\,dx,$$

and observe that 1, cos x, sin x, cos 2x, sin 2x, ... is an orthogonal basis (though not an orthonormal one: each cos kx and sin kx has squared length $\int_0^{2\pi}\cos^2 kx\,dx = \int_0^{2\pi}\sin^2 kx\,dx = \pi$, and 1 has squared length 2π, so it is easy to make it orthonormal). Hence to find $b_2$ (for example), we take the inner product of both sides with sin 2x and divide by its squared length:
$$b_2 = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin 2x\,dx.$$

Lecture 14 Symmetric Matrices

If you have a symmetric matrix (that is, $A = A^T$, or for a complex matrix, $A = \bar{A}^T$), then
1 The eigenvalues of A are real, and
2 The eigenvectors of A can be chosen to be orthonormal.
Then a symmetric matrix A can be factored as $A = Q\Lambda Q^T$ (compare to the usual case $A = S\Lambda S^{-1}$). This is called the spectral theorem. Multiplying out the factorization above, we get
$$A = \lambda_1q_1q_1^T + \lambda_2q_2q_2^T + \cdots + \lambda_nq_nq_n^T,$$
where each
$$q_iq_i^T = \frac{q_iq_i^T}{q_i^Tq_i}$$
(since $q_i^Tq_i = 1$) is an orthogonal projection matrix. So every symmetric matrix is a linear combination of orthogonal projection matrices.

For a symmetric matrix, the number of positive pivots is the same as the number of positive eigenvalues. A positive definite matrix is a symmetric matrix whose eigenvalues are all positive (which is the same as all the pivots being positive).

Lecture 15 Complex matrices and the Fast Fourier Transform

A complex number z can be written in three ways:
i z = x + iy. It can be viewed on the complex plane as the point (x, y), making the obvious identification with $\mathbb{R}^2$.
ii z = r(cos θ + i sin θ). In this case, r is called the modulus (fancy word for "length") of z, and θ is called the argument. It can be viewed on the complex plane as the endpoint of the vector leaving from the origin with length r and angle θ.

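iii $z = re^{i\theta}$. See above for the terminology. This form has the same geometric interpretation as ii, but is more widely used. For example, $2i = 2e^{i\pi/2}$.

The complex conjugate of a complex number z is found by switching the sign of the imaginary part of z, or graphically by reflecting z over the real axis, and is denoted by $\bar{z}$: if $z = x + iy = re^{i\theta}$, then $\bar{z} = x - iy = re^{-i\theta}$. A number z is real if and only if $z = \bar{z}$. The length of a complex number z is
$$|z| = (\bar{z}z)^{1/2} = \sqrt{(x - iy)(x + iy)} = \sqrt{x^2 + y^2} = r.$$

Given a complex vector $z \in \mathbb{C}^n$,
$$z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix},$$
we noticed that the length was given by $\sqrt{\bar{z}^Tz}$, and so defined the Hermitian as the transpose of the conjugate: for vectors, $z^H := \bar{z}^T$; for complex matrices, $A^H := \bar{A}^T$. We use the Hermitian to translate words we used for real matrices and vectors into words for complex matrices and vectors:
$$\begin{array}{lll}
 & \text{Definition for } \mathbb{R}\text{-valued} & \text{Definition for } \mathbb{C}\text{-valued} \\
\text{Length of } x & \sqrt{x^Tx} & \sqrt{x^Hx} \\
\text{Inner product} & x^Ty & x^Hy \\
A \text{ symmetric (Hermitian if complex)} & A = A^T & A = A^H \\
q_1, \dots, q_n \text{ orthonormal} & q_i^Tq_j = \begin{cases} 0 & i \neq j \\ 1 & \text{otherwise} \end{cases} & q_i^Hq_j = \begin{cases} 0 & i \neq j \\ 1 & \text{otherwise} \end{cases}
\end{array}$$
Notice that the only difference is that the transpose is always exchanged for a Hermitian, and that when dealing with real vectors/matrices, each pair of definitions agrees.

The nth Fourier matrix $F_n$ is defined as
$$F_n = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{bmatrix},$$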

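where ω is a primitive nth root of unity, that is, a solution of $x^n - 1 = 0$ whose powers $1, \omega, \dots, \omega^{n-1}$ give all n solutions. More specifically, $\omega = e^{2\pi i/n}$. The columns of $F_n$ are orthogonal (so $F_n^HF_n = nI$), and $F_n$ times a vector can be computed very quickly (this is the Fast Fourier Transform).

Lecture 16 Positive definite matrices and minima; Similar matrices and Jordan form

We looked at four equivalent definitions for a symmetric n × n matrix A being positive definite:
i $\lambda_1 > 0, \lambda_2 > 0, \dots, \lambda_n > 0$.
ii Each of the n leading subdeterminants is strictly positive. The mth leading subdeterminant is the determinant of the m × m matrix in the top left corner of A.
iii Each of the pivots of A is strictly positive. (Careful: this does not mean that the diagonal of A is positive. It means that if A = LU, then the elements on the diagonal of U are positive!)
iv $x^TAx > 0$ for all $x \neq 0$.

We define positive semidefinite by replacing the words "strictly positive" by "positive or zero" in i and iv (the subdeterminant test ii does not carry over directly: for semidefiniteness, every principal subdeterminant, not just the leading ones, must be ≥ 0). The terms negative definite and negative semidefinite are defined the same way, just replacing positive by negative in i and iv.

The function produced by $x^TAx$ is called a quadratic form. When A is 2 × 2, the curve $x^TAx = 1$ is a conic section. If A is positive definite, then the graph of $x^TAx$ is a paraboloid (a bowl).

More generally, let $f : \mathbb{R}^n \to \mathbb{R}$ (think: $f(x_1, x_2, \dots, x_n) = y$). If $\nabla f(a) = 0$ (where $\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right)$), then f has a local minimum at a if the matrix of second derivatives
$$\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
is positive definite (where each second derivative is evaluated at a). Compare this to calculus, where a is a minimum if $f'(a) = 0$ and $f''(a) > 0$.

Positive definite matrices act like positive numbers: if A and B are positive definite matrices, then so are $A^{-1}$ and A + B. Also, $A^TA$ is positive definite for any m × n matrix A with rank n (since $x^TA^TAx = (Ax)^T(Ax) = \|Ax\|^2 > 0$ for $x \neq 0$, because the columns of A are independent).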
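Tests ii and iii can be compared on a small example. This Python sketch is illustrative: the helper names and the example matrices are our own, the matrix is assumed symmetric, and `pivots` does elimination without row exchanges (it returns None if a zero pivot appears).

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_subdeterminants(A):
    """Determinants of the top-left 1 x 1, 2 x 2, ..., n x n submatrices."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def pivots(A):
    """Pivots from elimination without row exchanges; None if a zero pivot appears."""
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    for col in range(n):
        if U[col][col] == 0:
            return None
        for r in range(col + 1, n):
            factor = U[r][col] / U[col][col]
            U[r] = [a - factor * b for a, b in zip(U[r], U[col])]
    return [U[k][k] for k in range(n)]

def is_positive_definite(A):
    """Test ii: all leading subdeterminants > 0 (A assumed symmetric)."""
    return all(d > 0 for d in leading_subdeterminants(A))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(leading_subdeterminants(A))              # [2, 3, 4]
print(pivots(A))                               # [Fraction(2, 1), Fraction(3, 2), Fraction(4, 3)]
print(is_positive_definite(A))                 # True
print(is_positive_definite([[1, 2], [2, 1]]))  # False (second subdeterminant is -3)
```

The pivots 2, 3/2, 4/3 are ratios of consecutive leading subdeterminants (2/1, 3/2, 4/3), which is why tests ii and iii agree.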

Two n × n matrices A and B are similar if there is an invertible matrix M so that $B = M^{-1}AM$. An example to remember is that every diagonalizable matrix is similar to a diagonal matrix ($A = S\Lambda S^{-1}$). Similar matrices have the same eigenvalues, and represent the same linear transformation in different coordinates.

We found a good representative for each family of matrices (here a family is the set of all matrices you can get by conjugating a given matrix by invertible matrices, i.e. a set of matrices similar to each other), which we called the Jordan canonical form. A Jordan block is the matrix
$$J_\lambda = \begin{bmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix}.$$
Every matrix A is similar to a Jordan canonical matrix, which looks like
$$J = \begin{bmatrix} J_{\lambda_1} & & & 0 \\ & J_{\lambda_2} & & \\ & & \ddots & \\ 0 & & & J_{\lambda_s} \end{bmatrix},$$
where the $\lambda_i$ need not be distinct.

Lecture 17 Singular Value Decomposition; Linear transformations and their matrices, coordinates

The singular value decomposition works for all matrices, and decomposes the m × n matrix A into $A = U\Sigma V^T$, where U is an m × m orthogonal matrix, Σ is an m × n diagonal matrix with all entries ≥ 0, and V is an n × n orthogonal matrix. The columns of U are the eigenvectors of $AA^T$ (which, recall, is positive semidefinite), and the diagonal entries of Σ are the square roots of the associated eigenvalues. The columns of V are the eigenvectors of $A^TA$, and again the diagonal entries of Σ are square roots of the eigenvalues. We can also look at the columns $v_1, \dots, v_n$ of V and the columns $u_1, \dots, u_m$ of U in the following way. Let r be the rank of A:

$u_1, \dots, u_r$ are an orthonormal basis for C(A),
$v_1, \dots, v_r$ are an orthonormal basis for $C(A^T)$,
$v_{r+1}, \dots, v_n$ are an orthonormal basis for N(A),
$u_{r+1}, \dots, u_m$ are an orthonormal basis for $N(A^T)$.

A linear transformation is a function $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
i T(u + v) = T(u) + T(v), and
ii T(cv) = cT(v).

Given coordinates, that is, a basis for $\mathbb{R}^n$ and a basis for $\mathbb{R}^m$, every linear transformation T is uniquely associated with a matrix A. Translating a linear transformation into a matrix:
1 You will be given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, as well as a basis $v_1, \dots, v_n$ of $\mathbb{R}^n$ and a basis $w_1, \dots, w_m$ of $\mathbb{R}^m$ (in practice, you may decide the bases).
2 Apply T to the basis elements of $\mathbb{R}^n$, and write the results in the coordinates of $\mathbb{R}^m$:
$$\begin{aligned} T(v_1) &= a_{1,1}w_1 + a_{2,1}w_2 + \cdots + a_{m,1}w_m \\ T(v_2) &= a_{1,2}w_1 + a_{2,2}w_2 + \cdots + a_{m,2}w_m \\ &\;\;\vdots \\ T(v_n) &= a_{1,n}w_1 + a_{2,n}w_2 + \cdots + a_{m,n}w_m \end{aligned}$$
3 Now $A = (a_{i,j})$ is the matrix representation of the linear transformation in the given bases.

Lecture 18 Change of basis

Two matrices represent the same linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ in different coordinates precisely when they are similar, which is one good reason to use eigenvectors as coordinates (so that the matrix of the linear transformation is diagonal).

A natural question is: if a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ has matrix A with respect to the basis $v_1, \dots, v_n$, and matrix B with respect to the basis $u_1, \dots, u_n$, then what is the relationship between A and B? Denote $\mathbb{R}^n$ by V or U, depending on the basis used. Then we want to find a matrix M so that the diagram
$$\begin{array}{ccc} V & \xrightarrow{\;A\;} & V \\ \uparrow M & & \uparrow M \\ U & \xrightarrow{\;B\;} & U \end{array}$$
commutes. That is to say, AM = MB, or $B = M^{-1}AM$. But M may be interpreted as the matrix of the identity transformation, using the basis $u_1, \dots, u_n$ for the input and $v_1, \dots, v_n$ for the output: writing
$$u_i = m_{1,i}v_1 + m_{2,i}v_2 + \cdots + m_{n,i}v_n,$$
we use the recipe above to find the matrix $M = (m_{i,j})$.
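The recipe for translating a linear transformation into a matrix can be illustrated with differentiation of polynomials. In this Python sketch we assume T = d/dx from $P_3$ to $P_2$ (which is linear), use the bases $1, x, x^2, x^3$ and $1, x, x^2$, and store a polynomial as its list of coefficients; `derivative`, `matrix_of` and `matvec` are hypothetical helper names.

```python
def derivative(coeffs):
    """d/dx of c0 + c1 x + c2 x^2 + ..., given as the coefficient list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def matrix_of(T, n_in, n_out):
    """Column j holds the coordinates of T(v_j), where v_j = x^j is the jth input basis vector."""
    columns = []
    for j in range(n_in):
        v_j = [0] * n_in
        v_j[j] = 1
        image = T(v_j)
        image = image + [0] * (n_out - len(image))  # coordinates in the basis 1, x, ..., x^(n_out - 1)
        columns.append(image)
    return [list(row) for row in zip(*columns)]  # turn the list of columns into rows

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

D = matrix_of(derivative, 4, 3)  # T : P_3 -> P_2
print(D)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

p = [5, 3, 2, 1]  # 5 + 3x + 2x^2 + x^3
print(matvec(D, p), derivative(p))  # [3, 4, 3] [3, 4, 3]
```

Each column of D is the derivative of one basis polynomial, exactly as in step 2, and multiplying D by the coordinates of p gives the coordinates of p'.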