Introduction to Linear Algebra, Second Edition, Serge Lang


Chapter 1: Vectors

R^n defined. Addition and scalar multiplication in R^n. Two geometric interpretations for a vector: point and displacement. As a point: place a dot at the coordinates. As a displacement of a point: if the point is A and the displacement is B, the displaced point is A + B. Addition and scalar multiplication of displacements: algebraic definition, then geometric definition. Sum of displacements: interpret as the first displacement followed by the second. To form A - B, draw an arrow from the endpoint of B to the endpoint of A. Every point can be thought of as a displacement from the origin. Every pair of points A, B gives rise to a displacement from A to B, written AB, with coordinates B - A. Two displacements A and B are parallel if A = cB for some c ≠ 0. Reason: same slope. Same direction if c > 0, opposite directions if c < 0. The quadrilateral produced by two displacements is a parallelogram. We will refer to objects with coordinates as vectors.

Norm of a vector: the square root of the sum of the squares of its coordinates. This produces the length of a vector in R^2 and R^3 by the Pythagorean Theorem. When are two displacements perpendicular in R^3? The Pythagorean Theorem implies a_1 b_1 + a_2 b_2 + a_3 b_3 = 0. The Law of Cosines yields a_1 b_1 + a_2 b_2 + a_3 b_3 = ||A|| ||B|| cos θ. Scalar product of vectors: A · B = a_1 b_1 + ... + a_n b_n. Properties: page 3.

Two vectors are defined to be orthogonal if A · B = 0. This agrees with perpendicularity in low dimensions. The Law of Cosines yields A · B = ||A|| ||B|| cos θ. Distance between two points: the norm of the displacement between them. Circles, spheres, open and closed discs, open and closed balls.

General Pythagorean Theorem: when two vectors are orthogonal, ||A + B||^2 = ||A||^2 + ||B||^2. Proof: use either coordinates or the properties of the dot product.

Orthogonal projection of A onto B, producing P: P = cB for some c. We require A - cB ⊥ B, hence (A - cB) · B = 0. This yields c = (A · B)/(B · B). The number c is called the component of A along B. Unit vectors: E_i. The component of A along E_i is a_i.

Schwarz Inequality: in R^n, |A · B| ≤ ||A|| ||B||. Proof: apply the Pythagorean Theorem to A = (A - cB) + cB to derive ||A||^2 ≥ c^2 ||B||^2. Multiply through by ||B||^2 and simplify. Note: we knew this already in low dimensions using A · B = ||A|| ||B|| cos θ, but θ is not defined in higher dimensions. Schwarz implies that |A · B| / (||A|| ||B||) ≤ 1, so this number equals cos θ for a unique θ ∈ [0, π], and we define θ in higher dimensions by θ = arccos( (A · B) / (||A|| ||B||) ).

Triangle Inequality: ||A + B|| ≤ ||A|| + ||B||. Proof: compute the square norm of A + B using the dot product, then apply the Schwarz Inequality.
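These formulas are easy to check numerically. The following is a minimal sketch using numpy; the vectors A and B are arbitrary examples, not taken from the text.

```python
import numpy as np

A = np.array([2.0, 1.0, 3.0])
B = np.array([1.0, -1.0, 2.0])

# Component of A along B and the orthogonal projection P = cB.
c = A.dot(B) / B.dot(B)
P = c * B
assert abs((A - P).dot(B)) < 1e-12        # A - cB is perpendicular to B

# Schwarz: |A.B| <= ||A|| ||B||, so the ratio lies in [-1, 1] and can be
# fed to arccos to define the angle between A and B in any dimension.
ratio = A.dot(B) / (np.linalg.norm(A) * np.linalg.norm(B))
theta = np.arccos(ratio)
print(c, np.degrees(theta))
```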

Lines: The equation of a line is y = mx + b. The y-intercept is (0, b) and the slope is m, so every time you run 1 you rise m. Every time you run t you rise mt. This brings you to (t, mt + b). The point corresponding to t is P(t) = (t, mt + b). Geometric interpretation: the initial point is A = (0, b) and the displacement is B = (1, m), so P(t) = A + tB.

Parametric equation of the line through A with displacement B: P(t) = A + tB. This yields equations for x(t) and y(t). Recovering the equation satisfied by the coordinates on a parametric line: (x, y) = (a_1, a_2) + t(b_1, b_2); solve for y in terms of x. Slope of P(t) = A + tB: the ratio b_2/b_1 of the coordinates of B.

Equation of the line starting at A when t = 0 and ending at B when t = 1: P(t) = A + t(B - A), also written P(t) = (1 - t)A + tB. Note: when the equation is written this way, the distance from A to P(t) is t ||B - A||. Since the distance between A and B is ||B - A||, t measures what fraction of the way you have traveled. Midpoint: use t = 1/2. One-third of the way there: use t = 1/3.

Given x(t) and y(t), one can either write y in terms of x or write (x(t), y(t)) = A + tB and figure out the slope. Since the line passes through A, the equation is y - a_2 = m(x - a_1).

Planes: A plane in R^3 is determined by 3 non-collinear points. Typical point in the plane through A, B, C: starting at A, one can move in the direction from A to B, in the direction from A to C, and in any combination of these. So the typical point is P(s, t) = A + s(B - A) + t(C - A). Example: if A = (1, 1, 1), B = (2, 3, 3), C = (5, 4, 7), then P(s, t) = (1, 1, 1) + s(1, 2, 2) + t(4, 3, 6) = (1 + s + 4t, 1 + 2s + 3t, 1 + 2s + 6t). We obtain the parametric equations x(s, t) = 1 + s + 4t, y(s, t) = 1 + 2s + 3t, z(s, t) = 1 + 2s + 6t. Getting an equation out of this: solve for s and t in terms of x and y, then express z in terms of x and y. This yields s = (1/5)(-3x + 4y - 1), t = (1/5)(2x - y - 1), z = (1/5)(6x + 2y - 3). Normalized, the equation of the plane is 6x + 2y - 5z = 3.
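The equation just derived can be spot-checked numerically. This is a minimal numpy sketch; the cross product is used as a shortcut for producing a vector orthogonal to both displacements, which is not how the text proceeds but gives the same plane.

```python
import numpy as np

A = np.array([1, 1, 1])
B = np.array([2, 3, 3])
C = np.array([5, 4, 7])

# A vector orthogonal to both displacements B - A and C - A.
N = np.cross(B - A, C - A)
d = N.dot(A)                              # ax + by + cz = d must hold at A
print(N, d)                               # [ 6  2 -5] 3, i.e. 6x + 2y - 5z = 3
assert N.dot(B) == d and N.dot(C) == d    # B and C satisfy the same equation
```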

Using (1, 1, 1) as a solution we also get the equation 6(1) + 2(1) - 5(1) = 3. Subtracting, we obtain 6(x - 1) + 2(y - 1) - 5(z - 1) = 0. Generalizing this: the general equation of a plane is ax + by + cz = d. Assuming that it passes through the point (x_0, y_0, z_0), another equation is a(x - x_0) + b(y - y_0) + c(z - z_0) = 0. We call this the standard equation of the plane.

Geometrically: let N = (a, b, c) and let Q = (x - x_0, y - y_0, z - z_0). Then N · Q = 0, so N and Q are perpendicular. The plane can be described as all points (x, y, z) such that (x - x_0, y - y_0, z - z_0) is perpendicular to (a, b, c). N is called the normal vector. Example: find the equation of the plane through (1, 2, 3) and perpendicular to (4, 5, 6). Solution: (4, 5, 6) · (x - 1, y - 2, z - 3) = 0.

Finding a, b, c: consider the example A = (1, 1, 1), B = (2, 3, 3), C = (5, 4, 7) again. Two displacements in the plane are B - A = (1, 2, 2) and C - A = (4, 3, 6). So we want (a, b, c) · (1, 2, 2) = 0 and (a, b, c) · (4, 3, 6) = 0. Solving for a and b in terms of c we obtain a = -(6c)/5 and b = -(2c)/5. There are infinitely many choices for (a, b, c). Choosing the one where c = -5 we obtain (a, b, c) = (6, 2, -5). This is consistent with the previous method of solution, but faster.

Angle between planes: defined to be the angle between the normal vectors. Planes are parallel when their normal vectors are parallel.

Projection of the point Y onto the plane N · (X - X_0) = 0: we seek the vector X such that X is in the plane and (Y - X) is parallel to N. This yields the equations N · (X - X_0) = 0 and Y - X = αN. Substituting X = Y - αN in the first equation yields N · (Y - αN - X_0) = 0.

Solving for α yields α = N · (Y - X_0) / (N · N). Therefore X = Y - [N · (Y - X_0) / (N · N)] N.

Distance from the point Y to the plane N · (X - X_0) = 0: this is defined to be the distance between Y and the projection of Y onto the plane. Since Y - X = αN and α = N · (Y - X_0)/(N · N), the distance is |α| ||N|| = |N · (Y - X_0)| / ||N|| = ||Y - X_0|| |cos θ|, where θ is the angle between Y - X_0 and N.

Remark: let's call the projection of Y onto the plane the point P. We claim that P is the point in the plane closest to Y. Reason: let X be any point in the plane. Then Y - X = (Y - P) + (P - X). By construction, Y - P is parallel to N. Since N is orthogonal to any displacement in the plane and P - X is a displacement in the plane, Y - P is orthogonal to P - X. By the Pythagorean Theorem, this implies ||(Y - P) + (P - X)||^2 = ||Y - P||^2 + ||P - X||^2. Hence ||Y - X||^2 ≥ ||Y - P||^2. In other words, the distance from Y to any point X in the plane is at least the distance from Y to P.

Chapter 2: Matrices and Linear Equations

Matrix: another kind of vector, since it has coordinates. So addition and scalar multiplication are defined. Matrix multiplication: the ij entry of AB is R_i · C_j, where (R_1, ..., R_m) is the row decomposition of A and (C_1, ..., C_n) is the column decomposition of B. The number of coordinates in the rows of A must match the number of coordinates in the columns of B. Formula for c_ij given AB = C.
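A small numpy sketch of the entry formula c_ij = R_i · C_j (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [3, -1, 4]])       # 2 x 3
B = np.array([[2, 1],
              [0, 5],
              [1, -2]])          # 3 x 2: rows of A and columns of B both have 3 coordinates

C = A @ B
# c_ij is the dot product of row i of A with column j of B.
for i in range(2):
    for j in range(2):
        assert C[i, j] == A[i, :].dot(B[:, j])
print(C)
```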

Let A be a matrix and let X be a column vector. Then AX = x_1 A_1 + ... + x_n A_n, where the A_i are the columns of A. Transforming a system of equations into a matrix equation: see the example on page 49. Write it as x A_1 + y A_2 + z A_3 = B and rewrite it as AX = B.

Application: formula for the rotation of a vector about the origin. Input vector (x, y), output vector (x_θ, y_θ). The relationship is (x_θ, y_θ)^T = R(θ) (x, y)^T.

More matrix algebra:

1. Let the columns of B be (B_1, B_2, ..., B_n). Then AB = (AB_1, AB_2, ..., AB_n).
2. Elementary column vector: E_i. It satisfies A E_i = A_i, where A_i is column i of A.
3. Identity matrix: I = (E_1, E_2, ..., E_n). It satisfies AI = A by #2.
4. Distributive and associative laws: see page 53. The distributive law follows from dot product properties. The associative law can be done by brute force.
5. A property not satisfied: commutativity. Just do an example.

Invertible matrix: A is invertible if it is square and there exists a square matrix B such that AB = BA = I. Notation: A^(-1). Rotation matrices are invertible: first note that R(α)R(β)X = R(α + β)X for all column vectors X, in particular for X = E_i. So R(α)R(β) and R(α + β) have the same columns and are equal. This implies that R(α) has inverse R(-α).

Solving an equation of the form AX = B is easy if we know that A is invertible: X = A^(-1) B. Not all square matrices are invertible: the zero matrix, for example.

Homogeneous system of equations: a matrix equation of the form AX = 0, where X is a column vector.

Theorem: when there are more variables than equations in a homogeneous system of equations, there are infinitely many solutions. Proof: by induction on the number of equations. One equation: true. Now assume the result for n homogeneous equations with more than n variables. Consider n + 1 equations and more than n + 1 variables. Take the first equation, express one of the variables in terms of the others, then substitute this into the remaining n equations. In the remaining n equations there are more than n variables, so they have infinitely many solutions. Each is a solution to the first one.

Corollary: more variables than equations in a homogeneous system guarantees at least one non-trivial solution.

Application to vectors: say that vectors A_1, ..., A_k are linearly dependent if there is a non-trivial solution to x_1 A_1 + ... + x_k A_k = 0. Then any n + 1 vectors in R^n are linearly dependent. Reason: more variables than equations. Application to square matrices, treated as vectors: any n^2 + 1 n × n matrices are linearly dependent.

Solving AX = B using Gaussian elimination: first, represent the system in augmented matrix form. Second, use the following elementary transformations, which don't change the solution set: swap equations, multiply an equation by a non-zero number, add two equations. Most importantly, add a multiple of a given row to another. Leading term in a row: the first non-zero coefficient. Pivoting on a leading term: adding multiples of the row it is in to the other rows to get zeros in the column it is in. Iterate this procedure from top to bottom so that the surviving non-zero rows have different leading-term columns (row echelon form). The variables fall into two categories: leading and slack. Slack variables can be assigned arbitrary values. Use back-substitution to express the leading variables in terms of the slack variables. In a homogeneous system with more variables than equations, there will be at least one slack variable, so there will be infinitely many solutions.

Application to finding the inverse of a matrix: we wish to solve AB = I, in other words (AB_1, AB_2, ..., AB_n) = (E_1, E_2, ..., E_n). Consider solving AB_1 = E_1; compare to solving AB_2 = E_2. The coefficient matrix is the same; all that changes is the augmented column. Do these simultaneously. If there is an inverse, we should be able to continue until the coefficient matrix looks like I, in which case the augmented side can be read off as B.
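The inverse-finding procedure translates directly into code. Here is a minimal Gauss-Jordan sketch (not the textbook's algorithm verbatim); it assumes A is invertible, so a non-zero pivot can always be found.

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce the augmented matrix [A | I]; the right block ends up as the inverse."""
    A = A.astype(float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # augmented matrix [A | I]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # choose a usable pivot row
        M[[i, p]] = M[[p, i]]                # swap rows i and p
        M[i] = M[i] / M[i, i]                # make the leading term 1
        for r in range(n):
            if r != i:
                M[r] = M[r] - M[r, i] * M[i] # zero out the rest of column i
    return M[:, n:]

A = np.array([[2.0, 1.0], [5.0, 3.0]])
B = inverse_by_row_reduction(A)
print(B)                                     # [[ 3. -1.] [-5.  2.]]
assert np.allclose(A @ B, np.eye(2)) and np.allclose(B @ A, np.eye(2))
```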

Matrix units: E_ab. Properties:

1. E_ab E_cd = 0 if b ≠ c, and E_ab E_bc = E_ac.
2. E_pp A zeros out all rows except row p in A.
3. E_pq A zeros out all rows except row q in A and moves it to row p.
4. A - E_pp A - E_qq A + E_pq A + E_qp A = (I - E_pp - E_qq + E_pq + E_qp)A swaps rows p and q of A.
5. A - E_pp A + x E_pp A = (I - E_pp + x E_pp)A multiplies row p of A by x.
6. A + x E_pq A = (I + x E_pq)A adds x copies of row q of A to row p of A.

The matrices in 4, 5, 6 mimic elementary row operations. They are invertible, since elementary row operations can be undone.

Theorem: if AB = I then BA = I. Proof: we have seen the procedure for finding B such that AB = I: perform elementary row operations on [A | I] until it becomes [I | B]. Every operation applied to A is also applied to I. So if E_n E_(n-1) ··· E_1 A = I, then E_n E_(n-1) ··· E_1 I = B. But this says BA = I.

The last section can be read by students and skipped in lecture, since we have covered the topics above.

Chapter 3: Vector Spaces

Vector space: any set behaving like R^n. The required properties are listed on page 89. Examples: matrices, polynomials. Subspace of a vector space: a subset of a vector space which is closed under vector addition and scalar multiplication. Examples: solutions to a homogeneous system of equations, upper-triangular matrices, polynomials with a given root. More examples: a line through the origin, a plane through the origin. The intersection and the sum of subspaces produce new subspaces. Skip Section 3, Convex Sets. Linear combinations.

The set of linear combinations of v_1, ..., v_k produces a subspace W. (They span the subspace.) Linearly independent vectors: the opposite of linearly dependent vectors. Do examples of linearly independent vectors, including trig and exponential functions.

Basis for a vector space: a set of linearly independent vectors that span the vector space. Basis for a subspace: same idea. Examples: basis for R^n, basis for the solution set of a homogeneous system of equations, basis for the polynomials of degree at most 3 with 1 as a root. Theorem 4.2, p. 107: the coefficients in a linear combination of basis vectors are unique.

Every spanning set yields a basis. Reason: if the spanning set is already linearly independent, you have a basis. But if there is a non-trivial linear combination of them that produces 0, you can discard one of them. Keep going until what you have left is linearly independent. This is essentially Theorem 5.3.

Definition: when a vector space has a basis of n vectors, we say that the dimension of the vector space is n. Problem with this definition: it seems to imply that every basis has the same number of vectors in it. Question: can a vector space have bases of different sizes? Answer: no. Proof: suppose that there is a basis (u_1, ..., u_m) and another basis (v_1, ..., v_n) where n > m. Express each v_i in terms of u_1, ..., u_m:

v_1 = a_11 u_1 + ... + a_1m u_m
v_2 = a_21 u_1 + ... + a_2m u_m
...
v_n = a_n1 u_1 + ... + a_nm u_m.

Consider the coordinate vectors (a_11, ..., a_1m), (a_21, ..., a_2m), ..., (a_n1, ..., a_nm) in R^m. There are n > m of them, so they must be linearly dependent, with a non-trivial way to combine them into (0, ..., 0) via (x_1, ..., x_n). This implies x_1 v_1 + ... + x_n v_n = 0.

This cannot happen because the v_i are linearly independent. So you cannot have bases of different sizes. This is Theorem 5.2. Note: if you look at this proof carefully, you see that it says that any n > m vectors in an m-dimensional space are linearly dependent. This is Theorem 5.1.

Every linearly independent set in a finite-dimensional vector space can be expanded to a basis. Reason: if the vectors already span the vector space, you have a basis. But if there is a vector outside the span, add it; the larger set is still linearly independent. Keep going. You must eventually arrive at a spanning set, and hence a basis, because we showed above that there is an upper limit to the number of linearly independent vectors you can produce. This is Theorem 5.7.

If V is a vector space of dimension n and W is a subspace, then W has dimension k ≤ n. Reason: find any non-zero vector in W. As before, keep growing the list of linearly independent vectors. You can't outrun the dimension n, so the process has to stop. This is Theorem 5.8.

If V has dimension n, any n linearly independent vectors in V form a basis. Reason: expand to a basis. This is Theorem 5.5. It also implies Theorem 5.6.

Row rank of a matrix: the dimension of its row space. Column rank of a matrix: the dimension of its column space. How to compute these: every elementary row operation yields a matrix with the same row space. Reduce the matrix to reduced row echelon form and read off the dimension. Similarly, every elementary column operation yields a matrix with the same column space. Reduce the matrix to reduced column echelon form and read off the dimension.

Theorem: let A be an m × n matrix. Then dim RS(A) = dim CS(A). Proof: if we don't worry about the impact on the row space and the column space of A, we can always perform a series of elementary row operations followed by a series of elementary column operations so that the resulting matrix A'' has the very simple form depicted in Theorem 6.2 on page 118 of the textbook. One can see that both the row space and the column space of A'' have the same dimension r.

All we need to do is to prove that the row space dimension of A'' is the same as the row space dimension of A, and that the column space dimension of A'' is the same as the column space dimension of A.

In class I proved that if a subset of the columns of A forms a basis for the column space of A, then the corresponding columns of EA form a basis for the column space of EA, where E is an m × m elementary matrix representing elementary row operations on A. Therefore the column space dimension of EA is the same as the column space dimension of A. The argument was

Σ_i α_i (EA)_i = 0  ⟹  E(Σ_i α_i A_i) = 0  ⟹  E^(-1) E(Σ_i α_i A_i) = 0  ⟹  Σ_i α_i A_i = 0  ⟹  α_1 = α_2 = ··· = 0.

We also know that the row space dimension of EA is the same as the row space dimension of A, because EA has the same row space as A. Summary: row operations on a matrix preserve both the row space dimension and the column space dimension. Similarly, since elementary column operations on A can be expressed as AF, where F is an n × n elementary matrix representing column operations on A, AF has the same row space dimension and the same column space dimension as A. So if A → A_1 → A_2 → ··· → A'' is a sequence of elementary row operations followed by a sequence of elementary column operations, all the row space dimensions and the column space dimensions are unchanged from what they are in A. Since the two dimensions are equal in A'', they must be equal in A. Note: if row or column swaps are involved, we must change the meaning of "corresponding" rows and columns accordingly.

Chapter 4: Linear Mappings

Linear mapping: a function T: V → W between two vector spaces that satisfies T(u + v) = T(u) + T(v) and T(cv) = cT(v). Terminology: domain, range, image, inverse image, kernel. A large source of examples: T(v) = Av. This includes rotation and reflection. Other examples: among polynomials, multiplication by x and differentiation.
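Both polynomial examples can be checked quickly in code. This is a small sketch using numpy's polynomial class; the test polynomials and the scalar are arbitrary.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Differentiation and multiplication by x, viewed as maps on polynomials.
D = lambda p: p.deriv()
X = lambda p: p * P([0, 1])          # multiply by the polynomial x

p = P([1, 2, 0, 3])                  # 1 + 2x + 3x^3
q = P([4, -1, 5])                    # 4 - x + 5x^2
c = 7.0

# Linearity: T(p + q) = T(p) + T(q) and T(cp) = c T(p) for both maps.
for T in (D, X):
    assert np.allclose(T(p + q).coef, (T(p) + T(q)).coef)
    assert np.allclose(T(c * p).coef, (c * T(p)).coef)
print("differentiation and multiplication by x act linearly on these examples")
```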

Another example: reflection across the plane in R^3 through the origin with normal vector (1, 2, 3). Formula: given input v, T(v) = v - (1/7)(v · (1, 2, 3))(1, 2, 3). Verify directly that this is linear. Another example: projection onto the same plane, v ↦ v - (1/14)(v · (1, 2, 3))(1, 2, 3).

A linear map T: V → W is completely determined by where it sends a basis of V. The range of T is the span of T(v_1), T(v_2), ..., T(v_k), where v_1, ..., v_k is a basis of V. Therefore im(T) is a subspace of W and dim(im(T)) ≤ dim(V).

The kernel of T is a subspace of the domain. Use of the kernel: classifying all solutions to T(v) = b. The solution set is {v_0 + k : k ∈ kernel}, where v_0 is any particular solution. Example: solutions to an inhomogeneous system of equations. Example: solutions to the differential equation y' = cos x. (The vector space is the set of differentiable functions, the linear transformation is differentiation, b = cos x, and the kernel consists of the constant functions.) See also Theorem 4.4, p. 148. (I have stated the more general result.)

A map is injective if it satisfies T(v) = T(v') ⟹ v = v'. Reflection across a plane is one-to-one. Projection onto the plane is not one-to-one. Criterion for an injective linear map: the kernel is trivial. Theorem 3.1, p. 139: when T: V → W is injective, T sends linearly independent vectors to linearly independent vectors.

Exact relationship between dim(V) and dim(T(V)) for T: V → W: dim(V) = dim(kernel) + dim(image). (Theorem 3.2, p. 139.) Example: projection onto a plane. Proof: find a basis for V of the right size. First, choose a basis for the image: w_1, ..., w_i. Second, find v_1, ..., v_i with T(v_j) = w_j for each j. They must be linearly independent. Let u_1, u_2, ..., u_k be a basis for the kernel. If we can show that v_1, ..., v_i, u_1, ..., u_k is a basis for V, we are done. Linearly independent: if a linear combination of them is zero, then a linear combination of their images is 0. So the coefficients of v_1, ..., v_i are 0. That leaves a linear combination of the kernel basis equal to zero, so those coefficients are 0 as well. Span: choose any v ∈ V. Then T(v) ∈ span(w_1, ..., w_i), therefore T(v) = T(Σ c_j v_j), therefore v - Σ c_j v_j is in the kernel, so v is a linear combination of the v_j and the u_j vectors.
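The two maps just used, reflection across and projection onto the plane with normal (1, 2, 3), also illustrate the rank-nullity count. A minimal numpy sketch (note that ||(1, 2, 3)||^2 = 14, which is where the 1/14 and 1/7 = 2/14 come from):

```python
import numpy as np

n = np.array([1.0, 2.0, 3.0])            # normal vector, n . n = 14

def reflect(v):
    # reflection across the plane through the origin with normal n
    return v - (2 / n.dot(n)) * v.dot(n) * n

def project(v):
    # orthogonal projection onto the same plane
    return v - (1 / n.dot(n)) * v.dot(n) * n

# Matrix of each map: columns are the images of the standard basis vectors.
P = np.column_stack([project(e) for e in np.eye(3)])
R = np.column_stack([reflect(e) for e in np.eye(3)])

assert np.allclose(R @ R, np.eye(3))     # reflecting twice is the identity (injective)
print(np.linalg.matrix_rank(P))          # 2 = dim(image); dim(kernel) = 3 - 2 = 1
assert np.allclose(P @ n, 0)             # the normal direction spans the kernel
```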

Example: projection.

Relation of the rank-nullity theorem to matrices: let A be an m × n matrix. It gives rise to a linear mapping T: R^n → R^m via T(v) = Av. The image of T is the column space of A. Therefore dim(image) = r, where r is the rank of A. The kernel of T is the solution set of Av = 0, and we know that row operations on A produce r linearly independent rows and that there are n - r slack variables. This implies dim(kernel) = n - r. So we can see that dim(image) + dim(kernel) = n = dim(V).

Geometric interpretation of the kernel of T when T(v) = Av: the set of vectors perpendicular to every row of A. So if A has m rows and n columns, RS(A) is a subspace of R^n and ker(T) consists of its orthogonal complement. It has dimension n - r. For example, the equation of a plane through the origin is ax + by + cz = 0, so the vectors in the plane belong to the kernel of the map T defined by the matrix [a b c]. The rank is 1 and the nullity is 3 - 1 = 2. More generally, a hyperplane in R^n is the solution set of a_1 x_1 + ... + a_n x_n = 0. This corresponds to a 1 × n matrix, so the rank is 1 and the nullity is n - 1. In other words, a hyperplane has dimension n - 1. One can also try to compute the dimension of the intersection of m hyperplanes. This corresponds to the kernel of an m × n matrix. The dimension of the intersection has to be n - r.

The matrix associated with a linear map: if T: R^n → R^m is defined by T(v) = Av, then the matrix is A. If the matrix is not given, we can find it as follows. Suppose that T(E_1) = v_1, T(E_2) = v_2, ..., T(E_n) = v_n, vectors in R^m. Then by linearity

T((x_1, x_2, ..., x_n)^T) = T(x_1 E_1 + ... + x_n E_n) = x_1 v_1 + ... + x_n v_n.

But this is exactly Av, where the columns of A are v_1, ..., v_n. Hence T(v) = Av and the matrix is A. Example: projection onto a plane, reflection across a plane.

When T: V → W is a linear map but neither V nor W is of the form R^k, we can still find a matrix representation for T:

Choose a basis {v_1, ..., v_n} for V, choose a basis {w_1, ..., w_m} for W, and identify each v ∈ V with the vector in R^n whose entries come from the unique way the basis produces v. Do the same for vectors in W. You can now identify T with a map S: R^n → R^m, and it has a matrix representation A. This represents T also, but it is only valid for the particular choice of bases.

Example: let V = P_3 and let W = P_2 (polynomial vector spaces). Let T: P_3 → P_2 be given by T(p(x)) = p'(x). A basis for P_3 is {1, x, x^2, x^3}. A basis for P_2 is {1, x, x^2}. Since T sends the polynomial a_0 + a_1 x + a_2 x^2 + a_3 x^3 to the polynomial a_1 + 2a_2 x + 3a_3 x^2, S sends the vector (a_0, a_1, a_2, a_3)^T to the vector (a_1, 2a_2, 3a_3)^T. The matrix representation is

A =
[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]

Example: rotation through θ about a directed line through the origin in R^3. If the line has direction vector (0, 0, 1), then we rotate (x, y) through θ and send z to z. This is represented by

A =
[ cos θ  -sin θ  0 ]
[ sin θ   cos θ  0 ]
[   0       0    1 ]

The vector (x, y, z)^T is sent to the vector (x cos θ - y sin θ, x sin θ + y cos θ, z)^T.

But suppose instead we want to rotate about the line in the direction (1, 1, 1). We will find a new coordinate system in which (1, 1, 1) acts like the z-axis. The plane perpendicular to (1, 1, 1) through the origin is given by x + y + z = 0. The typical vector in the plane is (-a - b, a, b). We will find two perpendicular vectors in this plane. For the first one we choose (1, 0, -1). For the second one we choose a and b so that (1, 0, -1) · (-a - b, a, b) = 0. One choice is a = 2, b = -1. This yields (-1, 2, -1). We scale these three vectors down to length 1, dividing by √2, √6, and √3 respectively, to obtain v_1 = (-1/√6, 2/√6, -1/√6), v_2 = (1/√2, 0, -1/√2), and v_3 = (1/√3, 1/√3, 1/√3). (We want to rotate counterclockwise from v_1 to v_2.) The three vectors v_1, v_2, v_3 form an alternative coordinate system (basis) for R^3.

x with the vector y, the matrix representing rotation about the line through z (,, ) is cos θ sin θ 0 A = sin θ cos θ 0. 0 0 So the vector xv + yv 2 + zv 3 is sent to the vector (x cos θ y sin θ)v + (x sin θ + y cos θ)v 2 + zv 3. We can express this map in matrix form using matrix algebra. Setting V = [ v v 2 v 3 ], this map can be described as Setting we have T (V x x y ) = V A y. z z X x Y = V y, Z z X X T ( Y ) = V AV Y. Z Z Note that it is very easy to compute the inverse of this V because the columns have dot products which are all equal to 0 or. When θ = π we obtain 2 V AV = 3 + 3 3 + 3 3 3 3 3 3 3 3 + 3 3 3 3 One can check that this does send v to v 2 and v 2 to v and v 3 to v 3. For a general angle θ, we have V AV = 3 3 ( + 2 cos θ) ( ) ( ) 3 3 cos θ 3 sin θ ( ) 3 cos θ + 3 sin θ cos θ + 3 sin θ ( + 2 cos θ) ( ) ( ) ( 3 ) 3 cos θ 3 sin θ cos θ 3 sin θ 3 cos θ + 3 sin θ ( + 2 cos θ) 3 5..

Chapter 5: Composition of Linear Maps

Define composition. The composition of linear maps is linear, and composition is associative. The matrix of a composition is the product of the matrices. Associativity of composition implies associativity of matrix multiplication. Look at the Section 1 exercises.

A linear map has an inverse if unique inputs produce unique outputs and every vector in the codomain is the image of a vector in the domain. Injectivity can be detected using the kernel. Surjectivity can be determined using a dimension argument. When the linear map is given by a matrix, the dimension of the kernel is the number of slack variables, and the dimension of the image is the dimension of the column space, which by the rank-nullity theorem is the number of columns minus the number of slack variables, i.e. the number of leading variables. See also Theorems 2.4 and 2.5. The inverse of a bijective linear map is linear, and its matrix representation is the inverse matrix, assuming the domain and codomain are Euclidean spaces of the same dimension. Look at the Section 2 exercises.

Chapter 6: Scalar Products and Orthogonality

Let V be a vector space over F = R or F = C, finite- or infinite-dimensional. An inner product on V is a function ⟨·,·⟩: V × V → F which satisfies the following axioms:

1. Positive-definiteness: ⟨v, v⟩ ≥ 0 for all v ∈ V, and ⟨v, v⟩ = 0 if and only if v = 0_V.
2. Multilinearity: ⟨v + v', w⟩ = ⟨v, w⟩ + ⟨v', w⟩ and ⟨av, w⟩ = a⟨v, w⟩ for all v, v', w ∈ V and a ∈ F.
3. Conjugate symmetry: ⟨w, v⟩ is the complex conjugate of ⟨v, w⟩ for all v, w ∈ V.

Inner-product space: a real or complex vector space V equipped with an inner product. Note that axioms 2 and 3 imply that ⟨v, aw⟩ equals the conjugate of a times ⟨v, w⟩, and that ⟨v, w + w'⟩ = ⟨v, w⟩ + ⟨v, w'⟩, for all v, w, w' ∈ V and a ∈ F.

Examples: the usual dot product on R^n, the generalized dot product on C^n, and the inner product on P([a, b]) defined by ⟨f, g⟩ = ∫_a^b f(x) g(x) dx.
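The function-space example can be made concrete with a short numerical sketch (assuming numpy and scipy are available; the interval [-1, 1] and the functions are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad

# <f, g> = integral of f(x) g(x) dx over [a, b], here [a, b] = [-1, 1].
inner = lambda f, g: quad(lambda x: f(x) * g(x), -1.0, 1.0)[0]

f = lambda x: x              # an odd function
g = lambda x: x**2           # an even function
print(inner(f, g))           # 0 (up to quadrature error): f and g are orthogonal
print(inner(f, f))           # 2/3 = ||f||^2, so ||f|| = sqrt(2/3)
```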

Norm: ||v|| = √⟨v, v⟩. This satisfies ||av|| = |a| ||v||, where |a| = √(a · conjugate(a)) is the absolute value (if a is real) or the modulus (if a is complex).

Orthogonal vectors: u_1, ..., u_n are mutually orthogonal iff ⟨u_i, u_j⟩ = 0 for all i ≠ j. Orthonormal vectors: u_1, ..., u_n are mutually orthonormal iff ⟨u_i, u_j⟩ = δ_ij for all i, j. In other words, they are mutually orthogonal and have length 1.

Orthonormal projection: let u_1, ..., u_n be mutually orthonormal and let U = span(u_1, ..., u_n). The linear operator P: V → U defined by Pv = Σ ⟨v, u_i⟩ u_i is called orthonormal projection onto U.

Properties of orthogonal and orthonormal vectors:

1. Mutually orthogonal non-zero vectors u_1, ..., u_n are linearly independent. Proof: suppose Σ a_i u_i = 0_V. Taking the inner product with u_j we obtain 0 = ⟨0_V, u_j⟩ = ⟨Σ a_i u_i, u_j⟩ = Σ a_i ⟨u_i, u_j⟩ = a_j ||u_j||^2, which forces a_j = 0.
2. Let u_1, ..., u_n be mutually orthogonal. Then ||Σ u_i||^2 = Σ ||u_i||^2. This is called the Pythagorean Theorem. Proof: ⟨Σ_i u_i, Σ_j u_j⟩ = Σ_{i,j} ⟨u_i, u_j⟩ = Σ_i ⟨u_i, u_i⟩.
3. Let u_1, ..., u_n be mutually orthonormal. Then ||Σ a_i u_i||^2 = Σ |a_i|^2. Proof: ⟨Σ_i a_i u_i, Σ_j a_j u_j⟩ = Σ_{i,j} a_i conjugate(a_j) ⟨u_i, u_j⟩ = Σ_i a_i conjugate(a_i).
4. Let u_1, ..., u_n be mutually orthonormal and let U = span(u_1, ..., u_n). Then for any u ∈ U, u = Σ ⟨u, u_i⟩ u_i. In other words, u = Pu, where P is orthonormal projection onto U. This also implies P^2 = P. Proof: write u = Σ a_i u_i. Then ⟨u, u_j⟩ = ⟨Σ a_i u_i, u_j⟩ = Σ a_i ⟨u_i, u_j⟩ = a_j.

Properties of orthonormal projection:

1. Let u_1, ..., u_n be mutually orthonormal and let U = span(u_1, ..., u_n). Then for any v ∈ V and any u ∈ U, v - Pv and u are orthogonal to each other, where P is orthonormal projection onto U. Proof: for any j, ⟨Pv, u_j⟩ = ⟨Σ ⟨v, u_i⟩ u_i, u_j⟩ = Σ ⟨v, u_i⟩ ⟨u_i, u_j⟩ = ⟨v, u_j⟩. Subtracting, ⟨v - Pv, u_j⟩ = 0.
2. Let u_1, ..., u_n be mutually orthonormal and let U = span(u_1, ..., u_n). Then for any v ∈ V, the unique vector u ∈ U that minimizes ||v - u|| is Pv.

Proof: let u ∈ U be given. Then we know that v - Pv and Pv - u are orthogonal to each other. By the Pythagorean Theorem, ||v - u||^2 = ||v - Pv||^2 + ||Pv - u||^2 ≥ ||v - Pv||^2, with equality iff Pv - u = 0, iff u = Pv.

Theorem: every finite-dimensional subspace of an inner product space has an orthonormal basis. Proof: let V be the inner product space and let U be a subspace of dimension n. We prove that U has an orthonormal basis by induction on n. Base case: n = 1. Let {u} be a basis for U. Then {u/||u||} is an orthonormal basis for U. Induction hypothesis: if U has dimension n, then it has an orthonormal basis {u_1, ..., u_n}. Inductive step: let U be a subspace of dimension n + 1. Let {v_1, ..., v_(n+1)} be a basis for U. Write U_n = span(v_1, ..., v_n). By the induction hypothesis, U_n has an orthonormal basis {u_1, ..., u_n}. Let P be orthonormal projection onto U_n. Then the vectors u_1, ..., u_n, v_(n+1) - Pv_(n+1) are mutually orthogonal and form a basis for U. Setting u_(n+1) = (v_(n+1) - Pv_(n+1)) / ||v_(n+1) - Pv_(n+1)||, the vectors u_1, ..., u_(n+1) form an orthonormal basis for U.

Remark: the proof of this last theorem provides an algorithm (Gram-Schmidt) for producing an orthonormal basis for a finite-dimensional subspace U. Start with any basis {v_1, ..., v_n}. Set u_1 = v_1/||v_1||. This is an orthonormal basis for span(v_1). Having found an orthonormal basis {u_1, ..., u_k} for span(v_1, ..., v_k), one can produce an orthonormal basis for span(v_1, ..., v_(k+1)) by appending the vector u_(k+1) = (v_(k+1) - Pv_(k+1)) / ||v_(k+1) - Pv_(k+1)||, where P is orthonormal projection onto span(u_1, ..., u_k).
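The Gram-Schmidt procedure described in the remark translates directly into code. Here is a minimal sketch for vectors in R^n under the usual dot product; the input basis is an arbitrary example.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of the given linearly independent vectors."""
    ortho = []
    for v in vectors:
        # subtract the orthonormal projection onto the span of what we have so far
        w = v - sum(np.dot(v, u) * u for u in ortho)
        ortho.append(w / np.linalg.norm(w))
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(basis)
G = np.array([[np.dot(a, b) for b in U] for a in U])
print(np.round(G, 10))          # identity matrix: the output is orthonormal
```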

A Minimization Problem: consider the problem of finding the best polynomial approximation p(x) ∈ P_5([-π, π]) of sin x, where by "best" we mean that

∫_(-π)^(π) (sin x - p(x))^2 dx

is as small as possible. To place this in an inner-product setting, we consider P_5([-π, π]) to be a subspace of C([-π, π]), where the latter is the vector space of continuous functions from [-π, π] to R. Then C([-π, π]) has the inner product defined by ⟨f, g⟩ = ∫_(-π)^(π) f(x) g(x) dx. We are trying to minimize ||sin x - p(x)||^2. However, we know how to minimize ||sin x - p(x)||: take p(x) = P(sin x), where P is orthogonal projection onto the finite-dimensional subspace P_5([-π, π]). The latter has basis {1, x, x^2, x^3, x^4, x^5}, and Gram-Schmidt can be applied to produce an orthonormal basis {u_0(x), u_1(x), u_2(x), u_3(x), u_4(x), u_5(x)}. Therefore the best polynomial approximation is Σ α_i u_i(x), where α_i = ⟨sin x, u_i(x)⟩ = ∫_(-π)^(π) sin x · u_i(x) dx. The approximation to sin x given in the book,

x/1.01229 - x^3/6.44035 + x^5/177.207,

is in contrast to the Taylor polynomial x - x^3/6 + x^5/120.
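The projection can be computed numerically by solving the normal equations directly in the monomial basis, which avoids carrying out Gram-Schmidt symbolically. This is a sketch assuming numpy and scipy are available; it recovers coefficients close to the fractions quoted above.

```python
import numpy as np
from scipy.integrate import quad

# Solve <sin - p, x^j> = 0 for p(x) = sum c_k x^k, k = 0..5, with
# <f, g> = integral of f(x) g(x) dx over [-pi, pi].
ip = lambda f, g: quad(lambda x: f(x) * g(x), -np.pi, np.pi)[0]
mono = [lambda x, k=k: x**k for k in range(6)]

G = np.array([[ip(mono[j], mono[k]) for k in range(6)] for j in range(6)])  # Gram matrix
b = np.array([ip(np.sin, mono[j]) for j in range(6)])
c = np.linalg.solve(G, b)
print(np.round(c, 6))   # approximately [0, 0.987862, 0, -0.155271, 0, 0.005643]
```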

Cauchy-Schwarz Inequality: |⟨u, v⟩| ≤ ||u|| ||v||. Proof: project u onto v, yielding p = λv. We have u - p ⊥ p, therefore ||u||^2 = ||(u - p) + p||^2 = ||u - p||^2 + ||p||^2 ≥ ||p||^2 = |λ|^2 ||v||^2. Given that λ = ⟨u, v⟩/⟨v, v⟩, this yields ||u||^2 ||v||^4 ≥ |⟨u, v⟩|^2 ||v||^2, and this implies Cauchy-Schwarz.

Triangle Inequality: ||u + v|| ≤ ||u|| + ||v||. Proof: square both sides and subtract the left-hand side from the right-hand side. The result is

2 ||u|| ||v|| - ⟨u, v⟩ - ⟨v, u⟩ = 2 ||u|| ||v|| - 2 Re⟨u, v⟩ ≥ 2 ||u|| ||v|| - 2 |⟨u, v⟩| ≥ 0

by Cauchy-Schwarz.

The Orthogonal Complement of a Subspace: let V be a finite-dimensional inner-product space and let U be a subspace. We define U^⊥ = {v ∈ V : ⟨v, u⟩ = 0 for all u ∈ U}. We can construct U^⊥ explicitly as follows. Let {u_1, ..., u_k} be an orthonormal basis for U. Expand it to an orthonormal basis {u_1, ..., u_n} for V using Gram-Schmidt. The vectors in span(u_(k+1), ..., u_n) are orthogonal to the vectors in U. Moreover, for any v ∈ U^⊥, the coefficients of v in terms of the orthonormal basis are the inner products of v with the basis vectors, which places v in span(u_(k+1), ..., u_n). Therefore U^⊥ = span(u_(k+1), ..., u_n). This immediately implies that (U^⊥)^⊥ = span(u_1, ..., u_k) = U. Note also that V = U ⊕ U^⊥. To decompose a vector in V into something in U plus something in U^⊥ we can use v = Pv + (v - Pv).

Chapter 7: Determinants

Prove directly that the 2 × 2 determinant has the following properties: det(I) = 1, and, as a function of the columns, det(A_1, A_2) is multilinear and skew-symmetric. In particular, the determinant of a matrix with repeated columns is 0. Moreover, det(AB) = det(A) det(B). Proof of the last statement: the fact that the determinant is skew-symmetric implies that the determinant is zero when there is a repeated column. AB = C has columns C_1 = b_11 A_1 + b_21 A_2 and C_2 = b_12 A_1 + b_22 A_2, therefore det(AB) = det(b_11 A_1 + b_21 A_2, b_12 A_1 + b_22 A_2) = b_11 b_22 det(A_1, A_2) - b_12 b_21 det(A_1, A_2) = det(B) det(A).

Define the n × n determinant recursively and state that it also has the same properties as above.
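A quick numerical check of two of these determinant properties (a numpy sketch with arbitrary matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 2.0],
              [0.0, 1.0, 1.0]])
B = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
det = np.linalg.det

# Multiplicativity: det(AB) = det(A) det(B).
assert np.isclose(det(A @ B), det(A) * det(B))

# A repeated column forces the determinant to be 0.
R = np.column_stack([A[:, 0], A[:, 0], A[:, 2]])
assert np.isclose(det(R), 0.0)
print(det(A), det(B), det(A @ B))
```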

Theorem: when the columns of a matrix are linearly dependent, the determinant is 0. Proof: expand one column in terms of the others, compute the determinant, and note that all terms are zero.

Theorem: when the columns of a matrix are linearly independent, the determinant is not 0. Proof: the linear map defined by the matrix is invertible, so the map has an inverse, so the matrix has an inverse. The determinant of the product is 1, so each determinant is non-zero.

Cramer's Rule: suppose Ax = b. Then b = x_1 A_1 + ... + x_n A_n. The determinant of the matrix whose columns are A_1, ..., b, ..., A_n, where the replacement is in column i, is x_i det(A). So x_i is this determinant divided by det(A).

Chapter 8: Eigenvalues and Eigenvectors

Let A be an n × n matrix of real numbers. We say that a non-zero vector v is an eigenvector of A if there is a number λ such that Av = λv. How to find them: in matrix terms, we are solving Av - λIv = 0, i.e. (A - λI)v = 0. This says that the columns of A - λI are linearly dependent, which implies that det(A - λI) = 0. Expand this in terms of the unknown λ, solve for λ, then go back and calculate v.

Using the dot product as scalar product, the dot product of two column vectors x and y is x^T y. Real symmetric matrices have real eigenvalues. Proof: let Av = λv where λ ∈ C, v ∈ C^n, and v ≠ 0. Write λ = a + bi and v = x + iy. Comparing real and imaginary parts in A(x + iy) = (a + bi)(x + iy) we obtain Ax = ax - by and Ay = ay + bx. Therefore x^T A y = a x^T y + b x^T x, and x^T A y = (Ax)^T y = (a x^T - b y^T) y = a x^T y - b y^T y.

Comparing, b ||x||^2 = -b ||y||^2. If b ≠ 0, then x = y = 0 since they have zero length. This contradicts v ≠ 0. Hence b = 0 and λ is real.

Let A be a square matrix with orthonormal columns. Then A^(-1) = A^T. Proof: multiply and look at the dot products. For any two matrices A and B, (AB)^T = B^T A^T. Let A and B be square matrices with orthonormal columns. Then AB has orthonormal columns. Proof: if B_1, ..., B_n are the columns of B, then AB_1, ..., AB_n are the columns of AB. The dot product of columns i and j is (AB_i)^T (AB_j) = B_i^T A^T A B_j = B_i^T B_j = δ_ij.

Let A be a real symmetric matrix. Then there is an orthonormal matrix C such that C^T A C is diagonal, with the eigenvalues as diagonal entries. Proof: by induction on the number of rows and columns. Trivial when n = 1. More generally, let v_1 be an eigenvector with eigenvalue λ_1. Find a basis incorporating v_1 and use Gram-Schmidt to produce an orthonormal basis v_1, ..., v_n. Then Av_1 = λ_1 v_1. Let C = [v_1 v_2 ... v_n]. Then AC = [Av_1 Av_2 ... Av_n] = [λ_1 v_1 Av_2 ... Av_n], so C^T A C = [λ_1 C^T v_1 C^T A v_2 ... C^T A v_n] = the matrix with first row (λ_1, b_2, ..., b_n), first column (λ_1, 0, ..., 0), and lower right-hand submatrix B. See notes.

Corollary: C^T A C = diag(λ_1, ..., λ_n) implies AC = (λ_1 C_1, λ_2 C_2, ..., λ_n C_n). In other words, the columns of C are an orthonormal set of eigenvectors. Finding them: first note that eigenvectors of a real symmetric matrix A corresponding to distinct eigenvalues are orthogonal. Suppose A^T = A, Au = αu, and Av = βv where α ≠ β. Then β u^T v = u^T (βv) = u^T (Av) = (u^T A)v = (u^T A^T)v = (Au)^T v = (αu)^T v = α (u^T v). Since α ≠ β, this forces u^T v = 0. In other words, their dot product is 0.
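A numerical illustration of this diagonalization for a real symmetric matrix (numpy sketch; numpy.linalg.eigh returns an orthonormal set of eigenvectors as the columns of C):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # real symmetric

eigenvalues, C = np.linalg.eigh(A)   # columns of C: orthonormal eigenvectors
assert np.allclose(C.T @ C, np.eye(3))                  # orthonormal columns
assert np.allclose(C.T @ A @ C, np.diag(eigenvalues))   # C^T A C is diagonal
print(eigenvalues)
```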

Find each eigenvalue using the characteristic polynomial, then find a basis for each eigenspace, then use Gram-Schmidt to find an orthonormal basis for each eigenspace. The union of the bases will be orthonormal, and there will be enough of these vectors to form an orthonormal basis. These are the columns of C.

Applications: (1) matrix powers and solutions to recurrence relations, (2) diagonalizing a binary quadratic form, (3) solving a system of differential equations.

Example: graph ax^2 + bxy + cy^2 = 1. In matrix form, this reads

[x y] · [a b/2; b/2 c] · [x; y] = 1.

We have already proved that for each symmetric matrix A there is a rotation matrix C of eigenvectors of A such that A = C D C^T, where D is a diagonal matrix. Making the substitution we obtain

[x y] · C [λ_1 0; 0 λ_2] C^T · [x; y] = 1.

Writing [X; Y] = C^T [x; y], we obtain

[X Y] · [λ_1 0; 0 λ_2] · [X; Y] = 1.

In other words, λ_1 X^2 + λ_2 Y^2 = 1. This is much easier to graph. Since [x; y] = C [X; Y] and C is a rotation matrix, all we have to do is identify the angle of rotation θ and rotate the XY graph by θ to obtain the xy graph.

Example: graph x^2 + 3xy + 5y^2 = 1. The eigenvalues of [1 3/2; 3/2 5] are λ_1 = 1/2 and λ_2 = 11/2. Eigenspace bases are {(3, -1)} and {(1, 3)}. This yields

C = [3/√10  1/√10; -1/√10  3/√10].

Identifying this with

R(θ) = [cos θ  -sin θ; sin θ  cos θ]

yields cos θ = 3/√10, sin θ = -1/√10, tan θ = -1/3, θ = arctan(-1/3) = -0.32175 radians or -18.4349 degrees. So we graph (1/2)X^2 + (11/2)Y^2 = 1, then rotate by -18.4349 degrees. For example, one solution to (1/2)X^2 + (11/2)Y^2 = 1 is X = √2, Y = 0. This yields the solution

[x; y] = C [X; Y] = [3/√10  1/√10; -1/√10  3/√10] [√2; 0] = [3/√5; -1/√5].

We have (3/√5)^2 + 3(3/√5)(-1/√5) + 5(-1/√5)^2 = 1.

Graph of (1/2)X^2 + (11/2)Y^2 = 1: (figure omitted). Graph of x^2 + 3xy + 5y^2 = 1: (figure omitted).
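This example can be reproduced numerically (numpy sketch; eigh may order the eigenvalues or choose eigenvector signs differently from the text, which changes the reported angle only by a sign or by 180 degrees):

```python
import numpy as np

Q = np.array([[1.0, 1.5],
              [1.5, 5.0]])                 # matrix of x^2 + 3xy + 5y^2

eigenvalues, C = np.linalg.eigh(Q)         # columns of C: orthonormal eigenvectors
print(eigenvalues)                         # [0.5 5.5] = [1/2, 11/2]

# Rotation angle: identify the first eigenvector with (cos t, sin t).
t = np.degrees(np.arctan2(C[1, 0], C[0, 0]))
print(t)                                   # 18.4349 degrees, up to sign / 180-degree ambiguity

# The substitution X = C^T x diagonalizes the form: lambda_1 X^2 + lambda_2 Y^2.
x = np.array([3 / np.sqrt(5), -1 / np.sqrt(5)])   # the sample solution from the text
X = C.T @ x
print(x @ Q @ x, eigenvalues @ X**2)              # both equal 1
```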

A related problem: find the maximum and minimum values of x^2 + 3xy + 5y^2 subject to x^2 + y^2 = 1. Given that (x, y) is related to (X, Y) by a rotation, x^2 + y^2 = 1 is equivalent to X^2 + Y^2 = 1. So equivalently we can find the maximum of (1/2)X^2 + (11/2)Y^2 subject to X^2 + Y^2 = 1. Writing Y^2 = 1 - X^2, we want the maximum of 11/2 - 5X^2 where |X| ≤ 1. The maximum is 11/2, using (X, Y) = (0, 1), i.e. (x, y) = (1/√10, 3/√10). The minimum is 1/2, using (X, Y) = (1, 0), i.e. (x, y) = (3/√10, -1/√10).