
Math 2040 Chapter 6: Orthogonality and Least Squares

6.1 and some of 6.7: Inner Product, Length and Orthogonality.

Definition: If x, y ∈ R^n, then x · y = x_1 y_1 + ... + x_n y_n is the dot product of x and y.

Typical Problem: Compute [2, 4]^T · [5, 3]^T.

Important: The dot product of x and y is a scalar, NOT a vector.

Remark: The dot product is mysterious and beautiful and powerful.

The matrix product form of the dot product: When x is viewed as an n × 1 matrix, we get x · y = x^T y.

Theorem 1 page 376: The dot product is a commutative, bilinear, positive definite operation. That is,

1. (Commutativity) x · y = y · x,
2. (Bilinearity) (αx + βy) · z = α(x · z) + β(y · z) and x · (αy + βz) = α(x · y) + β(x · z), and
3. (Positive definiteness) x · x > 0 if x ≠ 0.

Remark: x · x = x_1^2 + ... + x_n^2, so, always, x · x ≥ 0. Also, x · x = 0 implies x_1^2 + ... + x_n^2 = 0, in which case x = 0. Thus, the positive definite property is equivalent to the property that always x · x ≥ 0, and x · x = 0 iff x = 0. Finally, it is worth noting that, in the presence of symmetry, the bilinearity property is equivalent to the two properties

(Distributivity of dot product over addition) x · (y + z) = x · y + x · z, and
(Scalars float) (αx) · y = α(x · y) = x · (αy).

Abstraction: The abstraction of the above properties yields the idea of an inner product on a vector space. Thus, we define an inner product space to be a vector space V together with the assignment to each u, v ∈ V of a scalar ⟨u, v⟩, called the inner product of u and v, such that for all u, v, w ∈ V and α, β ∈ R, the following conditions hold.

1. (Commutativity) ⟨u, v⟩ = ⟨v, u⟩,
2. (Bilinearity) ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ and ⟨u, αv + βw⟩ = α⟨u, v⟩ + β⟨u, w⟩, and
3. (Positive definiteness) ⟨u, u⟩ is positive if u ≠ 0.

Remark: The positive definiteness property is equivalent to the condition that ⟨u, u⟩ is nonnegative, and ⟨u, u⟩ = 0 iff u = 0. In the presence of inner product symmetry, the bilinearity property of the inner product is equivalent to the two properties

(Distributivity of inner product over addition) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩, and
(Scalars float) ⟨αu, v⟩ = α⟨u, v⟩ = ⟨u, αv⟩.
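
Code Sketch (Python/NumPy): a minimal check of the typical problem above and of the matrix product form x · y = x^T y. The vectors are the ones from the example; the variable names are invented for this sketch.

    import numpy as np

    x = np.array([2.0, 4.0])
    y = np.array([5.0, 3.0])

    # Dot product as a sum of coordinate-wise products: x . y = x1*y1 + ... + xn*yn
    dot_xy = np.dot(x, y)                                    # 2*5 + 4*3 = 22, a scalar, not a vector

    # Matrix product form: viewing x as an n x 1 matrix, x . y = x^T y
    dot_matrix_form = x.reshape(-1, 1).T @ y.reshape(-1, 1)  # a 1 x 1 matrix containing 22

    # Commutativity and positive definiteness can be spot-checked numerically
    assert np.isclose(np.dot(x, y), np.dot(y, x))
    assert np.dot(x, x) > 0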

Typical Problems: Establish each of the following.

1. R^n together with the dot product is an example of an inner product space.

2. Any polynomial p(t) of degree at most n (that is, any p(t) ∈ P_n) is completely determined by its values p(t_0), p(t_1), ..., p(t_n) at n + 1 distinct points t_0, t_1, ..., t_n in R. Given this, it is an easy matter to show that ⟨p(t), q(t)⟩ = p(t_0)q(t_0) + p(t_1)q(t_1) + ... + p(t_n)q(t_n) is an inner product on P_n whenever t_0, t_1, ..., t_n are n + 1 distinct points in R.

3. Recall that C[a, b] denotes the vector space of real valued continuous functions defined on the interval [a, b] = {x : a ≤ x ≤ b} in R. C[a, b] becomes an inner product space when the inner product is given by ⟨f, g⟩ = ∫_a^b f(x)g(x) dx for each f, g ∈ C[a, b].

Convention: Throughout what follows, unless otherwise specified, R^n is considered to be an inner product space with the inner product equal to the dot product.

Distance: The (Euclidean) distance from x to y in R^n is the nonnegative square root of (x_1 − y_1)^2 + ... + (x_n − y_n)^2, that is, +√((x_1 − y_1)^2 + ... + (x_n − y_n)^2).

Typical Problem: With x = [2, 4]^T and y = [5, 3]^T, find the distance from x to y in R^2.

Norm: The norm or length of x ∈ R^n is the distance from x to 0. Thus, the norm of x ∈ R^n is √(x_1^2 + x_2^2 + ... + x_n^2).

Typical Problem: With x = [2, 4]^T in R^2, find the length (or norm) of x.

Proposition: The distance from x to y in R^n is √((x − y) · (x − y)) and the norm of x is √(x · x).

Notation: The distance from x to y is denoted ‖x − y‖. The norm of x is denoted ‖x‖.

Unit Vectors: A vector x is a unit vector if it has length 1. If x ≠ 0, then x/‖x‖ is a unit vector in the direction of x.

Typical Problem: With x = [2, 4]^T in R^2, find a unit vector in the direction of x.

Abstraction: The distance in an inner product space V from u to v is defined by ‖u − v‖ = √⟨u − v, u − v⟩, and the norm of u ∈ V is defined to be ‖u‖ = √⟨u, u⟩.
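
Code Sketch (Python/NumPy): the distance, norm, and unit vector computations above, again with x = [2, 4]^T and y = [5, 3]^T; only the variable names are invented.

    import numpy as np

    x = np.array([2.0, 4.0])
    y = np.array([5.0, 3.0])

    # Distance from x to y: the nonnegative square root of (x1-y1)^2 + ... + (xn-yn)^2
    dist = np.linalg.norm(x - y)          # sqrt((2-5)^2 + (4-3)^2) = sqrt(10)

    # Norm (length) of x: the distance from x to the origin
    length = np.linalg.norm(x)            # sqrt(2^2 + 4^2) = sqrt(20)

    # Unit vector in the direction of x (requires x != 0)
    unit_x = x / np.linalg.norm(x)
    assert np.isclose(np.linalg.norm(unit_x), 1.0)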

Proposition: For any vectors u, v, and w in an inner product space V, each of the following properties holds.

1. (Symmetry) The distance from u to v is the distance from v to u; that is, ‖u − v‖ = ‖v − u‖.

2. (Degeneracy) The distance from u to v is zero if and only if u = v; that is, ‖u − v‖ = 0 iff u = v.

3. (Triangle Inequality) The distance from u to w is less than or equal to the distance from u to v plus the distance from v to w; that is, ‖u − w‖ ≤ ‖u − v‖ + ‖v − w‖.

Remark: The properties above allow further abstraction to more general spaces, giving us the important ideas of metrics, metric spaces, normed spaces, normed linear spaces, Banach spaces, etc.

Typical Problems: On P_2, consider the inner product ⟨p(t), q(t)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) for each p(t), q(t) ∈ P_2. Let p(t) = 1 + t^2 and q(t) = 2 − 2t + t^2.

1. Compute ⟨p(t), q(t)⟩.
2. What is the length of p(t)?
3. What is the distance from p(t) to q(t)?
4. Find a unit vector in the direction of p(t).

Proposition: Let u and v be vectors in an inner product space V. Then

1. ‖u − v‖^2 = ⟨u − v, u − v⟩,
2. ‖u‖ = ‖u − 0‖ (i.e., the norm of u is its distance from the origin),
3. ‖u‖^2 = ⟨u, u⟩,
4. ‖u − v‖^2 = ‖u‖^2 − 2⟨u, v⟩ + ‖v‖^2, and
5. ‖u + v‖^2 = ‖u‖^2 + 2⟨u, v⟩ + ‖v‖^2.

Pythagoras' Theorem: In any inner product space V, ⟨u, v⟩ = 0 if and only if ‖u + v‖^2 = ‖u‖^2 + ‖v‖^2.

Proof: Examine item 5 in the last proposition above.

Definition: In an inner product space, vectors u and v are called orthogonal or perpendicular vectors if and only if ⟨u, v⟩ = 0. When u and v are orthogonal, we write u ⊥ v.

Typical problems:

1. Determine whether [1, 1, 1]^T ⊥ [1, 1, −2]^T in R^3. What about [1, 1, 1]^T ⊥ [1, 2, 2]^T?

2. On P_2, consider the inner product ⟨p(t), q(t)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) for each p(t), q(t) ∈ P_2. Let p(t) = 1 + t^2 and q(t) = 2 − 2t + t^2. Determine whether p(t) ⊥ q(t).

3. Prove 0 ⊥ u for any vector u in an inner product space V.
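
Code Sketch (Python/NumPy): a small numerical evaluation of the P_2 inner product used in the typical problems above, with p(t) = 1 + t^2 and q(t) = 2 − 2t + t^2. The helper functions are invented for this sketch.

    import numpy as np

    # Evaluation points defining the inner product <p, q> = p(-1)q(-1) + p(0)q(0) + p(1)q(1)
    pts = np.array([-1.0, 0.0, 1.0])

    def p(t):
        return 1 + t**2

    def q(t):
        return 2 - 2*t + t**2

    def inner(f, g):
        return np.sum(f(pts) * g(pts))

    ip = inner(p, q)                        # <p, q>
    norm_p = np.sqrt(inner(p, p))           # length of p(t)
    dist = np.sqrt(inner(lambda t: p(t) - q(t), lambda t: p(t) - q(t)))  # distance from p to q
    # A unit vector in the direction of p(t) is p(t)/norm_p; p and q are orthogonal iff <p, q> = 0
    print(ip, norm_p, dist, np.isclose(ip, 0.0))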

Proposition: If W is a subspace of an inner product space V, then {u : u ⊥ w for all w ∈ W} is also a subspace of V.

Definition: If W is a subspace of an inner product space V, then {u : u ⊥ w for all w ∈ W} is denoted by W^⊥ (spoken "W perp"), and called the orthogonal complement of W.

Proposition: If W is a subspace of an inner product space V, then W ∩ W^⊥ = {0}.

Proposition: If W = Span{w_1, ..., w_k} in an inner product space V, then u ∈ W^⊥ if and only if u ⊥ w_i for all i = 1, ..., k.

Theorem 3 page 381 Generalized: Let A be an m × n matrix. Then

(1) Col(A^T) and Nul(A) are each other's orthogonal complements in R^n, and
(2) Col(A) and Nul(A^T) are each other's orthogonal complements in R^m.

Proof: See the Appendix below.

Sections 6.2 through 6.4 via 6.7: Orthogonality and Gram-Schmidt.

Definition: A set of vectors which are pairwise orthogonal is called an orthogonal set. An orthogonal set which is also a basis of an inner product space is called an orthogonal basis. If all of the vectors in an orthogonal set are unit vectors, then the set is called orthonormal.

Warning: A matrix whose columns form an orthonormal set is called an orthogonal matrix, NOT an orthonormal matrix. There is no such thing as an orthonormal matrix. This is a historical accident. Also, it is more usual to speak of n × n orthogonal matrices than of n × m ones.

Linear Independence of Orthogonal Sets: If {b_1, ..., b_p} is an orthogonal set of nonzero vectors in an inner product space V, then it is linearly independent. This is because 0 = α_1 b_1 + ... + α_i b_i + ... + α_p b_p implies

⟨b_i, 0⟩ = ⟨b_i, α_1 b_1 + ... + α_i b_i + ... + α_p b_p⟩ (since 0 = α_1 b_1 + ... + α_p b_p)
= α_1⟨b_i, b_1⟩ + ... + α_i⟨b_i, b_i⟩ + ... + α_p⟨b_i, b_p⟩ (by bilinearity)
= α_i⟨b_i, b_i⟩ (since ⟨b_i, b_j⟩ = 0 for all i ≠ j and ⟨b_i, b_i⟩ ≠ 0),

so α_i = ⟨b_i, 0⟩/⟨b_i, b_i⟩ = 0 (since ⟨b_i, 0⟩ = 0) for all i.
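
Code Sketch (Python/NumPy): a numerical sanity check of the generalized Theorem 3 stated above; it is a spot check, not a proof. The 3 × 4 matrix is made up for illustration, and a basis of Nul(A) is read off from the singular value decomposition.

    import numpy as np

    # A made-up 3 x 4 matrix (rank 2, so Nul(A) is nontrivial)
    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 2.],
                  [1., 3., 1., 3.]])

    # Basis for Nul(A): right singular vectors belonging to (numerically) zero singular values
    _, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > 1e-10)
    null_basis = Vt[rank:].T                 # columns span Nul(A)

    # Every row of A (i.e., every column of A^T) is orthogonal to every null space vector
    print(np.allclose(A @ null_basis, 0))    # True: Col(A^T) sits inside (Nul(A))-perp

    # Dimensions also match: rank(A^T) = rank(A) and dim Nul(A) = n - rank(A)
    print(rank + null_basis.shape[1] == A.shape[1])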

Representations Using an Orthogonal Basis: An orthogonal basis provides a useful representation of an inner product space because coordinates are easily computed. To see this, let B = {b_1, ..., b_n} be an orthogonal basis of an inner product space V and let x ∈ V. Generally, [x]_B = P_B^{−1}(x) is difficult to compute, but not in this case. Write x = α_1 b_1 + ... + α_i b_i + ... + α_n b_n. Then, as just above (but using x instead of 0), ⟨b_i, x⟩ = α_i⟨b_i, b_i⟩, so α_i = ⟨b_i, x⟩/⟨b_i, b_i⟩ for each i. Therefore,

[x]_B = P_B^{−1}(x) = [⟨b_1, x⟩/⟨b_1, b_1⟩, ..., ⟨b_i, x⟩/⟨b_i, b_i⟩, ..., ⟨b_n, x⟩/⟨b_n, b_n⟩]^T.

Testing Orthogonality in R^n: In R^n, {v_1, ..., v_p} is an orthogonal set if and only if A^T A is a diagonal matrix, where A = [v_1 ... v_p], because if D = A^T A then d_{i,j} = v_i^T v_j = v_i · v_j for each i and j. Also, in R^n, {v_1, ..., v_p} is an orthonormal set if and only if A^T A = I_p. It is worth noting here, for the sake of reducing the number of computations, that regardless of whether D = A^T A is diagonal or not, it is symmetric (that is, D^T = D), so to determine whether D is diagonal, one only needs to check the entries strictly below its diagonal (if any are nonzero, then orthogonality fails). For orthonormality, one must also check that the diagonal entries equal 1. Finally, after repeating the warning that there is no such thing as an orthonormal matrix, we notice that an n × p matrix A is orthogonal iff A^T A = I_p, in which case we see that an n × n matrix A is orthogonal iff A^T = A^{−1}. Thus, {v_1, ..., v_n} is an orthonormal basis of R^n iff A = [v_1 ... v_n] is an orthogonal matrix.

Typical problems:

1. Show the standard basis {e_1, ..., e_n} of R^n is orthonormal.
2. Is {[1, 2, 2, 3]^T, [2, −1, 6, −4]^T} an orthogonal set in R^4? Orthonormal?
3. Is {[1, 2, 2, 3]^T, [2, −1, 6, −4]^T, [−2, 3, 1, 2]^T} an orthogonal set in R^4?
4. Let P_1 have the inner product defined by evaluation at t_1 = −1 and t_2 = 1 (that is, ⟨p(t), q(t)⟩ = p(−1)q(−1) + p(1)q(1)). Is B = {1 + t, 1 − t} orthogonal? An orthogonal basis? An orthonormal basis? How about S = {1, t}?
5. Repeat the last item but with t_1 = 0 and t_2 = 1.
6. Compute [2 + 3t]_B for B = {1 + t, 1 − t} in P_1 with the inner product given by evaluation at t_1 = −1 and t_2 = 1.
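
Code Sketch (Python/NumPy): the A^T A test for orthogonality and the coordinate formula α_i = ⟨b_i, x⟩/⟨b_i, b_i⟩, applied to a made-up orthogonal (not orthonormal) set in R^3; the vectors and names are chosen only for illustration.

    import numpy as np

    v1 = np.array([1., 1., 0.])
    v2 = np.array([1., -1., 0.])
    v3 = np.array([0., 0., 2.])
    A = np.column_stack([v1, v2, v3])

    # {v1, v2, v3} is orthogonal iff A^T A is diagonal; orthonormal iff A^T A = I
    D = A.T @ A
    print(np.allclose(D, np.diag(np.diag(D))))   # True: an orthogonal set
    print(np.allclose(D, np.eye(3)))             # False: not orthonormal (lengths are not 1)

    # Coordinates of x relative to the orthogonal basis B = {v1, v2, v3}:
    # alpha_i = <b_i, x> / <b_i, b_i>, with no matrix inversion needed
    x = np.array([3., 1., 4.])
    coords = np.array([np.dot(b, x) / np.dot(b, b) for b in (v1, v2, v3)])
    print(np.allclose(A @ coords, x))            # reconstruction check: x = sum of alpha_i b_i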

The Orthogonal Projection Theorem: Let B = {b_1, ..., b_p} be an orthogonal set of nonzero vectors in an inner product space V. Then for any y ∈ V, there is a unique vector

ŷ = (⟨y, b_1⟩/⟨b_1, b_1⟩) b_1 + ... + (⟨y, b_p⟩/⟨b_p, b_p⟩) b_p

in W = Span(B) (called the orthogonal projection of y onto W) and a unique vector z = y − ŷ in W^⊥ such that y = ŷ + z.

Proof: Obviously, the conclusion y = ŷ + z follows from the equation z = y − ŷ defining z. The existence and uniqueness of z are guaranteed by the existence and uniqueness of ŷ, respectively, because z = y − ŷ. The vector ŷ exists because it is given by a computable formula, and that ŷ ∈ W is obvious. The uniqueness of ŷ and the fact that z ∈ W^⊥ will be established in class.

Typical problems:

1. Find the orthogonal projection of [2, 0, 1, 2]^T onto the subspace of R^4 spanned by {[1, 2, 2, 3]^T, [2, −1, 6, −4]^T}.
2. Use an orthogonal projection to extend the orthogonal set {[1, 2, 2]^T, [0, 1, −1]^T} to an orthogonal basis of R^3.
3. Let P_1 have the inner product defined by evaluation at t_1 = −1 and t_2 = 1. Find the orthogonal projection of 2 + 3t onto Span{p(t)} in P_1, where p(t) = 1 − t.
4. Repeat the last item except use t_1 = 2 and t_2 = 1.

Lecture Target: This is the optimal spot to end the lecture of Thursday, August 3rd.

Cauchy-Schwarz Inequality: For any vectors u and v in an inner product space, |⟨u, v⟩| ≤ ‖u‖ ‖v‖. The elegant but tricky proof of this inequality is given in the textbook on page 432. This inequality is the key to the metric triangle property mentioned above (also, see page 433 of the text).

Alternative Notation: The orthogonal projection ŷ of y onto the subspace W is written on occasion as Proj_W y; that is, ŷ = Proj_W y.

The Best Approximation Theorem: For any subspace W = Span(B) and y ∈ V in a finite dimensional inner product space, Proj_W y is the best approximation to y by a vector in W; that is, ‖y − Proj_W y‖ ≤ ‖y − v‖ for all v ∈ W.

Proof: To be done in class.

Typical Problems:

1. Find the best approximation to [2, 0, 1, 2]^T in W = Span{[1, 2, 2, 3]^T, [2, −1, 6, −4]^T}.
2. Let P_1 have the inner product defined by evaluation at t_1 = −1 and t_2 = 1. Find the best approximation to 2 + 3t in Span{p(t)} in P_1, where p(t) = 1 − t.

The Gram-Schmidt Process: Let {b_1, ..., b_p} be a basis of a subspace W in an inner product space V. Let v_1 = b_1 and, for 1 < i ≤ p, let v_i = b_i − Proj_{W_i} b_i, where W_i = Span{v_1, ..., v_{i−1}}. Then {v_1, ..., v_p} is an orthogonal basis of W. Upon normalizing each v_i, we get an orthonormal basis {v_1/‖v_1‖, ..., v_p/‖v_p‖} of W. A short computational sketch follows the typical problems below.

Typical problems:

1. With B = {[0, 1, 1, 1]^T, [1, 1, 1, 0]^T, [1, 0, 1, 1]^T}, find the orthogonal projection of [0, 1, 1, 0]^T onto Span(B) in R^4.
2. Let P_2 have the inner product defined by evaluation at t_1 = −1, t_2 = 0 and t_3 = 1. Find an orthonormal basis for P_2.
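
Code Sketch (Python/NumPy): a compact implementation of the Gram-Schmidt process as described above, applied to the basis B of typical problem 1; the function name gram_schmidt is invented for this sketch.

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthogonal basis for Span(vectors), assuming the input list is linearly independent."""
        basis = []
        for b in vectors:
            # Subtract the orthogonal projection of b onto the span of the vectors built so far
            v = b - sum((np.dot(b, u) / np.dot(u, u)) * u for u in basis)
            basis.append(v)
        return basis

    B = [np.array([0., 1., 1., 1.]),
         np.array([1., 1., 1., 0.]),
         np.array([1., 0., 1., 1.])]

    V = gram_schmidt(B)
    # Normalizing each v_i gives an orthonormal basis of Span(B)
    Q = [v / np.linalg.norm(v) for v in V]
    # The new vectors are pairwise orthogonal
    print(np.allclose([np.dot(V[0], V[1]), np.dot(V[0], V[2]), np.dot(V[1], V[2])], 0))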

The QR Factorization: Let A be an m × n matrix with linearly independent columns. Let Q be the matrix whose columns are formed by applying the Gram-Schmidt process to the columns of A and normalizing each of the resulting columns. Then Q is an orthogonal matrix. Let R = Q^T A. Since Q is orthogonal, we have A = QR. Moreover, R is n × n, upper triangular, invertible, and has positive entries on its diagonal.

Typical Problem: Find the QR factorization of A = [a_1 a_2], where a_1 = [1, 2, 1]^T and a_2 = [1, 1, 1]^T.

6.5 Least Squares: If A is an m × n matrix and b ∈ R^m, then a least squares solution of Ax = b is a vector x̂ ∈ R^n such that ‖b − Ax̂‖ ≤ ‖b − Ax‖ for all x ∈ R^n. Clearly, the set of least squares solutions of Ax = b is the set of solutions of Ax̂ = b̂, where b̂ = Proj_{Col(A)} b. Now b − b̂ = b − Ax̂, so b − Ax̂ lies in the orthogonal complement of Col(A). This orthogonal complement is Nul(A^T), so A^T(b − Ax̂) = 0. Thus, we seek x̂ such that

A^T A x̂ = A^T b.

The system of equations A^T A x̂ = A^T b is called the system of normal equations for x̂ in the least squares problem Ax = b. When A has linearly independent columns, A^T A is invertible and x̂ = (A^T A)^{−1} A^T b. From this we get a matrix product description of the orthogonal projection, for b̂ = Ax̂ = A(A^T A)^{−1} A^T b.

The invertibility of A^T A mentioned above is not obvious. That A^T A is invertible when A has linearly independent columns proceeds by noticing Col(A^T) = R^n (since Nul(A) = {0}, and since, as is shown in the Appendix, Col(A^T) is the orthogonal complement of Nul(A)). Therefore, A^T is onto. It will follow from this that A^T A is also onto. To see this, let x ∈ R^n. Since A^T is onto, there is y ∈ R^m such that A^T y = x. Let ŷ be the orthogonal projection of y onto Col(A) in R^m and let z = y − ŷ. Then z ∈ Nul(A^T) because, again by the Appendix below, Nul(A^T) is the orthogonal complement of Col(A) in R^m. Therefore A^T z = 0 and y = z + ŷ. Now, since ŷ ∈ Col(A), there is a vector v ∈ R^n such that Av = ŷ.
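
Code Sketch (Python/NumPy): the QR factorization of the matrix from the typical problem above via the built-in routine, and a least squares solution via the normal equations; the right-hand side b is made up for illustration. Note that the built-in QR may differ from the Gram-Schmidt construction by signs on the diagonal of R.

    import numpy as np

    # The matrix from the QR typical problem above (3 x 2, linearly independent columns)
    A = np.array([[1., 1.],
                  [2., 1.],
                  [1., 1.]])

    # QR factorization: Q has orthonormal columns, R = Q^T A is upper triangular
    Q, R = np.linalg.qr(A)
    print(np.allclose(Q.T @ Q, np.eye(2)))   # Q^T Q = I
    print(np.allclose(Q @ R, A))             # A = QR

    # Least squares via the normal equations A^T A x_hat = A^T b, with a made-up right-hand side
    b = np.array([1., 0., 2.])
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    print(np.allclose(A.T @ (b - A @ x_hat), 0))                      # residual lies in Nul(A^T)
    print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with the library solver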

We have

A^T A v = A^T ŷ = 0 + A^T ŷ = A^T z + A^T ŷ = A^T(z + ŷ) = A^T y = x.

Therefore, as claimed, A^T A is an onto linear transformation. But A^T A is an n × n matrix, so, by the Invertible Matrix Theorem, A^T A is invertible (because it is onto). This argument provides an inverse statement to the one given in the assigned exercise 6.5.20.

Typical Problem: Let A have columns a_1 = [0, 1, 1, 1]^T, a_2 = [1, 1, 1, 0]^T, and a_3 = [1, 0, 1, 1]^T. Use least squares to find the orthogonal projection of [0, 1, 1, 0]^T onto Col(A) in R^4.

6.6 Least Squares Lines: In R^n, let 1 = [1, 1, ..., 1]^T be the column vector all of whose coordinates are 1. Given x, y ∈ R^n, the least squares line is given by β_0, β_1 ∈ R such that ŷ = β_0 1 + β_1 x, where ŷ is the best approximation of y in Col([1 x]). Thus, we seek β = [β_0, β_1]^T ∈ R^2 such that [1 x]β = ŷ. The normal equations provide the solution:

[1 x]^T [1 x] β = [1 x]^T y.

Typical Problem: Find the equation y = β_0 + β_1 x of the least squares line that best fits the data points (−1, 0), (0, 1), (1, 2), (2, 4).
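
Code Sketch (Python/NumPy): the least squares line for the data points of the typical problem above, computed from the normal equations [1 x]^T [1 x] β = [1 x]^T y; only the variable names are invented.

    import numpy as np

    # Data points from the typical problem above
    x = np.array([-1., 0., 1., 2.])
    y = np.array([ 0., 1., 2., 4.])

    # Design matrix [1 x]: first column all ones, second column the x-values
    X = np.column_stack([np.ones_like(x), x])

    # Normal equations: [1 x]^T [1 x] beta = [1 x]^T y
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    beta0, beta1 = beta
    print(beta0, beta1)                       # intercept and slope of the least squares line

    # Same answer from the library least squares routine
    print(np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0]))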

Appendix. Above, we generalized Theorem 3 of page 381. Here, we repeat the statement of the generalization and then go on to prove it.

Theorem 3 page 381 Generalized: Let A be an m × n matrix. Then

(1) Col(A^T) and Nul(A) are each other's orthogonal complements in R^n, and
(2) Col(A) and Nul(A^T) are each other's orthogonal complements in R^m.

Proof. The technical meaning of the two statements is

(1) Row(A) = Col(A^T) = (Nul(A))^⊥ and Nul(A) = (Col(A^T))^⊥ = (Row(A))^⊥, and
(2) Col(A) = (Nul(A^T))^⊥ and Nul(A^T) = (Col(A))^⊥.

(2) follows from (1) upon replacing A, A^T, m, and n throughout (1) by A^T, A, n, and m, respectively. The second part of (1), that Nul(A) = (Col(A^T))^⊥ = (Row(A))^⊥, is easy and is given in the textbook on page 381. The first part of (1), that Row(A) = Col(A^T) = (Nul(A))^⊥, is more difficult and requires an exploitation of the rank of A. We will prove Row(A) = Col(A^T) = (Nul(A))^⊥.

That Row(A) = Col(A^T) follows from the observation that transposition is an isomorphism sending rows to columns and columns to rows. Now we turn our attention to proving Col(A^T) = (Nul(A))^⊥ in R^n.

A brief meditation on the relationships between the rows of A, the columns of A^T, and the solutions of Ax = 0 will allow the reader to see that each column of A^T is orthogonal to any x ∈ Nul(A). Therefore, each column of A^T is in (Nul(A))^⊥. Thus,

Col(A^T) is a subspace of (Nul(A))^⊥, (*)

because (Nul(A))^⊥ is a vector space (the vector space structure being inherited from R^n).

Let r = Rank(A). We know that dim(Col(A^T)) = Rank(A^T) = Rank(A) = r. Let p = dim((Nul(A))^⊥). Then r ≤ p by (*), so if we can show

p ≤ r,

then we will have

dim(Col(A^T)) = r = p = dim((Nul(A))^⊥). (**)

Once p ≤ r is established, the desired result, that Col(A^T) = (Nul(A))^⊥, follows immediately from (*) and (**), because a subspace (namely, Col(A^T)) with dimension equal to the dimension of its parent space (namely, (Nul(A))^⊥) must be all of the parent space.

Let B = {v_1, v_2, ..., v_p} be a basis of (Nul(A))^⊥. Let q = dim(Nul(A)), so that q = n − r by the Rank Theorem. Let C = {w_1, w_2, ..., w_q} be a basis of Nul(A), and let D = {v_1, v_2, ..., v_p, w_1, w_2, ..., w_q} be the union of B and C. We will show D is linearly independent. Why will this help? Well, if D is shown to be linearly independent, then since p + q is the number of elements in D, it follows that p + q ≤ n by the Basis Theorem, in which case we have p + (n − r) = p + q ≤ n, so p − r ≤ 0; whence, p ≤ r, as required. Thus, once we have established that D is linearly independent, the proof will be complete. We do this next.

Suppose (α_1 v_1 + ... + α_p v_p) + (β_1 w_1 + ... + β_q w_q) = 0 for some scalars α_1, ..., α_p, β_1, ..., β_q, where the bracketing has no use other than to suggest what follows next. Let x = α_1 v_1 + ... + α_p v_p and y = β_1 w_1 + ... + β_q w_q. We have x + y = 0, so x = −y. Now, x ∈ (Nul(A))^⊥ and y ∈ Nul(A). But y ∈ Nul(A) and x = −y imply x ∈ Nul(A). Therefore, x ∈ (Nul(A))^⊥ ∩ Nul(A). But, far above, we noted this means x = 0. Also, y = 0, because x = −y and x = 0. Now we have it,

because x = 0 and y = 0 give us 0 = α_1 v_1 + ... + α_p v_p and 0 = β_1 w_1 + ... + β_q w_q; whence, all of the α-scalars and β-scalars are 0 by the linear independence of the bases B and C, respectively. Therefore, D is linearly independent as claimed, and this completes the proof.