Chapter 6 - Orthogonality
Maggie Myers and Robert A. van de Geijn
The University of Texas at Austin
Fall 2009
http://z.cs.utexas.edu/wiki/pla.wiki/
Orthogonal Vectors and Subspaces
Observations
Let x, y ∈ R^n be linearly independent. The subspace of all vectors αx + βy, with α, β ∈ R (the space spanned by x and y), forms a plane. All three vectors x, y, and z = y − x lie in this plane, and they form a triangle.
Orthogonality of two vectors
x and y are orthogonal (perpendicular) if they meet at a right angle.
The (Euclidean) length of x is given by ||x||_2 = sqrt(χ_0^2 + ... + χ_{n-1}^2) = sqrt(x^T x).
Pythagorean Theorem: the angle at which x and y meet is a right angle if and only if ||z||_2^2 = ||x||_2^2 + ||y||_2^2, where z = y − x. In this case,
||x||_2^2 + ||y||_2^2 = ||z||_2^2 = ||y − x||_2^2
                      = (y − x)^T (y − x)
                      = (y^T − x^T)(y − x)
                      = (y^T − x^T)y − (y^T − x^T)x
                      = y^T y − (x^T y + y^T x) + x^T x
                      = ||y||_2^2 − 2 x^T y + ||x||_2^2.
Soooo...
Suppose x and y are orthogonal (perpendicular), so that they meet at a right angle. Then
||x||_2^2 + ||y||_2^2 = ||x||_2^2 − 2 x^T y + ||y||_2^2.
Cancelling terms on the left and right of the equality, this implies that x^T y = 0.
Definition: Two vectors x, y ∈ R^n are said to be orthogonal if x^T y = 0.
Notation: Sometimes we will use the notation x ⊥ y to indicate that x is perpendicular to y.
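The condition x^T y = 0 is easy to check numerically. A minimal sketch (not part of the original notes; the helper names `dot` and `is_orthogonal` are my own):

```python
def dot(x, y):
    # inner product x^T y of two vectors given as lists
    return sum(xi * yi for xi, yi in zip(x, y))

def is_orthogonal(x, y, tol=1e-12):
    # x and y are orthogonal when x^T y = 0 (up to roundoff)
    return abs(dot(x, y)) <= tol

# (1, 2) and (-2, 1) meet at a right angle: 1*(-2) + 2*1 = 0
print(is_orthogonal([1.0, 2.0], [-2.0, 1.0]))  # True
```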
Orthogonality of subspaces
Let V, W ⊆ R^n be subspaces. Then V and W are said to be orthogonal if v ∈ V and w ∈ W implies that v^T w = 0. In other words, two subspaces are orthogonal if all vectors in the first subspace are orthogonal to all vectors in the second subspace.
Notation: V ⊥ W indicates that subspace V is orthogonal to subspace W.
Example
Let V = Span{(1, 0, 0)^T, (0, 1, 0)^T} and W = Span{(0, 0, 1)^T}. Show that V ⊥ W.
Definition
Given a subspace V ⊆ R^n, the set of all vectors in R^n that are orthogonal to V is denoted by V^⊥ (pronounced "V-perp").
Example
Let V = Span{(1, 0, 0)^T, (0, 0, 1)^T}. What is V^⊥?
Exercise
Let V, W ⊆ R^n be subspaces. Show that if V ⊥ W then V ∩ W = {0}, the set containing only the zero vector.
Definition: Whenever V ∩ W = {0} we will sometimes call this the trivial intersection: trivial in the sense that it contains only the zero vector.
Exercise
Show that if V ⊆ R^n is a subspace, then V^⊥ is a subspace.
Definitions (Recap)
Let A ∈ R^{m×n}.
Column space of A, C(A): the set of all vectors in R^m that can be written as Ax: C(A) = {y | y = Ax}.
Null space of A, N(A): the set of all vectors in R^n that map to the zero vector: N(A) = {x | Ax = 0}.
(New) Row space of A: R(A) = {y | y = A^T x}.
(New) Left null space of A: {x | x^T A = 0} (= N(A^T)).
Exercise
Show that the left null space of a matrix A ∈ R^{m×n} equals N(A^T).
Theorem
Let A ∈ R^{m×n}. Then the null space of A is orthogonal to the row space of A: R(A) ⊥ N(A).
Proof
Assume that y ∈ R(A) and x ∈ N(A). Then there exists a vector z ∈ R^m such that y = A^T z. (Why?) Now, y^T x = (A^T z)^T x = (z^T A)x = z^T (Ax) = 0. (Why?)
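The theorem can be sanity-checked on a small example. A sketch (the matrix and vectors below are my own choices, not from the notes): take a 2×3 matrix A, a vector xn in its null space, and an arbitrary row-space vector y = A^T z; their inner product comes out zero.

```python
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

def matvec(M, v):
    # compute the matrix-vector product M v
    return [sum(Mij * vj for Mij, vj in zip(row, v)) for row in M]

xn = [1.0, 1.0, -1.0]
assert matvec(A, xn) == [0.0, 0.0]   # xn is in the null space N(A)

# an arbitrary row-space vector: y = A^T z with z = (3, -2)^T
z = [3.0, -2.0]
y = [A[0][j] * z[0] + A[1][j] * z[1] for j in range(3)]  # y = (3, -2, 1)

print(sum(yj * xj for yj, xj in zip(y, xn)))  # 0.0: y is orthogonal to xn
```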
Theorem
The dimension of R(A) equals the dimension of C(A).
Proof
The dimension of the row space equals the number of linearly independent rows, which equals the number of nonzero rows that result from the Gauss-Jordan method, which equals the number of pivots that show up in that method, which equals the number of linearly independent columns.
Theorem
Let A ∈ R^{m×n}. Then every x ∈ R^n can be written as x = x_r + x_n, where x_r ∈ R(A) and x_n ∈ N(A).
Proof
The dimension of N(A) and the dimension of C(A), r, add to the number of columns, n. (Why?) Thus, the dimension of R(A) equals r and the dimension of N(A) equals n − r. If some x ∈ R^n cannot be written as x_r + x_n as indicated, then consider the set of vectors that consists of the union of a basis of R(A) and a basis of N(A), plus the vector x. This set is linearly independent and has n + 1 vectors, contradicting the fact that R^n has dimension n.
Theorem
Let A ∈ R^{m×n}. Then A is a one-to-one, onto mapping from R(A) to C(A).
Proof
We first note that A maps R(A) to C(A). This is trivial, since any vector x ∈ R^n maps into C(A).
Uniqueness (one-to-one): We need to show that if x, y ∈ R(A) and Ax = Ay, then x = y. Notice that Ax = Ay implies that A(x − y) = 0, which means that (x − y) is both in R(A) (since it is a linear combination of x and y, both of which are in R(A)) and in N(A). Since we just showed that these two spaces are orthogonal, we conclude that (x − y) = 0, the zero vector. Thus x = y.
Theorem
Let A ∈ R^{m×n}. Then A is a one-to-one, onto mapping from R(A) to C(A).
Proof (continued)
Onto: We need to show that for any b ∈ C(A) there exists x_r ∈ R(A) such that Ax_r = b. Notice that if b ∈ C(A), then there exists x ∈ R^n such that Ax = b. We know that x = x_r + x_n, where x_r ∈ R(A) and x_n ∈ N(A). Then b = Ax = A(x_r + x_n) = Ax_r + Ax_n = Ax_r.
Theorem
Let A ∈ R^{m×n}. Then the left null space of A is orthogonal to the column space of A, and the dimension of the left null space of A equals m − r, where r is the dimension of the column space of A.
Proof
This follows trivially by applying the previous theorems to A^T.
Summarizing...
[Figure: the four fundamental subspaces. In R^n: the row space of A (dimension r) and the null space of A (dimension n − r). In R^m: the column space of A (dimension r) and the left null space of A (dimension m − r). A vector x = x_r + x_n, with x_r in the row space and x_n in the null space, satisfies Ax = Ax_r = b, with b in the column space.]
Motivating Example
Example
Let us consider the following set of points: (χ_0, ψ_0) = (1, 1.97), (χ_1, ψ_1) = (2, 6.97), (χ_2, ψ_2) = (3, 8.89), (χ_3, ψ_3) = (4, 10.01). Find a line that best fits these points.
[Figures: plots of the four data points, with axes x from 0 to 6 and y from 0 to 14.]
The Problem
Find a line that interpolates these points as nearly as possible. "Near": the sum of the squares of the distances to the line is minimized.
Express this with matrices and vectors:
x = (χ_0, χ_1, χ_2, χ_3)^T = (1, 2, 3, 4)^T and y = (ψ_0, ψ_1, ψ_2, ψ_3)^T = (1.97, 6.97, 8.89, 10.01)^T.
The equation of the line we want is y = γ_0 + γ_1 x.
IF this line COULD go through all these points THEN
ψ_0 = γ_0 + γ_1 χ_0
ψ_1 = γ_0 + γ_1 χ_1
ψ_2 = γ_0 + γ_1 χ_2
ψ_3 = γ_0 + γ_1 χ_3
or, specifically,
1.97 = γ_0 + γ_1
6.97 = γ_0 + 2 γ_1
8.89 = γ_0 + 3 γ_1
10.01 = γ_0 + 4 γ_1
In Matrix Notation...
We would like
(ψ_0, ψ_1, ψ_2, ψ_3)^T = [ 1 χ_0 ; 1 χ_1 ; 1 χ_2 ; 1 χ_3 ] (γ_0, γ_1)^T
or, specifically,
(1.97, 6.97, 8.89, 10.01)^T = [ 1 1 ; 1 2 ; 1 3 ; 1 4 ] (γ_0, γ_1)^T.
Just looking at the graph, it is obvious that these points do not lie on the same line. How do we say that mathematically? All these equations cannot be simultaneously satisfied. Why? So, what do we do?
How does this relate to column spaces?
For what right-hand sides could we have solved all four equations simultaneously? We would have had to choose y so that Ac = y, where
A = [ 1 χ_0 ; 1 χ_1 ; 1 χ_2 ; 1 χ_3 ] = [ 1 1 ; 1 2 ; 1 3 ; 1 4 ] and c = (γ_0, γ_1)^T.
This means that y must be in the column space of A! It must be possible to express it as y = γ_0 a_0 + γ_1 a_1, where A = ( a_0  a_1 ).
What does this mean if we relate this back to the picture? Only if {ψ_0, ..., ψ_3} have the property that {(1, ψ_0), ..., (4, ψ_3)} lie on a line can we find coefficients γ_0 and γ_1 such that Ac = y.
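One quick way to see that y is not in C(A) for this data: if the four points lay on one line, the slopes between consecutive points would all agree. A small check (my own sketch, using the data from the example):

```python
# the four data points from the example
pts = [(1.0, 1.97), (2.0, 6.97), (3.0, 8.89), (4.0, 10.01)]

# slope between each pair of consecutive points
slopes = [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
print(slopes)  # roughly [5.0, 1.92, 1.12] -- they differ, so no exact line exists
```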
What are the fundamental questions? When does Ax = b have a solution? When does Ax = b have more than one solution? How do we characterize all these solutions? If Ax = b does not have a solution, how do we find the best approximate solution?
How does this problem relate to orthogonality and projection?
The problem: b does not lie in the column space of A. The question is what vector z that does lie in the column space to pick, so that we can solve Ac = z instead. Now, if z solves Ac = z exactly, then
z = ( a_0  a_1 ) (γ_0, γ_1)^T = γ_0 a_0 + γ_1 a_1.
Well DAH! z is in the column space of A. What we want is y = z + w, where w is as small (in length) as possible. This happens when w is orthogonal to the column space of A! So, y = γ_0 a_0 + γ_1 a_1 + w, with a_0^T w = a_1^T w = 0.
The vector z ∈ C(A) that is closest to y is known as the projection of y onto the column space of A. We need a way to compute this projection.
Solving a Linear Least-Squares Problem
Observations
The last problem motivated the following general problem: Given m equations in n unknowns, we end up with a system Ax = b, where A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m.
This system of equations may have no solution. This happens when b is not in the column space of A.
This system may have a unique solution for every b. This happens only when r = m = n, where r is the rank of the matrix (the dimension of the column space of A). Another way of saying this is that it happens only if A is square and nonsingular (it has an inverse).
This system may have many solutions. This happens when b is in the column space of A and r < n (the columns of A are linearly dependent, so that the null space of A is nontrivial).
Overdetermined systems
(Approximately) solve Ax = b, where b is not in the column space of A. What we want is an approximate solution x̂ such that Ax̂ = z, where z is the vector in the column space of A that is closest to b. In other words, b = z + w, where w^T v = 0 for all v ∈ C(A). From the figure summarizing the four fundamental subspaces we conclude that this means that w is in the left null space of A. So, A^T w = 0. But that means that
0 = A^T w = A^T (b − z) = A^T (b − Ax̂),
which we can rewrite as
A^T A x̂ = A^T b.    (1)
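For the line-fitting example, A^T A is only 2×2, so equation (1) can be solved directly by hand. A sketch (hand-rolled, assuming the data from the motivating example; variable names are my own):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.97, 6.97, 8.89, 10.01]
n = len(xs)

# normal equations A^T A c = A^T b for A = [1 x_i] and b = y:
#   A^T A = [[n, sum x], [sum x, sum x^2]],  A^T b = [sum y, sum x*y]
sx, sxx = sum(xs), sum(x * x for x in xs)
sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))

det = n * sxx - sx * sx              # determinant of the 2x2 system
g0 = (sxx * sy - sx * sxy) / det     # intercept gamma_0
g1 = (n * sxy - sx * sy) / det       # slope gamma_1
print(g0, g1)  # roughly 0.45 and 2.60
```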
Lemma
If A ∈ R^{m×n} has linearly independent columns, then A^T A is nonsingular (equivalently, has an inverse, A^T A x̂ = A^T b has a solution for all b, etc.).
Proof
Proof by contradiction. Assume that A ∈ R^{m×n} has linearly independent columns and A^T A is singular. Then there exists x ≠ 0 such that A^T A x = 0. Let y = Ax; then A^T y = 0. (Why?) This means y is in the left null space of A. But y is also in the column space of A, since y = Ax. Thus y = 0, since the intersection of the column space of A and the left null space of A contains only the zero vector. But then Ax = 0 with x ≠ 0, which contradicts the assumption that A has linearly independent columns.
What does this mean?
If A has linearly independent columns, then:
The desired x̂ that is the best solution to Ax = b is given by x̂ = (A^T A)^{-1} A^T b.
The vector z ∈ C(A) closest to b is given by z = Ax̂ = A (A^T A)^{-1} A^T b.
Thus z = A (A^T A)^{-1} A^T b is the vector in the column space closest to b. The matrix A (A^T A)^{-1} A^T projects a vector onto the column space of A.
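For a single column (n = 1) the projector A (A^T A)^{-1} A^T reduces to a a^T / (a^T a). A sketch projecting b onto the line spanned by a (the vectors below are my own choices):

```python
a = [1.0, 2.0, 2.0]          # spans the one-dimensional column space
b = [3.0, 0.0, 3.0]

# z = a (a^T a)^{-1} a^T b = (a^T b / a^T a) a
coef = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
z = [coef * ai for ai in a]
w = [bi - zi for bi, zi in zip(b, z)]

# the residual w = b - z is orthogonal to a, as the theory predicts
print(sum(ai * wi for ai, wi in zip(a, w)))  # 0.0 (up to roundoff)
```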
Theorem
Let A ∈ R^{m×n} and b ∈ R^m, and assume that A has linearly independent columns. Then the x that minimizes ||b − Ax||_2 is given by x̂ = (A^T A)^{-1} A^T b.
Definition
Let A ∈ R^{m×n}.
If A has linearly independent columns, then (A^T A)^{-1} A^T is called the (left) pseudo inverse. Note that this means m ≥ n and (A^T A)^{-1} A^T A = I.
If A has linearly independent rows, then A^T (A A^T)^{-1} is called the right pseudo inverse. Note that this means m ≤ n and A A^T (A A^T)^{-1} = I.
Why Least-Squares?
Notice that we are trying to find x̂ that minimizes ||b − Ax||_2. The x̂ that minimizes ||b − Ax||_2 also minimizes the function
F(x) = ||b − Ax||_2^2 = (b − Ax)^T (b − Ax) = b^T b − 2 b^T A x + x^T A^T A x.
Recall how one finds the minimum of a function f : R → R, f(x) = α^2 x^2 − 2αβx + β^2: take the derivative and set it to zero. Here F : R^n → R. Compute the gradient (essentially the derivative) and set it to zero:
−2 A^T b + 2 A^T A x = 0, or A^T A x = A^T b.
We are looking for x̂ that solves A^T A x = A^T b.
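The claim that the solution of A^T A x = A^T b minimizes F can be spot-checked numerically: perturb the solution in a few directions and observe that F never decreases. A sketch using the data from the motivating example (the coefficients 0.45 and 2.604 are the solution of the normal equations for this data, computed separately; they are not stated in the notes):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.97, 6.97, 8.89, 10.01]

def F(c0, c1):
    # F(x) = ||b - Ax||_2^2 for the line-fit A and b built from the data
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))

# solution of the normal equations A^T A c = A^T b for this data (computed)
g0, g1 = 0.45, 2.604

# F does not decrease under small perturbations of (g0, g1)
best = F(g0, g1)
for d0 in (-0.01, 0.0, 0.01):
    for d1 in (-0.01, 0.0, 0.01):
        assert F(g0 + d0, g1 + d1) >= best
print(best)  # the minimal sum of squared residuals
```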