Dot product and linear least squares problems

Dot Product

For vectors $u, v \in \mathbb{R}^n$ we define the dot product
$$ u \cdot v = u_1 v_1 + \cdots + u_n v_n. $$
Note that we can also write this as a matrix product:
$$ u \cdot v = [u_1, \ldots, u_n] \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = u^\top v. $$
The dot product $u \cdot u = u_1^2 + \cdots + u_n^2$ gives the square of the Euclidean length of the vector, aka norm of the vector:
$$ \|u\| = (u \cdot u)^{1/2} = \left(u_1^2 + \cdots + u_n^2\right)^{1/2}. $$

Theorem (Cauchy-Schwarz inequality). For $a, b \in \mathbb{R}^n$ we have
$$ |a \cdot b| \le \|a\|\,\|b\|. \qquad (1) $$

Proof. Step 1, for vectors with $\|a\| = 1$ and $\|b\| = 1$: we have
$$ 0 \le (b - a) \cdot (b - a) = b \cdot b - 2\, a \cdot b + a \cdot a, $$
hence $2\, a \cdot b \le a \cdot a + b \cdot b = 1 + 1 = 2$ and therefore $a \cdot b \le 1$. Using $(b + a)$ instead of $(b - a)$ gives $-a \cdot b \le 1$, i.e., $|a \cdot b| \le 1 = \|a\|\,\|b\|$.

Step 2, for general vectors $a, b$: If $a = 0$ or $b = 0$ we see that (1) obviously holds. If both vectors are different from $0$ let $u := a/\|a\|$ and $v := b/\|b\|$; then $u \cdot v = \dfrac{a \cdot b}{\|a\|\,\|b\|}$. Since $\|u\| = 1$ and $\|v\| = 1$ we get from step 1 that $|u \cdot v| \le 1$, i.e., $|a \cdot b| \le \|a\|\,\|b\|$.

Consider a triangle with the three corner points $0$, $a$, $b$. Then the vector from $a$ to $b$ is given by $c = b - a$, and the lengths of the three sides of the triangle are $\|a\|$, $\|b\|$, $\|c\|$. Let $\theta$ denote the angle between the vectors $a$ and $b$. Then the law of cosines tells us that
$$ \|c\|^2 = \|a\|^2 + \|b\|^2 - 2\,\|a\|\,\|b\|\cos\theta. $$
Multiplying out $\|c\|^2 = (b - a) \cdot (b - a)$ gives
$$ \|c\|^2 = \|a\|^2 + \|b\|^2 - 2\, a \cdot b. $$
By comparing the last two equations we obtain
$$ a \cdot b = \|a\|\,\|b\|\cos\theta. $$
If $a$ and $b$ are different from $0$ we have
$$ \cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}. $$
This tells us how to compute the angle $\theta$ between two vectors:
$$ q := \frac{a \cdot b}{\|a\|\,\|b\|}, \qquad \theta := \cos^{-1} q. $$
Because of (1) we have $-1 \le q \le 1$, hence the inverse cosine function gives $0 \le \theta \le \pi$:
- $q = 1$: $\theta = 0$, i.e., $a, b$ point in the same direction
- $q = 0$: $\theta = \pi/2$, i.e., $a, b$ are orthogonal
- $q = -1$: $\theta = \pi$, i.e., $a, b$ point in opposite directions
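The angle formula translates directly into a few lines of Matlab. Here is a minimal sketch; the vectors a and b below are made-up example values, not from the notes:

a = [4; 3];                        % example vector
b = [3; 4];                        % example vector
q = dot(a,b) / (norm(a)*norm(b));  % q = (a.b)/(|a| |b|), lies in [-1,1]
theta = acos(q)                    % angle in radians, 0 <= theta <= pi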
Example. To find the angle between two given vectors $a, b \in \mathbb{R}^2$ we compute
$$ q := \frac{a \cdot b}{\|a\|\,\|b\|}, \qquad \theta := \cos^{-1} q. $$
[Figure: the vectors $a$ and $b$ plotted in the $(x_1, x_2)$-plane.]

We say vectors $a, b \in \mathbb{R}^n$ are orthogonal, written $a \perp b$, iff
$$ a \cdot b = 0. $$
We say a vector $a \in \mathbb{R}^n$ is orthogonal on a subspace $V$ of $\mathbb{R}^n$, written $a \perp V$, iff
$$ a \cdot v = 0 \quad \text{for all } v \in V. $$

Orthogonal projection onto a line

Consider a 1-dimensional subspace $V = \operatorname{span}\{v\}$ of $\mathbb{R}^n$ given by a vector $v \in \mathbb{R}^n$, $v \ne 0$. This is a line through the origin. For a given point $b \in \mathbb{R}^n$ we want to find the point $u \in V$ which is closest to the point $b$:

Find $u \in V$ such that $\|u - b\|$ is minimal.

The point $u$ must have the following properties:
- $u \in V$, i.e., $u = cv$ with some unknown $c \in \mathbb{R}$
- $u - b \perp V$, i.e., $v \cdot (u - b) = 0$

By plugging $u = cv$ into the second property we get by multiplying out
$$ v \cdot (cv - b) = c\,(v \cdot v) - (v \cdot b) = 0 \quad\Longrightarrow\quad c = \frac{v \cdot b}{v \cdot v}. $$
Therefore the point $u$ on the line given by $v$ which is closest to the point $b$ is given by
$$ u = \frac{v \cdot b}{v \cdot v}\, v. $$
We say that $u$ is the orthogonal projection of the point $b$ onto the line given by $v$ and use the notation
$$ \operatorname{pr}_v b = \frac{v \cdot b}{v \cdot v}\, v. $$
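As a quick illustration of the projection formula, here is a minimal Matlab sketch with made-up vectors v and b; the last line checks the orthogonality property $u - b \perp v$:

v = [1; 2; 2];              % direction vector of the line (example values)
b = [3; 1; 5];              % point to project (example values)
c = (v'*b) / (v'*v);        % coefficient c = (v.b)/(v.v)
u = c*v                     % orthogonal projection of b onto span{v}
check = v'*(u - b)          % should be 0 (up to roundoff)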
Orthogonal projection onto a 2-dimensional subspace V

Consider a 2-dimensional subspace $V = \operatorname{span}\{v^{(1)}, v^{(2)}\}$ of $\mathbb{R}^n$ given by two linearly independent vectors $v^{(1)}, v^{(2)} \in \mathbb{R}^n$. This is a plane through the origin. For a given point $b \in \mathbb{R}^n$ we want to find the point $u \in V$ which is closest to the point $b$:

Find $u \in V$ such that $\|u - b\|$ is minimal.

The point $u$ must have the following properties:
- $u \in V$, i.e., $u = c_1 v^{(1)} + c_2 v^{(2)}$ with some unknowns $c_1, c_2 \in \mathbb{R}$
- $u - b \perp V$, i.e., $v^{(1)} \cdot (u - b) = 0$ and $v^{(2)} \cdot (u - b) = 0$

By plugging $u = c_1 v^{(1)} + c_2 v^{(2)}$ into the second property we obtain a linear system of two equations for the two unknowns which we can then solve. We can use the $n \times 2$ matrix $A = [v^{(1)}, v^{(2)}]$ to express the two properties:
- $u \in V$, i.e., $u = Ac$ with an unknown vector $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \in \mathbb{R}^2$
- $u - b \perp V$, i.e., $v^{(1)} \cdot (u - b) = 0$ and $v^{(2)} \cdot (u - b) = 0$, i.e., $A^\top (u - b) = 0$

By plugging $u = Ac$ into the second property we obtain
$$ A^\top (Ac - b) = 0 \quad\text{or}\quad A^\top A\, c = A^\top b. $$
These are the so-called normal equations (since they express that $u - b$ is orthogonal or normal on the subspace $V$). This is how to find the point $u \in V$ which is closest to $b$:
- find the matrix $M := A^\top A \in \mathbb{R}^{2 \times 2}$ and the vector $g := A^\top b \in \mathbb{R}^2$
- solve the linear system $Mc = g$ for $c \in \mathbb{R}^2$
- let $u := Ac$

Example. Consider a plane $V = \operatorname{span}\{v^{(1)}, v^{(2)}\} \subset \mathbb{R}^3$ and find the point $u \in V$ which is closest to a given point $b \in \mathbb{R}^3$: with $A = [v^{(1)}, v^{(2)}]$ we compute $M = A^\top A$ and $g = A^\top b$, solve the $2 \times 2$ linear system $Mc = g$, and obtain the closest point as $u = Ac$. We can check that this is correct by finding the difference vector $r = Ac - b$ and checking $v^{(1)} \cdot r = 0$ and $v^{(2)} \cdot r = 0$.
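The steps of this example can be reproduced in Matlab. A minimal sketch with made-up vectors v1, v2, b (any concrete values could be substituted):

v1 = [1; 1; 0]; v2 = [0; 1; 1];   % basis of the plane V (example values)
b  = [1; 2; 3];                   % given point (example values)
A  = [v1, v2];
M  = A'*A;                        % 2x2 matrix of the normal equations
g  = A'*b;
c  = M\g;                         % solve M c = g
u  = A*c                          % closest point in V to b
r  = u - b;
check = A'*r                      % both entries should be 0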
Orthogonal projection onto a k-dimensional subspace V, aka least squares problem

Consider a $k$-dimensional subspace $V = \operatorname{span}\{v^{(1)}, \ldots, v^{(k)}\}$ of $\mathbb{R}^n$ given by $k$ linearly independent vectors $v^{(1)}, \ldots, v^{(k)} \in \mathbb{R}^n$. I.e., the vectors $v^{(1)}, \ldots, v^{(k)}$ form a basis for the subspace $V$. For a given point $b \in \mathbb{R}^n$ we want to find the point $u \in V$ which is closest to the point $b$:

Find $u \in V$ such that $\|u - b\|$ is minimal. $\qquad (2)$

Let us define the $n \times k$ matrix $A = [v^{(1)}, \ldots, v^{(k)}]$. The point $u$ must have the following properties:
- $u \in V$, i.e., $u = c_1 v^{(1)} + \cdots + c_k v^{(k)} = Ac$ with some unknown vector $c \in \mathbb{R}^k$
- $u - b \perp V$, i.e., $v^{(1)} \cdot (u - b) = 0, \ldots, v^{(k)} \cdot (u - b) = 0$, i.e., $A^\top (u - b) = 0$

By plugging $u = Ac$ into the second property we obtain
$$ A^\top (Ac - b) = 0 \qquad (3) $$
or
$$ A^\top A\, c = A^\top b. \qquad (4) $$
These are the so-called normal equations (since they express that $u - b$ is orthogonal or normal on the subspace $V$). This is how to find the point $u \in V$ which is closest to $b$:
- find the matrix $M := A^\top A \in \mathbb{R}^{k \times k}$ and the vector $g := A^\top b \in \mathbb{R}^k$
- solve the $k \times k$ linear system $Mc = g$ for $c \in \mathbb{R}^k$
- let $u := Ac$

Note that the normal equations (4) always have a unique solution:

Theorem. Assume the matrix $A \in \mathbb{R}^{n \times k}$ with $k \le n$ has linearly independent columns (i.e., $\operatorname{rank} A = k$). Then the matrix $M = A^\top A$ is nonsingular.

Proof. We have to show that $Mc = 0$ implies $c = 0$. Assume we have $c \in \mathbb{R}^k$ such that $Mc = 0$. Then we can multiply from the left with $c^\top$ and get, with $y := Ac$,
$$ 0 = c^\top M c = c^\top A^\top A c = y^\top y = \|y\|^2 $$
as $y^\top = c^\top A^\top$. Since $\|y\| = 0$ we have $y = Ac = 0$. This means that we have a linear combination of the columns of $A$ which gives the zero vector. Since by assumption the columns of $A$ are linearly independent we must have $c = 0$.

So solving the normal equations gives us a unique $c \in \mathbb{R}^k$. We then get by $u = Ac$ a point on the subspace $V$. We now want to formally prove that this point $u \in V$ is really the unique answer to our minimization problem (2):

Theorem. Assume the matrix $A \in \mathbb{R}^{n \times k}$ with $k \le n$ has linearly independent columns (i.e., $\operatorname{rank} A = k$). Then for any given $b \in \mathbb{R}^n$ the minimization problem

find $c \in \mathbb{R}^k$ such that $\|Ac - b\|$ is minimal $\qquad (5)$

has a unique solution which is obtained by solving the normal equations $A^\top A c = A^\top b$.

Proof. Let $c \in \mathbb{R}^k$ be the unique solution of the normal equations. Consider now $\tilde c = c + d$ where $d \in \mathbb{R}^k$ is nonzero. We then have
$$ \|A\tilde c - b\|^2 = \|(Ac - b) + Ad\|^2 = (Ac - b) \cdot (Ac - b) + 2\,(Ad) \cdot (Ac - b) + (Ad) \cdot (Ad). $$
We have for the middle term
$$ 2\,(Ad) \cdot (Ac - b) = 2\, d^\top A^\top (Ac - b) = 0 $$
by the normal equations (3). Hence
$$ \|A\tilde c - b\|^2 = \|Ac - b\|^2 + \|Ad\|^2. $$
Since $d \ne 0$ we have $Ad \ne 0$ since the columns of $A$ are linearly independent, and hence $\|Ad\| > 0$. This means that for any vector $\tilde c$ different from $c$ we get $\|A\tilde c - b\| > \|Ac - b\|$, i.e., the vector $c$ from the normal equations is the unique solution of (5).
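In Matlab the normal equations approach is a two-liner. A sketch with random example data (the seeding call rng and rand are standard Matlab functions; the data here is made up purely for illustration):

rng(1);                           % fix the random seed for reproducibility
A = rand(6,3);                    % random 6x3 matrix; generically rank 3
b = rand(6,1);
c = (A'*A) \ (A'*b);              % solve the normal equations M c = g
check = A'*(A*c - b)              % residual orthogonal to the columns: ~0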
Least squares problem with orthogonal basis

For a least squares problem we are given $n$ linearly independent vectors $a^{(1)}, \ldots, a^{(n)} \in \mathbb{R}^m$ which form a basis for the subspace $V = \operatorname{span}\{a^{(1)}, \ldots, a^{(n)}\}$. For a given right hand side vector $b \in \mathbb{R}^m$ we want to find $u \in V$ such that $\|u - b\|$ is minimal. We can write $u = c_1 a^{(1)} + \cdots + c_n a^{(n)} = Ac$ with the matrix $A = [a^{(1)}, \ldots, a^{(n)}] \in \mathbb{R}^{m \times n}$. Hence we want to find $c \in \mathbb{R}^n$ such that $\|Ac - b\|$ is minimal.

Solving this problem is much simpler if we have an orthogonal basis for the subspace $V$: Assume we have vectors $p^{(1)}, \ldots, p^{(n)}$ such that
- $\operatorname{span}\{p^{(1)}, \ldots, p^{(n)}\} = V$
- the vectors are orthogonal on each other: $p^{(i)} \cdot p^{(j)} = 0$ for $i \ne j$

We can then write $u = d_1 p^{(1)} + \cdots + d_n p^{(n)} = Pd$ with the matrix $P = [p^{(1)}, \ldots, p^{(n)}] \in \mathbb{R}^{m \times n}$. Hence we want to find $d \in \mathbb{R}^n$ such that $\|Pd - b\|$ is minimal. The normal equations for this problem give
$$ (P^\top P)\, d = P^\top b \qquad (6) $$
where the matrix
$$ P^\top P = \begin{bmatrix} p^{(1)\top} \\ \vdots \\ p^{(n)\top} \end{bmatrix} [p^{(1)}, \ldots, p^{(n)}] = \begin{bmatrix} p^{(1)} \cdot p^{(1)} & \cdots & p^{(1)} \cdot p^{(n)} \\ \vdots & & \vdots \\ p^{(n)} \cdot p^{(1)} & \cdots & p^{(n)} \cdot p^{(n)} \end{bmatrix} = \begin{bmatrix} p^{(1)} \cdot p^{(1)} & & \\ & \ddots & \\ & & p^{(n)} \cdot p^{(n)} \end{bmatrix} $$
is now diagonal since $p^{(i)} \cdot p^{(j)} = 0$ for $i \ne j$. Therefore the normal equations (6) are actually decoupled,
$$ \left(p^{(1)} \cdot p^{(1)}\right) d_1 = p^{(1)} \cdot b, \quad \ldots, \quad \left(p^{(n)} \cdot p^{(n)}\right) d_n = p^{(n)} \cdot b, $$
and have the solution
$$ d_i = \frac{p^{(i)} \cdot b}{p^{(i)} \cdot p^{(i)}} \qquad \text{for } i = 1, \ldots, n. $$

Gram-Schmidt orthogonalization

We still need a method to construct from a given basis $a^{(1)}, \ldots, a^{(n)}$ an orthogonal basis $p^{(1)}, \ldots, p^{(n)}$. Given $n$ linearly independent vectors $a^{(1)}, \ldots, a^{(n)} \in \mathbb{R}^m$ we want to find vectors $p^{(1)}, \ldots, p^{(n)}$ such that
- $\operatorname{span}\{p^{(1)}, \ldots, p^{(n)}\} = \operatorname{span}\{a^{(1)}, \ldots, a^{(n)}\}$
- the vectors are orthogonal on each other: $p^{(i)} \cdot p^{(j)} = 0$ for $i \ne j$

Step 1: $p^{(1)} := a^{(1)}$.

Step 2: $p^{(2)} := a^{(2)} - s_{12} p^{(1)}$ where we choose $s_{12}$ such that $p^{(1)} \cdot p^{(2)} = 0$:
$$ p^{(1)} \cdot a^{(2)} - s_{12}\, p^{(1)} \cdot p^{(1)} = 0, \qquad \text{hence } s_{12} = \frac{p^{(1)} \cdot a^{(2)}}{p^{(1)} \cdot p^{(1)}}. $$

Step 3: $p^{(3)} := a^{(3)} - s_{13} p^{(1)} - s_{23} p^{(2)}$ where we choose $s_{13}, s_{23}$ such that
$$ p^{(1)} \cdot p^{(3)} = 0, \text{ i.e., } p^{(1)} \cdot a^{(3)} - s_{13}\, p^{(1)} \cdot p^{(1)} - s_{23} \underbrace{p^{(1)} \cdot p^{(2)}}_{=0} = 0, \qquad \text{hence } s_{13} = \frac{p^{(1)} \cdot a^{(3)}}{p^{(1)} \cdot p^{(1)}}, $$
$$ p^{(2)} \cdot p^{(3)} = 0, \text{ i.e., } p^{(2)} \cdot a^{(3)} - s_{13} \underbrace{p^{(2)} \cdot p^{(1)}}_{=0} - s_{23}\, p^{(2)} \cdot p^{(2)} = 0, \qquad \text{hence } s_{23} = \frac{p^{(2)} \cdot a^{(3)}}{p^{(2)} \cdot p^{(2)}}. $$

$\vdots$

Step n: $p^{(n)} := a^{(n)} - s_{1n} p^{(1)} - \cdots - s_{n-1,n} p^{(n-1)}$ where we choose $s_{1n}, \ldots, s_{n-1,n}$ such that $p^{(j)} \cdot p^{(n)} = 0$ for $j = 1, \ldots, n-1$, which yields
$$ s_{jn} = \frac{p^{(j)} \cdot a^{(n)}}{p^{(j)} \cdot p^{(j)}} \qquad \text{for } j = 1, \ldots, n-1. $$
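The algorithm above can be transcribed directly into a Matlab function. This is a sketch of classical Gram-Schmidt; it returns both P and the coefficient matrix S used below. (In floating point arithmetic classical Gram-Schmidt can lose orthogonality; this sketch does not guard against that.)

function [P,S] = gram_schmidt(A)
% Classical Gram-Schmidt: A = P*S where the columns of P are orthogonal
% and S is upper triangular with ones on the diagonal.
[m,n] = size(A);
P = zeros(m,n);
S = eye(n);
for k = 1:n
    p = A(:,k);
    for j = 1:k-1
        S(j,k) = (P(:,j)'*A(:,k)) / (P(:,j)'*P(:,j));  % s_jk
        p = p - S(j,k)*P(:,j);                         % subtract component
    end
    P(:,k) = p;                                        % p^(k)
end
end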
Example. We are given the vectors
$$ a^{(1)} = \begin{bmatrix} 1\\1\\1\\1 \end{bmatrix}, \qquad a^{(2)} = \begin{bmatrix} 0\\1\\2\\3 \end{bmatrix}, \qquad a^{(3)} = \begin{bmatrix} 0\\1\\4\\9 \end{bmatrix}. $$
Use Gram-Schmidt orthogonalization to find an orthogonal basis $p^{(1)}, p^{(2)}, p^{(3)}$ for the subspace $V = \operatorname{span}\{a^{(1)}, a^{(2)}, a^{(3)}\}$.

Step 1: $p^{(1)} := a^{(1)} = (1, 1, 1, 1)^\top$.

Step 2:
$$ p^{(2)} := a^{(2)} - \frac{p^{(1)} \cdot a^{(2)}}{p^{(1)} \cdot p^{(1)}}\, p^{(1)} = a^{(2)} - \frac{6}{4}\, p^{(1)} = \left(-\tfrac32, -\tfrac12, \tfrac12, \tfrac32\right)^\top. $$

Step 3:
$$ p^{(3)} := a^{(3)} - \frac{p^{(1)} \cdot a^{(3)}}{p^{(1)} \cdot p^{(1)}}\, p^{(1)} - \frac{p^{(2)} \cdot a^{(3)}}{p^{(2)} \cdot p^{(2)}}\, p^{(2)} = a^{(3)} - \frac{14}{4}\, p^{(1)} - \frac{15}{5}\, p^{(2)} = (1, -1, -1, 1)^\top. $$

Note that we have
$$ a^{(1)} = p^{(1)}, \qquad a^{(2)} = \tfrac32\, p^{(1)} + p^{(2)}, \qquad a^{(3)} = \tfrac72\, p^{(1)} + 3\, p^{(2)} + p^{(3)}, $$
which we can write as
$$ \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}}_{A \,=\, [a^{(1)}, a^{(2)}, a^{(3)}]} = \underbrace{\begin{bmatrix} 1 & -\frac32 & 1 \\ 1 & -\frac12 & -1 \\ 1 & \frac12 & -1 \\ 1 & \frac32 & 1 \end{bmatrix}}_{P \,=\, [p^{(1)}, p^{(2)}, p^{(3)}]} \underbrace{\begin{bmatrix} 1 & \frac32 & \frac72 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{S}. $$
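We can verify the example decomposition numerically with a quick Matlab check; P'*P should come out diagonal since the columns of P are orthogonal:

>> A = [1 0 0; 1 1 1; 1 2 4; 1 3 9];
>> P = [1 -3/2 1; 1 -1/2 -1; 1 1/2 -1; 1 3/2 1];
>> S = [1 3/2 7/2; 0 1 3; 0 0 1];
>> norm(A - P*S)     % should be 0
>> P'*P              % diagonal matrix diag(4, 5, 4)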
In the general case we have
$$ a^{(1)} = p^{(1)}, \quad a^{(2)} = p^{(2)} + s_{12}\, p^{(1)}, \quad a^{(3)} = p^{(3)} + s_{13}\, p^{(1)} + s_{23}\, p^{(2)}, \quad \ldots, \quad a^{(n)} = p^{(n)} + s_{1n}\, p^{(1)} + \cdots + s_{n-1,n}\, p^{(n-1)}, $$
which we can write as
$$ [a^{(1)}, a^{(2)}, \ldots, a^{(n)}] = [p^{(1)}, p^{(2)}, \ldots, p^{(n)}] \begin{bmatrix} 1 & s_{12} & \cdots & s_{1n} \\ & 1 & \ddots & \vdots \\ & & \ddots & s_{n-1,n} \\ & & & 1 \end{bmatrix}. $$
Therefore we obtain a decomposition $A = PS$ where
- $P \in \mathbb{R}^{m \times n}$ has orthogonal columns
- $S \in \mathbb{R}^{n \times n}$ is upper triangular, with $1$s on the diagonal

Note that the vectors $p^{(1)}, \ldots, p^{(n)}$ are different from $0$: Assume, e.g., that $p^{(3)} = a^{(3)} - s_{13} p^{(1)} - s_{23} p^{(2)} = 0$; then $a^{(3)} = s_{13} p^{(1)} + s_{23} p^{(2)}$ is in $\operatorname{span}\{p^{(1)}, p^{(2)}\} = \operatorname{span}\{a^{(1)}, a^{(2)}\}$. This is a contradiction to the assumption that $a^{(1)}, a^{(2)}, a^{(3)}$ are linearly independent.

Solving the least squares problem ‖Ac − b‖ = min using orthogonalization

We are given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns and $b \in \mathbb{R}^m$. We want to find $c \in \mathbb{R}^n$ such that $\|Ac - b\| = \min$. From the Gram-Schmidt method we get $A = PS$, hence we want to find $c$ such that $\|P \underbrace{Sc}_{d} - b\| = \min$. This gives the following method for solving the least squares problem:
- use Gram-Schmidt to find the decomposition $A = PS$
- solve $\|Pd - b\| = \min$: $\quad d_i := \dfrac{p^{(i)} \cdot b}{p^{(i)} \cdot p^{(i)}}$ for $i = 1, \ldots, n$
- solve $Sc = d$ by back substitution

Example. Solve the least squares problem $\|Ac - b\| = \min$ for
$$ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix} \ \text{(see above)}, \qquad b = \begin{bmatrix} 0\\1\\4\\7 \end{bmatrix}. $$
Gram-Schmidt gives $A = PS$ with the matrices $P$ and $S$ found above. Then
$$ d_1 = \frac{p^{(1)} \cdot b}{p^{(1)} \cdot p^{(1)}} = \frac{12}{4} = 3, \qquad d_2 = \frac{p^{(2)} \cdot b}{p^{(2)} \cdot p^{(2)}} = \frac{12}{5}, \qquad d_3 = \frac{p^{(3)} \cdot b}{p^{(3)} \cdot p^{(3)}} = \frac{2}{4} = \frac12. $$
Solving
$$ \begin{bmatrix} 1 & \frac32 & \frac72 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 3 \\ \frac{12}{5} \\ \frac12 \end{bmatrix} $$
by back substitution gives $c_3 = \frac12$, $c_2 = \frac{12}{5} - 3 \cdot \frac12 = \frac{9}{10}$, $c_1 = 3 - \frac32 \cdot \frac{9}{10} - \frac72 \cdot \frac12 = -\frac{1}{10}$. Hence the solution of our least squares problem is the vector $c = \left(-\frac{1}{10}, \frac{9}{10}, \frac12\right)^\top$.

Note: If you want to solve a least squares problem by hand with pencil and paper, it is usually easier to use the normal equations. But for numerical computation on a computer, using orthogonalization is usually more efficient and more accurate.
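The three steps of this method take only a few lines of Matlab. A sketch on the example data, using the gram_schmidt function sketched earlier (an assumed helper, not a built-in):

A = [1 0 0; 1 1 1; 1 2 4; 1 3 9];
b = [0; 1; 4; 7];
[P,S] = gram_schmidt(A);       % Gram-Schmidt decomposition A = P*S
d = (P'*b) ./ diag(P'*P);      % decoupled solve: d_i = (p_i.b)/(p_i.p_i)
c = S\d                        % back substitution; gives (-0.1, 0.9, 0.5)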
Finding an orthonormal basis q⁽¹⁾, …, q⁽ⁿ⁾: the QR decomposition

The Gram-Schmidt method gives an orthogonal basis $p^{(1)}, \ldots, p^{(n)}$ for $V = \operatorname{span}\{a^{(1)}, \ldots, a^{(n)}\}$. Often it is convenient to have a so-called orthonormal basis $q^{(1)}, \ldots, q^{(n)}$ where the basis vectors have length $1$: Define
$$ q^{(j)} = \frac{p^{(j)}}{\|p^{(j)}\|} \qquad \text{for } j = 1, \ldots, n; $$
then we have
- $\operatorname{span}\{q^{(1)}, \ldots, q^{(n)}\} = V$
- $q^{(j)} \cdot q^{(k)} = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{otherwise} \end{cases}$

This means that the matrix $Q = [q^{(1)}, \ldots, q^{(n)}]$ satisfies $Q^\top Q = I$ where $I$ is the $n \times n$ identity matrix. Since $p^{(j)} = \|p^{(j)}\|\, q^{(j)}$ we have
$$ a^{(1)} = \underbrace{\|p^{(1)}\|}_{r_{11}}\, q^{(1)}, \qquad a^{(2)} = \underbrace{\|p^{(2)}\|}_{r_{22}}\, q^{(2)} + \underbrace{s_{12} \|p^{(1)}\|}_{r_{12}}\, q^{(1)}, \qquad \ldots, \qquad a^{(n)} = \underbrace{\|p^{(n)}\|}_{r_{nn}}\, q^{(n)} + \underbrace{s_{1n} \|p^{(1)}\|}_{r_{1n}}\, q^{(1)} + \cdots + \underbrace{s_{n-1,n} \|p^{(n-1)}\|}_{r_{n-1,n}}\, q^{(n-1)}, $$
which we can write as
$$ [a^{(1)}, a^{(2)}, \ldots, a^{(n)}] = [q^{(1)}, q^{(2)}, \ldots, q^{(n)}] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \ddots & \vdots \\ & & \ddots & r_{n-1,n} \\ & & & r_{nn} \end{bmatrix}, \qquad A = QR, $$
where the $n \times n$ matrix $R$ is given by
$$ \begin{bmatrix} \text{row } 1 \text{ of } R \\ \vdots \\ \text{row } n \text{ of } R \end{bmatrix} = \begin{bmatrix} \|p^{(1)}\|\,(\text{row } 1 \text{ of } S) \\ \vdots \\ \|p^{(n)}\|\,(\text{row } n \text{ of } S) \end{bmatrix}. $$
We obtain the so-called QR decomposition $A = QR$ where
- the matrix $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns, $\operatorname{range} Q = \operatorname{range} A$
- the matrix $R \in \mathbb{R}^{n \times n}$ is upper triangular, with nonzero diagonal elements

Example. In our example we have $p^{(1)} \cdot p^{(1)} = 4$, $p^{(2)} \cdot p^{(2)} = 5$, $p^{(3)} \cdot p^{(3)} = 4$, hence
$$ q^{(1)} = \frac{p^{(1)}}{2} = \begin{bmatrix} \frac12 \\ \frac12 \\ \frac12 \\ \frac12 \end{bmatrix}, \qquad q^{(2)} = \frac{p^{(2)}}{\sqrt5} = \begin{bmatrix} -\frac{3\sqrt5}{10} \\ -\frac{\sqrt5}{10} \\ \frac{\sqrt5}{10} \\ \frac{3\sqrt5}{10} \end{bmatrix}, \qquad q^{(3)} = \frac{p^{(3)}}{2} = \begin{bmatrix} \frac12 \\ -\frac12 \\ -\frac12 \\ \frac12 \end{bmatrix}, $$
and we obtain the QR decomposition
$$ \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix} = \begin{bmatrix} \frac12 & -\frac{3\sqrt5}{10} & \frac12 \\ \frac12 & -\frac{\sqrt5}{10} & -\frac12 \\ \frac12 & \frac{\sqrt5}{10} & -\frac12 \\ \frac12 & \frac{3\sqrt5}{10} & \frac12 \end{bmatrix} \begin{bmatrix} 2 & 3 & 7 \\ 0 & \sqrt5 & 3\sqrt5 \\ 0 & 0 & 2 \end{bmatrix}. $$
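Given the Gram-Schmidt factors P and S, the QR factors are obtained by rescaling, exactly as in the row-of-R formula above. A minimal Matlab sketch (it uses implicit expansion, so it assumes Matlab R2016b or later, with A, P, S from the example):

len = sqrt(diag(P'*P));   % column lengths ||p^(j)||
Q = P ./ len';            % scale columns: q^(j) = p^(j)/||p^(j)||
R = len .* S;             % scale rows: row j of R = ||p^(j)||*(row j of S)
norm(A - Q*R)             % should be 0
Q'*Q                      % should be the identity matrix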
In Matlab we can find the QR decomposition using [Q,R] = qr(A,0). This works for symbolic matrices

>> A = sym([1 0 0; 1 1 1; 1 2 4; 1 3 9]);
>> [Q,R] = qr(A,0)
Q =
[ 1/2, -(3*5^(1/2))/10,  1/2]
[ 1/2,     -5^(1/2)/10, -1/2]
[ 1/2,      5^(1/2)/10, -1/2]
[ 1/2,  (3*5^(1/2))/10,  1/2]
R =
[ 2,       3,         7]
[ 0, 5^(1/2), 3*5^(1/2)]
[ 0,       0,         2]

and it works for numerical matrices

>> A = [1 0 0; 1 1 1; 1 2 4; 1 3 9];
>> [Q,R] = qr(A,0)
Q =
   -0.5000    0.6708    0.5000
   -0.5000    0.2236   -0.5000
   -0.5000   -0.2236   -0.5000
   -0.5000   -0.6708    0.5000
R =
   -2.0000   -3.0000   -7.0000
         0   -2.2361   -6.7082
         0         0    2.0000

Note that for the numerical matrix Matlab returned the basis $-q^{(1)}, -q^{(2)}, q^{(3)}$ (which is also an orthonormal basis), and hence rows 1 and 2 of the matrix $R$ are $(-1)$ times our previous matrix $R$.

If we want to find an orthonormal basis for $\operatorname{range} A$ and an orthonormal basis for the orthogonal complement $(\operatorname{range} A)^\perp = \operatorname{null}(A^\top)$ we can use the command [Qh,Rh] = qr(A): it returns matrices $\hat Q \in \mathbb{R}^{m \times m}$ and $\hat R \in \mathbb{R}^{m \times n}$ with
$$ \hat Q = [\underbrace{q^{(1)}, \ldots, q^{(n)}}_{\text{basis for } \operatorname{range} A}, \underbrace{q^{(n+1)}, \ldots, q^{(m)}}_{\text{basis for } (\operatorname{range} A)^\perp}], \qquad \hat R = \begin{bmatrix} R \\ m - n \text{ rows of zeros} \end{bmatrix}. $$

>> A = [1 0 0; 1 1 1; 1 2 4; 1 3 9];
>> [Qh,Rh] = qr(A)
Qh =
   -0.5000    0.6708    0.5000    0.2236
   -0.5000    0.2236   -0.5000   -0.6708
   -0.5000   -0.2236   -0.5000    0.6708
   -0.5000   -0.6708    0.5000   -0.2236
Rh =
   -2.0000   -3.0000   -7.0000
         0   -2.2361   -6.7082
         0         0    2.0000
         0         0         0

But in most cases we only need an orthonormal basis for $\operatorname{range} A$ and we should use [Q,R] = qr(A,0) (which Matlab calls the "economy size" decomposition).
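To pull the two bases out of the full decomposition we can simply slice the columns of Qh; a small sketch following the partitioning above:

>> [Qh,Rh] = qr(A);
>> n = size(A,2);
>> Qrange = Qh(:, 1:n);      % orthonormal basis for range(A)
>> Qperp  = Qh(:, n+1:end);  % orthonormal basis for (range A)^perp = null(A')
>> A'*Qperp                  % check: should be (numerically) zero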
Solving the least squares problem ‖Ac − b‖ = min using the QR decomposition

If we use an orthonormal basis $q^{(1)}, \ldots, q^{(n)}$ for $\operatorname{span}\{a^{(1)}, \ldots, a^{(n)}\}$ we have $Q^\top Q = I$. The solution of $\|Qd - b\| = \min$ is therefore given by the normal equations $(Q^\top Q)d = Q^\top b$, i.e., we obtain $d = Q^\top b$. This gives the following method for solving the least squares problem:
- find the QR decomposition $A = QR$
- let $d = Q^\top b$
- solve $Rc = d$ by back substitution

In Matlab we can do this as follows:

[Q,R] = qr(A,0);
d = Q'*b;
c = R\d;

In our example we have

>> A = [1 0 0; 1 1 1; 1 2 4; 1 3 9]; b = [0; 1; 4; 7];
>> [Q,R] = qr(A,0);
>> d = Q'*b;
>> c = R\d
c =
   -0.1000
    0.9000
    0.5000

This works for both numerical and symbolic matrices. For a numerical matrix A we can use the shortcut c = A\b which actually uses the QR decomposition to find the solution of $\|Ac - b\| = \min$:

>> A = [1 0 0; 1 1 1; 1 2 4; 1 3 9]; b = [0; 1; 4; 7];
>> c = A\b
c =
   -0.1000
    0.9000
    0.5000

Warning: This shortcut does not work for symbolic matrices:

>> A = sym([1 0 0; 1 1 1; 1 2 4; 1 3 9]); b = sym([0; 1; 4; 7]);
>> c = A\b
Warning: The system is inconsistent. Solution does not exist.
c =
 Inf
 Inf
 Inf
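If an exact symbolic least squares solution is needed, we can still solve the square normal equations symbolically; a sketch (for our example this reproduces $c = (-1/10,\, 9/10,\, 1/2)^\top$ in exact arithmetic):

>> A = sym([1 0 0; 1 1 1; 1 2 4; 1 3 9]); b = sym([0; 1; 4; 7]);
>> c = (A'*A) \ (A'*b)
c =
 -1/10
  9/10
   1/2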