Linear Algebra, Session 12. Dr. Marco A. Roque Sol. 08/01/2017
Example 12.1 Find the constant function that is the least squares fit to the following data:

x    : 0 1 2 3
f(x) : 1 0 1 2

Solution. We seek $f(x) = c$. The data give the (inconsistent) system
\[
c = 1,\quad c = 0,\quad c = 1,\quad c = 2,
\qquad\text{i.e.}\qquad
\begin{pmatrix} 1\\ 1\\ 1\\ 1 \end{pmatrix} c
= \begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\]
Then the normal system is
\[
\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1\\ 1\\ 1\\ 1 \end{pmatrix} c
= \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\qquad\Longrightarrow\qquad
c = \tfrac{1}{4}(1 + 0 + 1 + 2) = 1
\]
(the arithmetic mean of the data). Thus the constant function is $f(x) = 1$.
Example 12.2 Find the linear polynomial function that is the least squares fit to the same data:

x    : 0 1 2 3
f(x) : 1 0 1 2

Solution. We seek $f(x) = c_1 + c_2 x$. The data give the (inconsistent) system
\[
c_1 = 1,\quad c_1 + c_2 = 0,\quad c_1 + 2c_2 = 1,\quad c_1 + 3c_2 = 2,
\qquad\text{i.e.}\qquad
\begin{pmatrix} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2 \end{pmatrix}
= \begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\]
Then the normal system is
\[
\begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3 \end{pmatrix}
\begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\qquad\Longrightarrow\qquad
\begin{pmatrix} 4 & 6\\ 6 & 14 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2 \end{pmatrix}
= \begin{pmatrix} 4\\ 8 \end{pmatrix}
\qquad\Longrightarrow\qquad
c_1 = 0.4,\ c_2 = 0.4
\]
Thus the linear function is $f(x) = 0.4 + 0.4x$.
Example 12.3 Find the quadratic polynomial function that is the least squares fit to the same data:

x    : 0 1 2 3
f(x) : 1 0 1 2

Solution. We seek $f(x) = c_1 + c_2 x + c_3 x^2$. The data give the (inconsistent) system
\[
c_1 = 1,\quad c_1 + c_2 + c_3 = 0,\quad c_1 + 2c_2 + 4c_3 = 1,\quad c_1 + 3c_2 + 9c_3 = 2,
\qquad\text{i.e.}\qquad
\begin{pmatrix} 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4\\ 1 & 3 & 9 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2\\ c_3 \end{pmatrix}
= \begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\]
Then the normal system is
\[
\begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3\\ 0 & 1 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4\\ 1 & 3 & 9 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2\\ c_3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3\\ 0 & 1 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 1\\ 0\\ 1\\ 2 \end{pmatrix}
\qquad\Longrightarrow\qquad
\begin{pmatrix} 4 & 6 & 14\\ 6 & 14 & 36\\ 14 & 36 & 98 \end{pmatrix}
\begin{pmatrix} c_1\\ c_2\\ c_3 \end{pmatrix}
= \begin{pmatrix} 4\\ 8\\ 22 \end{pmatrix}
\qquad\Longrightarrow\qquad
c_1 = 0.9,\ c_2 = -1.1,\ c_3 = 0.5
\]
Thus the quadratic function is $f(x) = 0.9 - 1.1x + 0.5x^2$.
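As a numerical check (not part of the original lecture), all three fits can be reproduced with NumPy by forming the normal system $A^TA\,c = A^T y$ directly; a minimal sketch:

```python
import numpy as np

# Data shared by Examples 12.1-12.3
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 1.0, 2.0])

# Fit polynomials of degree 0, 1, 2 by solving the normal system A^T A c = A^T y
for deg in (0, 1, 2):
    A = np.vander(x, deg + 1, increasing=True)   # columns: 1, x, x^2, ...
    c = np.linalg.solve(A.T @ A, A.T @ y)
    print(deg, c)
# degree 0: [1.]            -> f(x) = 1
# degree 1: [0.4 0.4]       -> f(x) = 0.4 + 0.4x
# degree 2: [0.9 -1.1 0.5]  -> f(x) = 0.9 - 1.1x + 0.5x^2
```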
Orthogonal sets

Let $\langle\cdot,\cdot\rangle$ denote the scalar product in $\mathbb{R}^n$.

Definition. Nonzero vectors $v_1, v_2, \dots, v_k \in \mathbb{R}^n$ form an orthogonal set if they are orthogonal to each other: $\langle v_i, v_j\rangle = 0$ for all $i \ne j$. If, in addition, all vectors are of unit length, $\|v_i\| = 1$, then $v_1, v_2, \dots, v_k$ is called an orthonormal set.

For instance, the standard basis $e_1 = (1, 0, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0)$, $\dots$, $e_n = (0, 0, 0, \dots, 1)$ is an orthonormal set.
Orthonormal bases

Suppose $v_1, v_2, \dots, v_n$ is an orthonormal basis for $\mathbb{R}^n$ (i.e., it is a basis and an orthonormal set).

Theorem. Let $x = x_1 v_1 + x_2 v_2 + \dots + x_n v_n$ and $y = y_1 v_1 + y_2 v_2 + \dots + y_n v_n$, where $x_i, y_i \in \mathbb{R}$. Then
i) $\langle x, y\rangle = \sum_{i=1}^n x_i y_i$
ii) $\|x\| = \left(\sum_{i=1}^n x_i^2\right)^{1/2}$
Proof. i)
\[
\langle x, y\rangle = \Big\langle \sum_{i=1}^n x_i v_i,\ \sum_{j=1}^n y_j v_j \Big\rangle
= \sum_{i=1}^n \sum_{j=1}^n x_i y_j \langle v_i, v_j\rangle
= \sum_{i=1}^n x_i y_i,
\]
since $\langle v_i, v_j\rangle = 0$ for $i \ne j$ and $\langle v_i, v_i\rangle = 1$. ii) follows from i) when $y = x$.
Orthogonal projection

Suppose $V$ is a subspace of $\mathbb{R}^n$. Let $p$ be the orthogonal projection of a vector $x \in \mathbb{R}^n$ onto $V$.

If $V$ is a one-dimensional subspace spanned by a vector $v$, then $p = \dfrac{\langle x, v\rangle}{\langle v, v\rangle}\, v$.

If $V$ admits an orthogonal basis $v_1, v_2, \dots, v_k$, then
\[
p = \frac{\langle x, v_1\rangle}{\langle v_1, v_1\rangle} v_1 + \frac{\langle x, v_2\rangle}{\langle v_2, v_2\rangle} v_2 + \dots + \frac{\langle x, v_k\rangle}{\langle v_k, v_k\rangle} v_k
\]
Indeed,
\[
\langle p, v_i\rangle = \sum_{j=1}^k \frac{\langle x, v_j\rangle}{\langle v_j, v_j\rangle} \langle v_j, v_i\rangle
= \frac{\langle x, v_i\rangle}{\langle v_i, v_i\rangle} \langle v_i, v_i\rangle = \langle x, v_i\rangle,
\]
so $\langle x - p, v_i\rangle = 0 \Rightarrow (x - p) \perp v_i \Rightarrow (x - p) \perp V$.
Coordinates relative to an orthogonal basis

Theorem. If $v_1, v_2, \dots, v_n$ is an orthogonal basis for $\mathbb{R}^n$, then
\[
x = \frac{\langle x, v_1\rangle}{\langle v_1, v_1\rangle} v_1 + \frac{\langle x, v_2\rangle}{\langle v_2, v_2\rangle} v_2 + \dots + \frac{\langle x, v_n\rangle}{\langle v_n, v_n\rangle} v_n
\]
for any vector $x \in \mathbb{R}^n$.

Corollary. If $v_1, v_2, \dots, v_n$ is an orthonormal basis for $\mathbb{R}^n$, then
\[
x = \langle x, v_1\rangle v_1 + \langle x, v_2\rangle v_2 + \dots + \langle x, v_n\rangle v_n
\]
for any vector $x \in \mathbb{R}^n$.
The Gram-Schmidt orthogonalization process

Let $V$ be a subspace of $\mathbb{R}^n$. Suppose $x_1, x_2, \dots, x_k$ is a basis for $V$. Let
\[
\begin{aligned}
v_1 &= x_1\\
v_2 &= x_2 - \frac{\langle x_2, v_1\rangle}{\langle v_1, v_1\rangle} v_1\\
v_3 &= x_3 - \frac{\langle x_3, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \frac{\langle x_3, v_2\rangle}{\langle v_2, v_2\rangle} v_2\\
&\ \ \vdots\\
v_k &= x_k - \frac{\langle x_k, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \frac{\langle x_k, v_2\rangle}{\langle v_2, v_2\rangle} v_2 - \dots - \frac{\langle x_k, v_{k-1}\rangle}{\langle v_{k-1}, v_{k-1}\rangle} v_{k-1}
\end{aligned}
\]
Then $v_1, v_2, \dots, v_k$ is an orthogonal basis for $V$.
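A direct implementation of the process is straightforward; the sketch below (an illustration, not from the lecture) orthogonalizes a list of vectors using the formulas above. Subtracting the projections one at a time (the "modified" variant) is algebraically equivalent and numerically more stable.

```python
import numpy as np

def gram_schmidt(X, tol=1e-12):
    """Apply Gram-Schmidt to the rows of X; return the orthogonal basis v_1, ..., v_k."""
    basis = []
    for xj in X:
        vj = np.array(xj, dtype=float)
        for v in basis:                        # subtract the projection onto each earlier v
            vj = vj - (vj @ v) / (v @ v) * v
        if np.linalg.norm(vj) > tol:           # a (near-)zero v_j signals linear dependence
            basis.append(vj)
    return basis
```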
Properties of the Gram-Schmidt process:
- $v_j = x_j - (\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_{j-1} x_{j-1})$ for some scalars $\alpha_i$, $1 \le j \le k$;
- the span of $v_1, v_2, \dots, v_j$ is the same as the span of $x_1, x_2, \dots, x_j$;
- $v_j$ is orthogonal to $x_1, x_2, \dots, x_{j-1}$;
- $v_j = x_j - p_j$, where $p_j$ is the orthogonal projection of the vector $x_j$ on the subspace spanned by $x_1, x_2, \dots, x_{j-1}$;
- $\|v_j\|$ is the distance from $x_j$ to the subspace spanned by $x_1, x_2, \dots, x_{j-1}$.
Normalization

Let $V$ be a subspace of $\mathbb{R}^n$. Suppose $v_1, v_2, \dots, v_k$ is an orthogonal basis for $V$. Let
\[
w_1 = \frac{v_1}{\|v_1\|},\quad w_2 = \frac{v_2}{\|v_2\|},\quad \dots,\quad w_k = \frac{v_k}{\|v_k\|}
\]
Then $w_1, w_2, \dots, w_k$ is an orthonormal basis for $V$.

Theorem. Any non-trivial subspace of $\mathbb{R}^n$ admits an orthonormal basis.
Example 12.4 Let $\Pi$ be the plane spanned by the vectors $x_1 = (1, 1, 0)$ and $x_2 = (0, 1, 1)$.
i) Find the orthogonal projection of the vector $y = (4, 0, -1)$ onto the plane $\Pi$.
ii) Find the distance from $y$ to $\Pi$.

Solution. First we apply the Gram-Schmidt process to the basis $x_1, x_2$:
\[
v_1 = x_1 = (1, 1, 0)
\]
\[
v_2 = x_2 - \frac{\langle x_2, v_1\rangle}{\langle v_1, v_1\rangle} v_1 = (0, 1, 1) - \tfrac{1}{2}(1, 1, 0) = \left(-\tfrac12, \tfrac12, 1\right)
\]
Now that $v_1, v_2$ is an orthogonal basis for $\Pi$, the orthogonal projection of $y$ onto $\Pi$ is
\[
p = \frac{\langle y, v_1\rangle}{\langle v_1, v_1\rangle} v_1 + \frac{\langle y, v_2\rangle}{\langle v_2, v_2\rangle} v_2
= \tfrac{4}{2}(1, 1, 0) + \tfrac{-3}{3/2}\left(-\tfrac12, \tfrac12, 1\right) = (3, 1, -2)
\]
The distance from $y$ to $\Pi$ is
\[
\|y - p\| = \|(1, -1, 1)\| = \sqrt{3}
\]
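A quick verification of this computation with NumPy (an illustrative check, not part of the lecture):

```python
import numpy as np

x1, x2 = np.array([1., 1., 0.]), np.array([0., 1., 1.])
y = np.array([4., 0., -1.])

v1 = x1
v2 = x2 - (x2 @ v1) / (v1 @ v1) * v1                       # (-1/2, 1/2, 1)
p = (y @ v1) / (v1 @ v1) * v1 + (y @ v2) / (v2 @ v2) * v2  # projection onto the plane
print(p)                        # [ 3.  1. -2.]
print(np.linalg.norm(y - p))    # 1.7320... = sqrt(3)
```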
Example 12.5 Find the distance from the point $y = (0, 0, 0, 1)$ to the subspace $V \subset \mathbb{R}^4$ spanned by the vectors $x_1 = (1, -1, 1, -1)$, $x_2 = (1, 1, 3, -1)$, $x_3 = (-3, 7, 1, 3)$.

Solution. First we apply the Gram-Schmidt process to the basis $x_1, x_2, x_3$ and obtain an orthogonal basis $v_1, v_2, v_3$ for the subspace $V$. Next we compute the orthogonal projection $p$ of the vector $y$ onto $V$:
\[
p = \frac{\langle y, v_1\rangle}{\langle v_1, v_1\rangle} v_1 + \frac{\langle y, v_2\rangle}{\langle v_2, v_2\rangle} v_2 + \frac{\langle y, v_3\rangle}{\langle v_3, v_3\rangle} v_3
\]
Then the distance from $y$ to $V$ equals $\|y - p\|$.

Alternatively, we can apply the Gram-Schmidt process to the vectors $x_1, x_2, x_3, y$. We should obtain an orthogonal system $v_1, v_2, v_3, v_4$; then the desired distance will be $\|v_4\|$.
\[
v_1 = x_1 = (1, -1, 1, -1)
\]
\[
v_2 = x_2 - \frac{\langle x_2, v_1\rangle}{\langle v_1, v_1\rangle} v_1 = (1, 1, 3, -1) - \tfrac{4}{4}(1, -1, 1, -1) = (0, 2, 2, 0)
\]
\[
v_3 = x_3 - \frac{\langle x_3, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \frac{\langle x_3, v_2\rangle}{\langle v_2, v_2\rangle} v_2
= (-3, 7, 1, 3) + \tfrac{12}{4}(1, -1, 1, -1) - \tfrac{16}{8}(0, 2, 2, 0) = (0, 0, 0, 0)
\]
The Gram-Schmidt process can be used to check linear independence of vectors! Here it failed because the vector $x_3$ is a linear combination of $x_1$ and $x_2$: $V$ is a plane, not a 3-dimensional subspace. To fix things, it is enough to drop $x_3$, i.e., we should orthogonalize the vectors $x_1, x_2, y$:
\[
\hat v_3 = y - \frac{\langle y, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \frac{\langle y, v_2\rangle}{\langle v_2, v_2\rangle} v_2
= (0, 0, 0, 1) + \tfrac{1}{4}(1, -1, 1, -1) - \tfrac{0}{8}(0, 2, 2, 0) = \left(\tfrac14, -\tfrac14, \tfrac14, \tfrac34\right)
\]
Then the distance from $y$ to $V$ equals
\[
\|\hat v_3\| = \left\|\left(\tfrac14, -\tfrac14, \tfrac14, \tfrac34\right)\right\| = \frac{\sqrt{12}}{4} = \frac{\sqrt{3}}{2}
\]
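The same computation in NumPy, including the detection of the dependent vector (again an illustrative check):

```python
import numpy as np

x1 = np.array([1., -1., 1., -1.])
x2 = np.array([1., 1., 3., -1.])
x3 = np.array([-3., 7., 1., 3.])
y  = np.array([0., 0., 0., 1.])

v1 = x1
v2 = x2 - (x2 @ v1) / (v1 @ v1) * v1    # (0, 2, 2, 0)
v3 = x3 - (x3 @ v1) / (v1 @ v1) * v1 - (x3 @ v2) / (v2 @ v2) * v2
print(v3)                       # [0. 0. 0. 0.] -> x3 is a combination of x1, x2

v3_hat = y - (y @ v1) / (v1 @ v1) * v1 - (y @ v2) / (v2 @ v2) * v2
print(v3_hat)                   # [ 0.25 -0.25  0.25  0.75]
print(np.linalg.norm(v3_hat))   # 0.8660... = sqrt(3)/2
```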
Norm

The notion of norm generalizes the notion of length of a vector in $\mathbb{R}^n$.

Definition. Let $V$ be a vector space. A function $\alpha : V \to \mathbb{R}$ is called a norm on $V$ if it has the following properties:
$\alpha(x) \ge 0$, and $\alpha(x) = 0$ only for $x = 0$ (positivity);
$\alpha(rx) = |r|\,\alpha(x)$ for all $r \in \mathbb{R}$ (homogeneity);
$\alpha(x + y) \le \alpha(x) + \alpha(y)$ (triangle inequality).

Notation. The norm of a vector $x \in V$ is usually denoted $\|x\|$. Different norms on $V$ are distinguished by subscripts, e.g., $\|x\|_1$ and $\|x\|_2$.
Examples ($V = \mathbb{R}^n$). Let $x = (x_1, x_2, \dots, x_n)$ be a vector in $V$.

$\|x\|_\infty = \max\{|x_1|, |x_2|, \dots, |x_n|\}$

Positivity and homogeneity are obvious. For the triangle inequality, let $y = (y_1, y_2, \dots, y_n)$, so that $x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$. Then
\[
|x_i + y_i| \le |x_i| + |y_i| \le \max_j |x_j| + \max_j |y_j|
\ \Longrightarrow\
\max_i |x_i + y_i| \le \max_j |x_j| + \max_j |y_j|
\ \Longrightarrow\
\|x + y\|_\infty \le \|x\|_\infty + \|y\|_\infty
\]
$\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|$

Positivity and homogeneity are obvious. For the triangle inequality, with $x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$,
\[
|x_i + y_i| \le |x_i| + |y_i|
\ \Longrightarrow\
\sum_i |x_i + y_i| \le \sum_i |x_i| + \sum_i |y_i|
\ \Longrightarrow\
\|x + y\|_1 \le \|x\|_1 + \|y\|_1
\]
$\|x\|_p = \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{1/p}$, $p \ge 1$

Positivity and homogeneity are obvious. For the triangle inequality, with $x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$, the Minkowski inequality
\[
\left(\sum_i |x_i + y_i|^p\right)^{1/p} \le \left(\sum_i |x_i|^p\right)^{1/p} + \left(\sum_i |y_i|^p\right)^{1/p},
\qquad p \ge 1,
\]
gives $\|x + y\|_p \le \|x\|_p + \|y\|_p$.
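NumPy exposes all of these vector norms through `numpy.linalg.norm`; a small demonstration (with an arbitrary test vector of my choosing):

```python
import numpy as np

x = np.array([3.0, -4.0])
for p in (1, 1.5, 2, 3, np.inf):
    print(p, np.linalg.norm(x, ord=p))   # sum(|x_i|^p)^(1/p); ord=inf gives max |x_i|
# ord=1 -> 7.0, ord=2 -> 5.0, ord=inf -> 4.0
```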
Normed vector space

Definition. A normed vector space is a vector space endowed with a norm. The norm defines a distance function on the normed vector space: $\operatorname{dist}(x, y) = \|y - x\|$.

Then we say that a vector $x$ is a good approximation of a vector $x_0$ if $\operatorname{dist}(x, x_0)$ is small. Also, we say that a sequence of vectors $x_1, x_2, \dots$ converges to a vector $x$ if $\operatorname{dist}(x, x_n) \to 0$ as $n \to \infty$.
Unit circle in the normed vector space $V = \mathbb{R}^2$: the set $\{x \in V : \|x\|_p = 1\}$, for the norms
\[
\begin{aligned}
\|x\|_1 &= |x_1| + |x_2|\\
\|x\|_{3/2} &= \left(|x_1|^{3/2} + |x_2|^{3/2}\right)^{2/3}\\
\|x\|_2 &= \left(|x_1|^2 + |x_2|^2\right)^{1/2}\\
\|x\|_3 &= \left(|x_1|^3 + |x_2|^3\right)^{1/3}\\
\|x\|_6 &= \left(|x_1|^6 + |x_2|^6\right)^{1/6}\\
\|x\|_\infty &= \max\{|x_1|, |x_2|\}
\end{aligned}
\]
Examples ($V = C[a, b]$, $f : [a, b] \to \mathbb{R}$).
\[
\|f\|_\infty = \max_{a \le x \le b} |f(x)|
\qquad
\|f\|_1 = \int_a^b |f(x)|\,dx
\qquad
\|f\|_p = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p},\ p \ge 1
\]
Theorem. $\|f\|_p$ is a norm on $C[a, b]$ for any $p \ge 1$.
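These function norms can be approximated numerically; a sketch using `scipy.integrate.quad` with the test function $f(x) = \cos x$ on $[0, \pi]$ (my choice of example, not from the lecture):

```python
import numpy as np
from scipy.integrate import quad

f, a, b = np.cos, 0.0, np.pi

sup_norm = max(abs(f(t)) for t in np.linspace(a, b, 10001))  # approximates max |f(x)| = 1
one_norm = quad(lambda t: abs(f(t)), a, b)[0]                # integral of |f|; exactly 2 here
two_norm = quad(lambda t: f(t)**2, a, b)[0] ** 0.5           # sqrt(pi/2) ~ 1.2533
print(sup_norm, one_norm, two_norm)
```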
Abstract Linear Algebra

The notion of inner product generalizes the notion of dot product of vectors in $\mathbb{R}^n$.

Definition. Let $V$ be a vector space. A function $\beta : V \times V \to \mathbb{R}$, usually denoted $\beta(x, y) = \langle x, y\rangle$, is called an inner product on $V$ if it is positive, symmetric, and bilinear. That is, if
i) $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ only for $x = 0$ (positivity)
ii) $\langle x, y\rangle = \langle y, x\rangle$ (symmetry)
iii) $\langle rx, y\rangle = r\langle x, y\rangle$ (homogeneity)
iv) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ (distributive law)

An inner product space is a vector space endowed with an inner product.
Examples ($V = \mathbb{R}^n$).
- $\langle x, y\rangle = x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$
- $\langle x, y\rangle = d_1 x_1 y_1 + d_2 x_2 y_2 + \dots + d_n x_n y_n$, where $d_1, d_2, \dots, d_n > 0$
- $\langle x, y\rangle = (Dx) \cdot (Dy)$, where $D$ is an invertible $n \times n$ matrix

Remarks. a) Invertibility of $D$ is necessary to show that $\langle x, x\rangle = 0 \Rightarrow x = 0$. b) The second example is a particular case of the third one, with $D = \operatorname{diag}(d_1^{1/2}, d_2^{1/2}, \dots, d_n^{1/2})$.
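The third construction is easy to experiment with; a minimal sketch (the matrix $D$ and the test vectors are arbitrary choices):

```python
import numpy as np

def inner(x, y, D):
    """Inner product <x, y> = (Dx) . (Dy) for an invertible matrix D."""
    return (D @ x) @ (D @ y)

D = np.diag([2.0, 3.0]) ** 0.5           # recovers <x, y> = 2 x1 y1 + 3 x2 y2
x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(inner(x, y, D))                    # 2*1*3 + 3*2*(-1) = 0
```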
Example 12.6 Find an inner product on $\mathbb{R}^2$ such that $\langle e_1, e_1\rangle = 2$, $\langle e_2, e_2\rangle = 3$, and $\langle e_1, e_2\rangle = -1$, where $e_1 = (1, 0)$, $e_2 = (0, 1)$.

Solution. Let $x = (x_1, x_2), y = (y_1, y_2) \in \mathbb{R}^2$. Then, using bilinearity, we obtain
\[
\langle x, y\rangle = \langle x_1 e_1 + x_2 e_2,\ y_1 e_1 + y_2 e_2\rangle
= x_1 y_1 \langle e_1, e_1\rangle + x_1 y_2 \langle e_1, e_2\rangle + x_2 y_1 \langle e_2, e_1\rangle + x_2 y_2 \langle e_2, e_2\rangle
\]
\[
\langle x, y\rangle = 2x_1 y_1 - x_1 y_2 - x_2 y_1 + 3x_2 y_2
\]
It remains to check that $\langle x, x\rangle > 0$ for $x \ne 0$. Indeed,
\[
\langle x, x\rangle = 2x_1^2 - 2x_1 x_2 + 3x_2^2 = (x_1 - x_2)^2 + x_1^2 + 2x_2^2 > 0 \quad\text{for } x \ne 0
\]
More examples.

$V = M_{m,n}(\mathbb{R})$, the space of $m \times n$ matrices: $\langle A, B\rangle = \operatorname{trace}(AB^T)$. If $A = (a_{ij})$ and $B = (b_{ij})$, then $\langle A, B\rangle = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij}$.

$V = C[a, b]$: $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$, or more generally $\langle f, g\rangle = \int_a^b w(x)f(x)g(x)\,dx$, where $w$ is bounded, piecewise continuous, and $w > 0$ everywhere on $[a, b]$; $w$ is called the weight function.
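The identity $\operatorname{trace}(AB^T) = \sum_{i,j} a_{ij}b_{ij}$ is easy to confirm numerically (illustrative matrices of my choosing):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
print(np.trace(A @ B.T))   # 5.0
print(np.sum(A * B))       # 5.0, the same value: the sum of entrywise products
```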
Theorem. Suppose $\langle x, y\rangle$ is an inner product on a vector space $V$. Then for all $x, y \in V$,
\[
\langle x, y\rangle^2 \le \langle x, x\rangle\,\langle y, y\rangle
\]
Proof. For any $t \in \mathbb{R}$, let $v_t = x + ty$. Then
\[
\langle v_t, v_t\rangle = \langle x + ty, x + ty\rangle = \langle x, x\rangle + 2t\langle x, y\rangle + t^2\langle y, y\rangle
\]
Now assume that $y \ne 0$ and let $t = -\dfrac{\langle x, y\rangle}{\langle y, y\rangle}$. Then
\[
\langle v_t, v_t\rangle = \langle x, x\rangle - \frac{\langle x, y\rangle^2}{\langle y, y\rangle}
\]
Since $\langle v_t, v_t\rangle \ge 0$, the desired inequality follows. In the case $y = 0$, we have $\langle x, y\rangle = \langle y, y\rangle = 0$ and the inequality is trivial.
Cauchy-Schwarz Inequality:
\[
|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle}\,\sqrt{\langle y, y\rangle}
\]
Corollary. $|\langle x, y\rangle| \le \|x\|\,\|y\|$.

Corollary. For any $f, g \in C[a, b]$,
\[
\left(\int_a^b f(x)g(x)\,dx\right)^2 \le \int_a^b f(x)^2\,dx \cdot \int_a^b g(x)^2\,dx
\]
Norms induced by inner products

Theorem. Suppose $\langle x, y\rangle$ is an inner product on a vector space $V$. Then $\|x\| = \sqrt{\langle x, x\rangle}$ is a norm.

Proof. Positivity is obvious. Homogeneity:
\[
\|rx\| = \sqrt{\langle rx, rx\rangle} = \sqrt{r^2\langle x, x\rangle} = |r|\,\|x\|
\]
Triangle inequality (follows from Cauchy-Schwarz):
\[
\|x + y\|^2 = \langle x + y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle
\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2
\]
Examples. The length of a vector in $\mathbb{R}^n$,
\[
\|x\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2},
\]
is the norm induced by the dot product. The norm $\|f\|_2 = \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}$ on the vector space $C[a, b]$ is induced by the inner product $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$.
Angle

Let $V$ be an inner product space with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Then $|\langle x, y\rangle| \le \|x\|\,\|y\|$ for all $x, y \in V$ (the Cauchy-Schwarz inequality). Therefore we can define the angle between nonzero vectors in $V$ by
\[
\angle(x, y) = \arccos\frac{\langle x, y\rangle}{\|x\|\,\|y\|}
\]
Then $\langle x, y\rangle = \|x\|\,\|y\|\cos\angle(x, y)$. In particular, vectors $x$ and $y$ are orthogonal (denoted $x \perp y$) if $\langle x, y\rangle = 0$.
Orthogonal sets

Let $V$ be an inner product space with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$.

Definition. A nonempty set $S \subset V$ of nonzero vectors is called an orthogonal set if all vectors in $S$ are mutually orthogonal. That is, $0 \notin S$ and $\langle x, y\rangle = 0$ for any $x, y \in S$, $x \ne y$. An orthogonal set $S \subset V$ is called orthonormal if $\|x\| = 1$ for any $x \in S$.
Remark. Vectors $v_1, v_2, \dots, v_n \in V$ form an orthonormal set if and only if
\[
\langle v_i, v_j\rangle = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \ne j. \end{cases}
\]
Singular Value Decomposition

In this section we assume throughout that $A$ is an $m \times n$ matrix with $m \ge n$. (This assumption is made for convenience only; all the results also hold if $m < n$.) We will present a method for determining how close $A$ is to a matrix of smaller rank. The method involves factoring $A$ into a product $U\Sigma V^T$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ matrix whose off-diagonal entries are all 0's and whose diagonal elements satisfy
\[
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0,
\qquad
\Sigma = \begin{pmatrix}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_n
\end{pmatrix}
\]
The $\sigma$'s determined by this factorization are unique and are called the singular values of $A$. The factorization $U\Sigma V^T$ is called the singular value decomposition of $A$ or, for short, the SVD of $A$.
The SVD Theorem. If $A$ is an $m \times n$ matrix, then $A$ has a singular value decomposition.

Sketch of the proof. $A^TA$ is a symmetric $n \times n$ matrix. The eigenvalues of $A^TA$ are all real, and it has an orthogonal diagonalizing matrix $V$. Furthermore, its eigenvalues must all be nonnegative. To see this, let $\lambda$ be an eigenvalue of $A^TA$ and $x$ an eigenvector belonging to $\lambda$. It follows that
\[
\|Ax\|^2 = x^T A^T A x = \lambda x^T x = \lambda\|x\|^2
\qquad\Longrightarrow\qquad
\lambda = \frac{\|Ax\|^2}{\|x\|^2} \ge 0
\]
We may assume that the columns of $V$ have been ordered so that the corresponding eigenvalues satisfy $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$. The singular values are given by
\[
\sigma_j = \sqrt{\lambda_j}, \qquad j = 1, 2, \dots, n
\]
Let $r$ denote the rank of $A$. The matrix $A^TA$ will also have rank $r$. Since $A^TA$ is symmetric, its rank equals the number of nonzero eigenvalues. Thus
\[
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,
\qquad
\sigma_{r+1} = \sigma_{r+2} = \dots = \sigma_n = 0
\]
Now let $V_1 = (v_1, v_2, \dots, v_r)$ and $V_2 = (v_{r+1}, v_{r+2}, \dots, v_n)$. The column vectors of $V_1$ are eigenvectors of $A^TA$ belonging to $\lambda_i$, $i = 1, 2, \dots, r$. The column vectors of $V_2$ are eigenvectors of $A^TA$ belonging to $\lambda_j = 0$, $j = r+1, r+2, \dots, n$.
Now let $\Sigma_1$ be the $r \times r$ matrix defined by
\[
\Sigma_1 = \begin{pmatrix}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_r
\end{pmatrix}
\]
The $m \times n$ matrix $\Sigma$ is then given by
\[
\Sigma = \begin{pmatrix} \Sigma_1 & 0\\ 0 & 0 \end{pmatrix}
\]
To complete the proof, we must show how to construct an $m \times m$ orthogonal matrix $U$ such that
\[
A = U\Sigma V^T, \qquad\text{equivalently}\qquad AV = U\Sigma
\]
Comparing the first $r$ columns of each side of the last equation, we see that
\[
Av_i = \sigma_i u_i, \qquad i = 1, 2, \dots, r
\]
Thus, if we define
\[
u_i = \frac{1}{\sigma_i} Av_i, \qquad i = 1, 2, \dots, r
\]
and
\[
U_1 = (u_1, u_2, \dots, u_r),
\]
then it follows that $AV_1 = U_1\Sigma_1$. The column vectors of $U_1$ form an orthonormal set; thus $u_1, u_2, \dots, u_r$ form an orthonormal basis for $R(A)$. The vector space $R(A)^\perp = N(A^T)$ has dimension $m - r$. Let $\{u_{r+1}, u_{r+2}, \dots, u_m\}$ be an orthonormal basis for $N(A^T)$ and set
\[
U_2 = (u_{r+1}, u_{r+2}, \dots, u_m)
\]
Set the $m \times m$ matrix $U$ to be
\[
U = (U_1, U_2)
\]
Then the matrices $U$, $\Sigma$, and $V$ satisfy $A = U\Sigma V^T$.
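The proof is constructive, and (assuming $A$ has full column rank, so that no $\sigma_j$ vanishes) it translates directly into code; the sketch below builds the compact factor $U_1$ rather than the full $m \times m$ matrix $U$:

```python
import numpy as np

def svd_via_eig(A):
    """Thin SVD of A built as in the proof; assumes rank(A) = n (all sigma_j > 0)."""
    lam, V = np.linalg.eigh(A.T @ A)           # real eigenvalues (ascending), orthonormal V
    order = np.argsort(lam)[::-1]              # reorder so lambda_1 >= ... >= lambda_n
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))   # sigma_j = sqrt(lambda_j)
    U1 = (A @ V) / sigma                       # u_j = (1/sigma_j) A v_j, columnwise
    return U1, sigma, V

A = np.random.rand(5, 3)
U1, s, V = svd_via_eig(A)
print(np.allclose(A, U1 @ np.diag(s) @ V.T))   # True
```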
Observations. Let $A$ be an $m \times n$ matrix with a singular value decomposition $A = U\Sigma V^T$.
- The singular values $\sigma_1, \dots, \sigma_n$ of $A$ are unique; however, the matrices $U$ and $V$ are not unique.
- Since $V$ diagonalizes $A^TA$, the $v_j$'s are eigenvectors of $A^TA$.
- Since $AA^T = U\Sigma\Sigma^T U^T$, $U$ diagonalizes $AA^T$ and the $u_j$'s are eigenvectors of $AA^T$.
- The $v_j$'s are called the right singular vectors of $A$, and the $u_j$'s are called the left singular vectors of $A$.
If $A$ has rank $r$, then
(i) $v_1, v_2, \dots, v_r$ form an orthonormal basis for $R(A^T)$;
(ii) $v_{r+1}, v_{r+2}, \dots, v_n$ form an orthonormal basis for $N(A)$;
(iii) $u_1, u_2, \dots, u_r$ form an orthonormal basis for $R(A)$;
(iv) $u_{r+1}, u_{r+2}, \dots, u_m$ form an orthonormal basis for $N(A^T)$.
The rank of the matrix $A$ is equal to the number of its nonzero singular values (where singular values are counted according to multiplicity).
In the case that $A$ has rank $r < n$, if we set
\[
V_1 = (v_1, v_2, \dots, v_r), \qquad U_1 = (u_1, u_2, \dots, u_r)
\]
and define $\Sigma_1$ as before, then
\[
A = U_1 \Sigma_1 V_1^T
\]
This factorization is called the compact form of the singular value decomposition of $A$. It is useful in many applications.
Example 12.7 Let
\[
A = \begin{pmatrix} 1 & 1\\ 1 & 1\\ 0 & 0 \end{pmatrix}
\]
Compute the singular values and the singular value decomposition of $A$.

Solution. The matrix
\[
A^TA = \begin{pmatrix} 2 & 2\\ 2 & 2 \end{pmatrix}
\]
has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 0$. Consequently, the singular values of $A$ are $\sigma_1 = \sqrt{4} = 2$ and $\sigma_2 = 0$. The eigenvalue $\lambda_1$ has eigenvectors of the form $\alpha(1, 1)^T$, and $\lambda_2$ has eigenvectors of the form $\beta(1, -1)^T$. Therefore, the orthogonal matrix
\[
V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}
\]
diagonalizes $A^TA$. From what we discussed before, it follows that
\[
u_1 = \frac{1}{\sigma_1} Av_1 = \frac{1}{2} \begin{pmatrix} 1 & 1\\ 1 & 1\\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2}\\ 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2}\\ 1/\sqrt{2}\\ 0 \end{pmatrix}
\]
The remaining column vectors of $U$ must form an orthonormal basis for $N(A^T)$. We can compute a basis $\{x_2, x_3\}$ for $N(A^T)$ in the usual way:
\[
x_2 = (1, -1, 0)^T, \qquad x_3 = (0, 0, 1)^T
\]
Since these vectors are already orthogonal, it is not necessary to use the Gram-Schmidt process to obtain an orthonormal basis.
We need only set
\[
u_2 = \frac{x_2}{\|x_2\|} = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0\right)^T,
\qquad
u_3 = \frac{x_3}{\|x_3\|} = (0, 0, 1)^T
\]
It then follows that
\[
A = U\Sigma V^T =
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0\\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 2 & 0\\ 0 & 0\\ 0 & 0 \end{pmatrix}
\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}
\]
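For comparison, `numpy.linalg.svd` reproduces the singular values directly (the columns of $U$ and $V$ may differ in sign from the hand computation, since $U$ and $V$ are not unique):

```python
import numpy as np

A = np.array([[1., 1.], [1., 1.], [0., 0.]])
U, s, Vt = np.linalg.svd(A)
print(s)                                   # [2. 0.]
print(np.allclose(A, U[:, :2] * s @ Vt))   # True: A = U Sigma V^T
```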
Observation. If $A$ has singular value decomposition $U\Sigma V^T$, then $A$ can be represented by the outer product expansion
\[
A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_n u_n v_n^T
\]
The closest matrix of rank $k$ is obtained by truncating this sum after the first $k$ terms:
\[
A' = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_k u_k v_k^T, \qquad k < n
\]
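A sketch of this truncation, together with a check that the spectral-norm error of the rank-$k$ approximation equals $\sigma_{k+1}$ (the test matrix is a random illustrative choice):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.random.rand(6, 4)
A1 = best_rank_k(A, 1)
print(np.linalg.matrix_rank(A1))            # 1
sigma = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A1, 2), sigma[1])  # both equal sigma_2
```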