Class notes: Approximation

1 Introduction

1.1 Vector spaces, linear independence, subspace

The goal of Numerical Analysis is to compute approximations. We want to approximate, e.g.,
- numbers in $\mathbb{R}$ or $\mathbb{C}$,
- vectors in $\mathbb{R}^m$ or $\mathbb{C}^n$,
- functions on an interval: $[a,b]\to\mathbb{R}$ or $[a,b]\to\mathbb{C}$.

Therefore we are trying to approximate an element $v$ of a vector space $V$. In a vector space the addition $v+w$ of vectors is defined, as well as the product scalar times vector $\alpha v$. Here $\alpha$ is an element of the field of scalars, in our case either $\mathbb{R}$ (real vector space) or $\mathbb{C}$ (complex vector space).

For vectors $u^{(1)},\dots,u^{(n)}$ and scalars $c_1,\dots,c_n$ the linear combination $v = c_1 u^{(1)} + \dots + c_n u^{(n)}$ is again a vector. If $c_1 u^{(1)} + \dots + c_n u^{(n)}$ can only be the zero vector for $c_1 = \dots = c_n = 0$ we say that the vectors $u^{(1)},\dots,u^{(n)}$ are linearly independent. The span of vectors $u^{(1)},\dots,u^{(n)}$ is the set of all linear combinations
$$ \mathrm{span}\{u^{(1)},\dots,u^{(n)}\} = \{\, c_1 u^{(1)} + \dots + c_n u^{(n)} \mid c_1,\dots,c_n \in \mathbb{C} \,\}. $$
A subset $W \subseteq V$ of a vector space $V$ which is itself a vector space is called a subspace.

Example: For $V = \mathbb{R}^3$ we have the following subspaces: the whole space $\mathbb{R}^3$, planes through the origin, lines through the origin, $\{0\}$.

If a subspace $W$ is the span of linearly independent vectors $u^{(1)},\dots,u^{(n)}$ we say that $u^{(1)},\dots,u^{(n)}$ is a basis of the subspace $W$, and the dimension of $W$ is $n$. Note that $V = \mathbb{R}^3$ has dimension 3, since $(1,0,0)$, $(0,1,0)$, $(0,0,1)$ forms a basis. Note that $V = C([0,1])$ is not finite dimensional: there do not exist finitely many vectors $u^{(1)},\dots,u^{(N)}$ which span the whole space $V$.
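As a small illustration (the vectors are made up), linear independence of vectors in $\mathbb{R}^m$ or $\mathbb{C}^m$ can be checked numerically via the rank of the matrix that has them as columns:

% Sketch: check linear independence of three vectors in R^3 via the rank
% of the matrix with these vectors as columns (example vectors for illustration).
u1 = [1; 0; 2];
u2 = [0; 1; 1];
u3 = [1; 1; 3];          % u3 = u1 + u2, so the three vectors are dependent
rank([u1 u2])            % = 2: u1, u2 are linearly independent
rank([u1 u2 u3])         % = 2 < 3: u1, u2, u3 are linearly dependent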
1.2 Norms

Assume we want to find $v \in V$, and we are able to find the approximation $w \in W$. We then want to measure the size of the error. For this we use a norm, and the size of the error is $\|w - v\|$.

The norm is a mapping $\|\cdot\| \colon V \to [0,\infty)$ with the following properties:
- $\|v\| = 0$ implies that $v = \vec 0$ (null vector). (I will usually not use arrows to denote vectors, but I will use $\vec 0$ to emphasize that this is the zero vector in $V$ and not the scalar $0$.)
- $\|\alpha v\| = |\alpha|\,\|v\|$ for all scalars $\alpha$
- $\|v + w\| \le \|v\| + \|w\|$ (triangle inequality)

Examples for norms: For a vector $v = (v_1,\dots,v_m)$ in $\mathbb{R}^m$ or $\mathbb{C}^m$:
$$ \|v\|_\infty = \max_{j=1,\dots,m} |v_j|, \qquad \|v\|_1 = |v_1| + \dots + |v_m|, \qquad \|v\|_2 = \bigl(|v_1|^2 + \dots + |v_m|^2\bigr)^{1/2}. $$
For a continuous function $f$ defined on an interval $[a,b]$:
$$ \|f\|_\infty = \max_{x\in[a,b]} |f(x)|, \qquad \|f\|_1 = \int_a^b |f(x)|\,dx, \qquad \|f\|_2 = \Bigl(\int_a^b |f(x)|^2\,dx\Bigr)^{1/2}. $$

A sequence $v^{(1)}, v^{(2)}, v^{(3)}, \dots$ of elements of $V$ forms a Cauchy sequence if
$$ \forall\,\varepsilon > 0 \;\; \exists\, N \in \mathbb{N} \text{ such that } m,n \ge N \text{ implies } \|v^{(m)} - v^{(n)}\| < \varepsilon. $$

Note that the vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ equipped with any of these norms are complete: any Cauchy sequence $v^{(1)}, v^{(2)}, v^{(3)}, \dots$ has a limit element $v \in \mathbb{C}^n$ such that $v^{(k)}$ converges to $v$ as $k\to\infty$, i.e., $\|v^{(k)} - v\| \to 0$ as $k\to\infty$.

The vector space $C([a,b])$ of continuous functions on $[a,b]$ equipped with $\|\cdot\|_\infty$ is complete: for a Cauchy sequence of continuous functions there exists a limit $v$ which is also a continuous function. We have $\|v^{(k)} - v\|_\infty \to 0$; this is called uniform convergence.

The vector space $C([a,b])$ of continuous functions on $[a,b]$ equipped with $\|\cdot\|_1$ or $\|\cdot\|_2$ is NOT complete: Consider e.g. the functions $v^{(k)}$ on $[-1,1]$ defined by
$$ v^{(k)}(x) = \begin{cases} -1 & \text{for } x \in [-1,-\tfrac1k) \\ kx & \text{for } x \in [-\tfrac1k,\tfrac1k] \\ 1 & \text{for } x \in (\tfrac1k,1] \end{cases} $$
which have the limit function
$$ v(x) = \begin{cases} -1 & \text{for } x < 0 \\ 1 & \text{for } x > 0 \end{cases} $$
where we can define any value for $x = 0$. We then have $\|v^{(k)} - v\|_1 \to 0$ and $\|v^{(k)} - v\|_2 \to 0$ as $k \to \infty$.

If we have $\|v^{(k)} - v\|_2 \to 0$ as $k \to \infty$, i.e., $\int_a^b |v^{(k)}(x) - v(x)|^2\,dx \to 0$ as $k \to \infty$, we say that $v^{(k)}$ converges to $v$ in the mean square sense. Note that $v^{(k)}$ converges to $v$ in the mean square sense, but we do not have uniform convergence. Note that two functions $v, \tilde v$ which only differ in the function value at $x = 0$ satisfy $\|v - \tilde v\|_1 = 0$ and $\|v - \tilde v\|_2 = 0$.
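As a numerical illustration (a sketch using the piecewise linear functions $v^{(k)}$ above), the 1-norm and 2-norm of the error $v^{(k)} - v$ tend to zero while the maximum norm stays at 1:

% Sketch: ||v^(k) - v|| in the 1-, 2- and max-norm for the example above.
v = @(x) sign(x);                         % limit function (value at 0 irrelevant)
for k = [1 10 100 1000]
    vk  = @(x) max(-1, min(1, k*x));      % v^(k)(x)
    d   = @(x) abs(vk(x) - v(x));
    e1  = integral(d, -1, 1);                        % -> 0 like 1/k
    e2  = sqrt(integral(@(x) d(x).^2, -1, 1));       % -> 0 like sqrt(2/(3k))
    einf = 1;                             % sup |v^(k)(x)-v(x)| stays 1 near x=0
    fprintf('k=%5d   1-norm %.2e   2-norm %.2e   max-norm %g\n', k, e1, e2, einf);
end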
The set of possible limits with respect to $\|\cdot\|_1$ is called $L^1([a,b])$. These are functions $v$ such that $|v|$ is Lebesgue integrable on $[a,b]$. The set of possible limits with respect to $\|\cdot\|_2$ is called $L^2([a,b])$. These are functions $v$ such that $|v|^2$ is Lebesgue integrable on $[a,b]$. Note that functions $v$ and $\tilde v$ which differ only on a set of measure $0$ are considered identical in $L^1$ and $L^2$. This means that $L^1([a,b])$ and $L^2([a,b])$ are quotient spaces: their elements are not functions, but rather sets of functions which differ only on a set of measure zero.

Example: The function $v(x) = x^{-1/2}$ is in $L^1([0,1])$, but not in $L^2([0,1])$.

The space $L^1([a,b])$ with $\|\cdot\|_1$ is complete. The space $L^2([a,b])$ with $\|\cdot\|_2$ is complete.

1.3 Inner product spaces

For the vector space $\mathbb{C}^n$ we introduce the inner product
$$ (u,v) := \sum_{i=1}^n u_i\,\overline{v_i}. $$
Here $\overline{v_i}$ denotes the complex conjugate: for a complex number $z = x + iy$ we have $\bar z = x - iy$. We can then define the norm of a vector as $\|u\| := (u,u)^{1/2}$. For a vector $u \in \mathbb{C}^n$ this means that $\|u\| = \bigl(|u_1|^2 + \dots + |u_n|^2\bigr)^{1/2} = \|u\|_2$. We say that two vectors $u, v$ are orthogonal if $(u,v) = 0$. An important property is the Cauchy-Schwarz inequality:
$$ |(u,v)| \le \|u\|\,\|v\|. $$
For the vector spaces $C([a,b])$ and $L^2([a,b])$ we can also define an inner product: for functions $u$ and $v$ we define
$$ (u,v) := \int_a^b u(x)\,\overline{v(x)}\,dx, \qquad \|u\| = (u,u)^{1/2} = \Bigl(\int_a^b |u(x)|^2\,dx\Bigr)^{1/2} = \|u\|_2. $$

2 Least squares approximation

We want to approximate an element $v$ of a vector space $V$. We want to find an approximation $w$ in the finite dimensional subspace $W = \mathrm{span}\{a^{(1)},\dots,a^{(n)}\}$. We measure the error $\|w - v\|$ with a norm, and we want to find the best approximation.

Example problem: We want to understand how a calculator or computer can evaluate $\sin x$ for a given value $x$. The processor can essentially only perform addition, multiplication, division. Therefore we can try to approximate the function with a polynomial. We use the notation $P_k$ for polynomials of degree $\le k$:
$$ P_k = \{\, c_0 + c_1 x + \dots + c_k x^k \mid c_0,\dots,c_k \in \mathbb{R} \,\}. $$
It is sufficient to consider $x \in [0,\pi/2]$. Instead of approximating $\sin x$ for $x \in [0,\pi/2]$ we can consider the function $u(x) = \sin(\tfrac{\pi}{2}x)$ for $x \in [0,1]$. We want to find a polynomial $w(x) \in P_n$ such that the error is minimal.
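As a quick illustration of the inner product on $C([0,1])$ defined above, the following sketch computes $(u,v)$ for $u(x)=\sin(\tfrac{\pi}{2}x)$ and $v(x)=x$ by numerical quadrature and checks the Cauchy-Schwarz inequality:

% Sketch: inner product (u,v) = int_0^1 u(x) v(x) dx and Cauchy-Schwarz check.
u  = @(x) sin(pi/2*x);
v  = @(x) x;
ip = @(f,g) integral(@(x) f(x).*g(x), 0, 1);    % real-valued functions
uv = ip(u,v);
nu = sqrt(ip(u,u)); nv = sqrt(ip(v,v));
fprintf('|(u,v)| = %.4f <= %.4f = ||u|| ||v||\n', abs(uv), nu*nv)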
Problem 1: Find the best approximation with respect to $\|u - w\|_\infty$: Find a polynomial $w \in P_n$ such that the maximal error
$$ \|u - w\|_\infty = \max_{x\in[0,1]} |u(x) - w(x)| $$
is minimal.

Problem 2: Find the best approximation in the least squares sense: Find a polynomial $w \in P_n$ such that
$$ \|u - w\|_2^2 = \int_0^1 |u(x) - w(x)|^2\,dx $$
is minimal.

It turns out that Problem 2 is much easier to solve than Problem 1.

2.1 Abstract version of the least squares problem

Let $V$ denote a vector space which is equipped with an inner product $(\cdot,\cdot)$. This defines a norm $\|v\| := (v,v)^{1/2}$. We are given: a vector $u \in V$ and an $n$-dimensional subspace $W := \mathrm{span}\{a^{(1)},\dots,a^{(n)}\}$ where the vectors $a^{(1)},\dots,a^{(n)}$ are assumed to be linearly independent (otherwise some of them are redundant and can be dropped).

Least Squares Problem: Find a vector $w = c_1 a^{(1)} + \dots + c_n a^{(n)}$ such that $\|u - w\|$ is minimal.

3 Algorithm 1: Normal Equations

We are given a point $u$ outside of the subspace $W$. We want to find the closest point $w$ in this subspace. Geometric intuition tells us that $w - u$ should be orthogonal to the subspace $W$. This orthogonality condition is called the Normal Equations: Find $w = c_1 a^{(1)} + \dots + c_n a^{(n)}$ such that
$$ \bigl(w - u,\, a^{(j)}\bigr) = 0 \qquad \text{for } j = 1,\dots,n. $$
This gives the following linear system of $n$ equations for the $n$ unknowns $c_1,\dots,c_n$:
$$ \text{for } j = 1,\dots,n: \qquad \sum_{k=1}^n \bigl(a^{(k)}, a^{(j)}\bigr)\, c_k = \bigl(u, a^{(j)}\bigr). $$
We can write this in matrix-vector notation with $c = (c_1,\dots,c_n)^\top \in \mathbb{C}^n$ as $Mc = b$ where
- the Gram matrix $M \in \mathbb{C}^{n\times n}$ has entries $M_{jk} = \bigl(a^{(k)}, a^{(j)}\bigr)$,
- the right-hand side vector $b \in \mathbb{C}^n$ has entries $b_j = \bigl(u, a^{(j)}\bigr)$.

After we have computed the entries of $M$ and $b$ we have to solve the $n\times n$ linear system $Mc = b$. We can use Gaussian elimination with pivoting for this. In Matlab we can use c=M\b. The linear system has a unique solution if and only if the matrix $M$ is nonsingular.

Proposition 1. $M$ is nonsingular.
Proof. We have to show that $Mv = 0$ implies $v = 0$: if $Mv = 0$ then $v^H M v = \|v_1 a^{(1)} + \dots + v_n a^{(n)}\|^2 = 0$, so $v_1 a^{(1)} + \dots + v_n a^{(n)} = 0$, and the linear independence of $a^{(1)},\dots,a^{(n)}$ gives $v = 0$. $\square$

We claim that the unique solution $c$ of the normal equations is the solution of the least squares problem.

Proposition 2. The solution vector $c$ of the normal equations yields the unique solution of the least squares problem.

Proof. Let $w = c_1 a^{(1)} + \dots + c_n a^{(n)}$. We want to show that any $\tilde w \in W$ different from $w$ will have $\|\tilde w - u\| > \|w - u\|$. We can write $\tilde w = w + v$ with some $v \in W$; then multiplying out and using the normal equations we obtain
$$ \|u - \tilde w\|^2 = (u - w - v,\, u - w - v) = (u-w,\,u-w) - \underbrace{(v,\,u-w)}_{=0} - \underbrace{(u-w,\,v)}_{=0} + (v,v) = \|u-w\|^2 + \|v\|^2, $$
which is actually Pythagoras' theorem for the triangle with vertices $u, w, \tilde w$. Hence $\|u - \tilde w\| \ge \|u - w\|$, and we have equality only if $v = d_1 a^{(1)} + \dots + d_n a^{(n)} = 0$. Since $a^{(1)},\dots,a^{(n)}$ are by assumption linearly independent we must have $d_1 = \dots = d_n = 0$. Hence the coefficients $c_1,\dots,c_n$ of the least squares solution are unique. $\square$

3.1 Case of $V = \mathbb{C}^m$

We now consider the special case $V = \mathbb{C}^m$ with the inner product
$$ (u,v) = u_1\overline{v_1} + \dots + u_m\overline{v_m} = [\overline{v_1},\dots,\overline{v_m}] \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = v^H u $$
where we use the notation $v^H := \bar v^\top$. Note that in Matlab $v^H$ is obtained by v'.

We need to have $m \ge n$, otherwise the vectors $a^{(1)},\dots,a^{(n)} \in \mathbb{C}^m$ are linearly dependent. We define the matrix $A \in \mathbb{C}^{m\times n}$ using the columns $a^{(1)},\dots,a^{(n)}$:
$$ A = \bigl[a^{(1)},\dots,a^{(n)}\bigr]. $$
Then we can write $w = c_1 a^{(1)} + \dots + c_n a^{(n)} = Ac$.

Least Squares Problem: Given $A \in \mathbb{C}^{m\times n}$ and $u \in \mathbb{C}^m$ find a vector $c \in \mathbb{C}^n$ such that $\|Ac - u\|_2$ is minimal.

Note that we can write the Gram matrix $M$ and the right-hand side vector $b$ as
$$ M = A^H A, \qquad b = A^H u. $$
Hence we can write the normal equations as
$$ A^H A\, c = A^H u. $$

3.2 Notation $A = \bigl[a^{(1)},\dots,a^{(n)}\bigr]$

We can use the matrix-vector notation also in the general case of a vector space $V$, e.g. for functions in $V = L^2([0,1])$. For $n$ vectors $a^{(1)},\dots,a^{(n)} \in V$ we denote this $n$-tuple by $A = \bigl[a^{(1)},\dots,a^{(n)}\bigr]$. We can then write a linear combination with coefficient vector $c \in \mathbb{C}^n$ in the matrix-vector notation
$$ c_1 a^{(1)} + \dots + c_n a^{(n)} = \bigl[a^{(1)},\dots,a^{(n)}\bigr] \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = Ac. $$
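Here is a small numerical sketch for the concrete case $V = \mathbb{C}^m$ of Section 3.1 (the data are made up for illustration): we form the Gram matrix $M = A^H A$ and the right-hand side $b = A^H u$, solve the normal equations, and compare with Matlab's backslash operator, which solves the same least squares problem via a QR decomposition.

% Sketch: least squares fit of a line to made-up data points, i.e. columns
% a^(1) = ones, a^(2) = t.
t = (0:5)';                            % sample points (illustration only)
u = [0.1; 0.9; 2.1; 2.9; 4.2; 5.1];    % data to be fitted
A = [ones(size(t)), t];                % A = [a^(1), a^(2)]
M = A'*A;                              % Gram matrix
b = A'*u;                              % right-hand side
c = M\b;                               % solve the normal equations M c = b
c_qr = A\u;                            % Matlab's least squares solver (QR based)
disp([c c_qr])                         % both give the same coefficients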
Note that we have for vectors $v = Ac$ and $\tilde v = A\tilde c$ with $c, \tilde c \in \mathbb{C}^n$ that
$$ (v,\tilde v) = (Ac,\, A\tilde c) = \tilde c^{\,H} M c = (Mc,\, \tilde c) $$
with the Gram matrix $M$.

Example: Consider the vector space $V = C([0,1])$. For $v = \sin(\tfrac{\pi}{2}x)$ find the best least squares approximation using a linear combination of $a^{(1)} = 1$, $a^{(2)} = x$. Using the above notation we find the solution as follows:
$$ A := [1,\, x] $$
$$ M := \int_0^1 A^\top A\,dx = \int_0^1 \begin{bmatrix} 1 & x \\ x & x^2 \end{bmatrix} dx = \begin{bmatrix} 1 & \tfrac12 \\ \tfrac12 & \tfrac13 \end{bmatrix} $$
$$ b := \int_0^1 A^\top \sin(\tfrac{\pi}{2}x)\,dx = \int_0^1 \begin{bmatrix} \sin(\tfrac{\pi}{2}x) \\ x\sin(\tfrac{\pi}{2}x) \end{bmatrix} dx = \begin{bmatrix} \tfrac{2}{\pi} \\ \tfrac{4}{\pi^2} \end{bmatrix} $$
Solving the linear system $Mc = b$ gives
$$ c = \frac{1}{\pi^2} \begin{bmatrix} 8\pi - 24 \\ 48 - 12\pi \end{bmatrix} $$
and we obtain the solution
$$ w = Ac = c_1 a^{(1)} + c_2 a^{(2)} = \frac{1}{\pi^2}\bigl[(8\pi - 24) + (48 - 12\pi)x\bigr]. $$

We can do this with symbolic operations in Matlab as follows:

>> syms x; Pi = sym('pi');
>> v = sin(Pi/2*x); A = x.^(0:1)
A = [ 1, x]
>> M = int(A.'*A,x,0,1)
M = [   1, 1/2]
    [ 1/2, 1/3]
>> b = int(A.'*v,x,0,1)
b =   2/pi
    4/pi^2
>> c = M\b
c =  (8*(pi - 3))/pi^2
    -(12*(pi - 4))/pi^2
>> w = A*c;
>> A = x.^(0:2); M = int(A.'*A,x,0,1); b = int(A.'*v,x,0,1); c = M\b; w2 = A*c;
>> fplot([v,w,w2],[0,1]); legend('sin(\pi x/2)','approx with 1,x','approx with 1,x,x^2')

[Figure: $\sin(\tfrac{\pi}{2}x)$ on $[0,1]$ together with the least squares approximations using the basis $1,x$ and the basis $1,x,x^2$.]
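To quantify the quality of the two approximations one can also compute the $L^2$ error $\|v - w\|_2$ symbolically (a sketch continuing the session above, where w and w2 are the linear and quadratic approximations):

>> err1 = sqrt(int((v - w)^2, x, 0, 1));  double(err1)    % error of the linear fit
>> err2 = sqrt(int((v - w2)^2, x, 0, 1)); double(err2)    % error of the quadratic fit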
4 Algorithm 2: Orthogonalization

We can use a new basis $p^{(1)},\dots,p^{(n)}$ for $W = \mathrm{span}\{a^{(1)},\dots,a^{(n)}\}$. Then we can write $w \in W$ as $w = d_1 p^{(1)} + \dots + d_n p^{(n)}$ and we obtain the normal equations
$$ \begin{bmatrix} (p^{(1)},p^{(1)}) & \cdots & (p^{(n)},p^{(1)}) \\ \vdots & & \vdots \\ (p^{(1)},p^{(n)}) & \cdots & (p^{(n)},p^{(n)}) \end{bmatrix} \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} (u,p^{(1)}) \\ \vdots \\ (u,p^{(n)}) \end{bmatrix}. $$
If the vectors $p^{(1)},\dots,p^{(n)}$ are orthogonal to each other, the matrix is diagonal and the linear system is very easy to solve.

4.1 Gram-Schmidt orthogonalization and decomposition A = PS

Assume the vectors $a^{(1)},\dots,a^{(n)} \in V$ are linearly independent. We want to find vectors $p^{(1)},\dots,p^{(n)}$ which are orthogonal to each other and satisfy
$$ \mathrm{span}\{p^{(1)},\dots,p^{(k)}\} = \mathrm{span}\{a^{(1)},\dots,a^{(k)}\} \qquad \text{for } k = 1,\dots,n. $$
We define
$$ p^{(1)} := a^{(1)}, \qquad p^{(2)} := a^{(2)} - s_{12}\, p^{(1)}, \qquad p^{(3)} := a^{(3)} - s_{13}\, p^{(1)} - s_{23}\, p^{(2)}, \qquad \dots $$
where we choose the coefficients $s_{kj}$ such that we have $\bigl(p^{(j)}, p^{(k)}\bigr) = 0$ for $j = 2,\dots,n$ and $k = 1,\dots,j-1$. E.g., the condition $\bigl(p^{(2)},p^{(1)}\bigr) = 0$ gives
$$ s_{12} = \frac{\bigl(a^{(2)},p^{(1)}\bigr)}{\bigl(p^{(1)},p^{(1)}\bigr)}. $$
The conditions $\bigl(p^{(3)},p^{(1)}\bigr) = 0$ and $\bigl(p^{(3)},p^{(2)}\bigr) = 0$ give
$$ s_{13} = \frac{\bigl(a^{(3)},p^{(1)}\bigr)}{\bigl(p^{(1)},p^{(1)}\bigr)}, \qquad s_{23} = \frac{\bigl(a^{(3)},p^{(2)}\bigr)}{\bigl(p^{(2)},p^{(2)}\bigr)}. $$
Therefore we obtain the Gram-Schmidt orthogonalization algorithm:

For $j = 1,\dots,n$:
$$ p^{(j)} := a^{(j)} - \sum_{k=1}^{j-1} s_{kj}\, p^{(k)} \qquad \text{where } s_{kj} := \frac{\bigl(a^{(j)},p^{(k)}\bigr)}{\bigl(p^{(k)},p^{(k)}\bigr)}. $$
We see that $p^{(j)} \in \mathrm{span}\{a^{(1)},\dots,a^{(j)}\}$ for $j = 1,\dots,n$. All the vectors $p^{(j)}$ are nonzero: e.g., if we obtained $p^{(3)} = 0$ this would mean that $a^{(3)} \in \mathrm{span}\{p^{(1)},p^{(2)}\} = \mathrm{span}\{a^{(1)},a^{(2)}\}$, hence the vectors $a^{(1)},a^{(2)},a^{(3)}$ would be linearly dependent, in contradiction to our assumption that $a^{(1)},\dots,a^{(n)}$ are linearly independent.

Note that we have
$$ a^{(1)} = p^{(1)}, \qquad a^{(2)} = p^{(2)} + s_{12}\, p^{(1)}, \qquad a^{(3)} = p^{(3)} + s_{13}\, p^{(1)} + s_{23}\, p^{(2)}, \qquad \dots $$
i.e., we have the decomposition $A = PS$:
$$ \bigl[a^{(1)},\dots,a^{(n)}\bigr] = \bigl[p^{(1)},\dots,p^{(n)}\bigr] \begin{bmatrix} 1 & s_{12} & \cdots & s_{1n} \\ & 1 & \ddots & \vdots \\ & & \ddots & s_{n-1,n} \\ & & & 1 \end{bmatrix} $$
with the upper triangular matrix $S \in \mathbb{C}^{n\times n}$. This is the QR decomposition without normalization.

We can write $w \in W$ as $w = Ac = Pd$. Because of $A = PS$ the vectors $c, d \in \mathbb{C}^n$ are related by $Sc = d$.

This gives the following algorithm for solving the least squares problem (a Matlab sketch follows after the list):
1. Given $a^{(1)},\dots,a^{(n)}$ use the Gram-Schmidt algorithm to find $p^{(1)},\dots,p^{(n)}$ and the matrix $S$.
2. Given $u \in V$ compute the coefficients $d_1,\dots,d_n$ from Section 4:
$$ d_j := \frac{\bigl(u, p^{(j)}\bigr)}{\bigl(p^{(j)}, p^{(j)}\bigr)} \qquad \text{for } j = 1,\dots,n. $$
3. We obtain $w = d_1 p^{(1)} + \dots + d_n p^{(n)} \in W$ with $\|w - u\| = \min$. We obtain the coefficient vector $c$ with $\|Ac - u\| = \min$ by solving the linear system $Sc = d$ using back substitution.
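The following sketch implements this algorithm for the case $V = \mathbb{C}^m$, i.e. for columns of a matrix (the matrix A and vector u are made-up illustration data):

% Sketch: Gram-Schmidt (without normalization) giving A = P*S, then the
% least squares solution via d and back substitution S*c = d.
A = [1 1; 1 2; 1 3; 1 4];          % columns a^(1), a^(2) (illustration data)
u = [1; 3; 2; 5];                  % vector to approximate
[m,n] = size(A);
P = zeros(m,n); S = eye(n);
for j = 1:n
    p = A(:,j);
    for k = 1:j-1
        S(k,j) = (P(:,k)'*A(:,j)) / (P(:,k)'*P(:,k));   % s_kj
        p = p - S(k,j)*P(:,k);
    end
    P(:,j) = p;                    % p^(j)
end
d = (P'*u) ./ diag(P'*P);          % d_j = (u,p^(j)) / (p^(j),p^(j))
c = S\d;                           % back substitution S c = d
disp(norm(A*c - u))                % same minimal residual as with c = A\u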
4.2 Orthonormalization and decomposition A = QR

Sometimes it is useful to have an orthonormal basis (ON basis) $q^{(1)},\dots,q^{(n)}$ for $\mathrm{span}\{a^{(1)},\dots,a^{(n)}\}$. We can obtain this by normalizing the vectors $p^{(1)},\dots,p^{(n)}$:
$$ q^{(j)} = \frac{p^{(j)}}{\|p^{(j)}\|}. $$
If we multiply row $j$ of the matrix $S$ by $\|p^{(j)}\|$,
$$ [r_{jj},\dots,r_{jn}] := \|p^{(j)}\|\,[1,\, s_{j,j+1},\dots,s_{jn}], $$
we get from $A = PS$ the so-called QR decomposition
$$ \bigl[a^{(1)},\dots,a^{(n)}\bigr] = A = QR = \bigl[q^{(1)},\dots,q^{(n)}\bigr] \begin{bmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ & & r_{nn} \end{bmatrix} $$
where the matrix $R$ is upper triangular with positive entries on the diagonal.

4.3 Case of $V = \mathbb{C}^m$

We are given $n$ linearly independent vectors $a^{(1)},\dots,a^{(n)} \in \mathbb{C}^m$. Therefore we must have $m \ge n$. This corresponds to the matrix $A = \bigl[a^{(1)},\dots,a^{(n)}\bigr] \in \mathbb{C}^{m\times n}$ with linearly independent columns. We then obtain the QR decomposition
$$ \underbrace{\bigl[a^{(1)},\dots,a^{(n)}\bigr]}_{A,\ m\times n} = \underbrace{\bigl[q^{(1)},\dots,q^{(n)}\bigr]}_{Q,\ m\times n,\ \text{orthonormal columns}} \underbrace{\begin{bmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ & & r_{nn} \end{bmatrix}}_{R,\ n\times n}. $$
We can compute this in Matlab by [Q,R]=qr(A,0). This is known as the compact (economy size) version of the QR decomposition. Note that Matlab uses a slightly different algorithm and therefore some of the columns of $Q$ are multiplied by $-1$, and the corresponding $r_{jj}$ are negative.

The vectors $q^{(1)},\dots,q^{(n)}$ form an orthonormal basis for $\mathrm{range}(A)$. This is a subspace of dimension $n$ of the vector space $V = \mathbb{C}^m$ which has dimension $m \ge n$. We can extend the orthonormal basis $q^{(1)},\dots,q^{(n)}$ to a basis $q^{(1)},\dots,q^{(m)}$ of the whole space $\mathbb{C}^m$. Here the additional vectors $q^{(n+1)},\dots,q^{(m)}$ form an orthonormal basis for the orthogonal complement of $\mathrm{range}(A)$. This yields the full version of the QR decomposition:
$$ \underbrace{\bigl[a^{(1)},\dots,a^{(n)}\bigr]}_{A,\ m\times n} = \underbrace{\bigl[q^{(1)},\dots,q^{(n)},q^{(n+1)},\dots,q^{(m)}\bigr]}_{\tilde Q,\ m\times m,\ \text{orthonormal}} \underbrace{\begin{bmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ & & r_{nn} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}}_{\tilde R,\ m\times n}. $$
We can compute this in Matlab by [Qt,Rt]=qr(A). Note that the additional part $\bigl[q^{(n+1)},\dots,q^{(m)}\bigr]$ does not contribute anything to the decomposition since it is multiplied by zero. In many applications (e.g. solving a least squares problem) the economy size version [Q,R]=qr(A,0) is sufficient. Only use the full version [Qt,Rt]=qr(A) if you need a basis for the orthogonal complement of $\mathrm{range}(A)$.

The most accurate way to compute the QR decomposition in machine arithmetic is to transform the columns of $A$ step by step to an upper triangular matrix $R$ using so-called Householder reflections. This is the algorithm used by Matlab for the qr command.
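A sketch of how the economy size QR decomposition is used to solve the least squares problem $\min_c \|Ac - u\|_2$ (same made-up data as above): since $Q$ has orthonormal columns, the minimizer satisfies $Rc = Q^H u$, which is solved by back substitution.

% Sketch: solving the least squares problem min ||A*c - u|| via QR.
A = [1 1; 1 2; 1 3; 1 4];          % illustration data as before
u = [1; 3; 2; 5];
[Q,R] = qr(A,0);                   % economy size QR: A = Q*R, Q is 4x2
c = R\(Q'*u);                      % solve R*c = Q^H u by back substitution
disp(c)
disp(norm(A*c - u))                % minimal residual, same as norm(A*(A\u) - u)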
5 Polynomials

Let $P_k$ denote polynomials of degree $\le k$:
$$ P_k = \{\, c_0 + c_1 x + \dots + c_k x^k \mid c_0,\dots,c_k \in \mathbb{C} \,\}. $$
The space $P_{n-1}$ has dimension $n$. A basis is given by the monomials $1, x, \dots, x^{n-1}$. However, this basis may cause problems in computations with machine numbers for higher degree polynomials. E.g., the Gram matrix for $[0,1]$ is the infamous Hilbert matrix $H$ with entries $H_{ij} = \frac{1}{i+j-1}$, where the condition number grows exponentially fast with $n$:

>> for n=5:12; disp([n,cond(hilb(n))]); end
     5   4.7661e+05
     6   1.4951e+07
     7   4.7537e+08
     8   1.5258e+10
     9   4.9315e+11
    10   1.6025e+13
    11   5.2e+14
    12   1.7e+16

5.1 Space $L^2_w$ with weight function $w(x)$ on $(a,b)$

We consider functions $v(x)$ defined on the interval $(a,b)$. We define the weighted $L^2$ norm
$$ \|v\| = \Bigl(\int_a^b |v(x)|^2\, w(x)\,dx\Bigr)^{1/2} $$
with a weight function $w(x)$. We want $w(x) > 0$ except for finitely many points in $(a,b)$, and $\|p\| < \infty$ for any polynomial $p$. We have the inner product
$$ (u,v) := \int_a^b u(x)\,\overline{v(x)}\,w(x)\,dx $$
and $\|v\| = (v,v)^{1/2}$. We denote the space of all functions $v$ with $\|v\| < \infty$ by $L^2_w$ (where integrals are understood in the Lebesgue sense, and functions which only differ on sets of measure zero are understood to be the same vector). Examples are
- $(-1,1)$ with $w(x) = 1$,
- $(-1,1)$ with $w(x) = (1-x^2)^{-1/2}$,
- $(0,\infty)$ with $w(x) = e^{-x}$.

We consider the subspace $P_{n-1} = \mathrm{span}\{a^{(1)},\dots,a^{(n)}\}$ where $a^{(1)} = 1, \dots, a^{(n)} = x^{n-1}$. We can use the Gram-Schmidt algorithm to construct an orthogonal basis $p^{(1)},\dots,p^{(n)}$. Note that these polynomials have leading coefficient $1$, i.e., $p^{(j)} = x^{j-1} + \text{lower order terms}$.

Let us denote these orthogonal polynomials by $p_0(x), p_1(x), p_2(x), \dots$. They are defined by $p_0(x) = 1$ and
$$ p_j(x) := x^j - \sum_{k=0}^{j-1} \frac{\bigl(x^j, p_k\bigr)}{\bigl(p_k, p_k\bigr)}\, p_k \qquad \text{for } j = 1,2,3,\dots $$
and satisfy for $j,k = 0,1,2,\dots$
$$ p_j(x) = x^j + \text{lower order terms}, \qquad (p_j, p_k) = 0 \text{ for } j \ne k. $$
Here we start with the function $x^j$ and subtract a linear combination of $p_{j-1}, p_{j-2},\dots,p_0$ to achieve $\bigl(p_j, x^k\bigr) = 0$ for $k = j-1, j-2,\dots,0$.
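As a sketch (assuming the Symbolic Math Toolbox), the first few monic orthogonal polynomials for the Legendre case $(a,b) = (-1,1)$, $w(x) = 1$ can be computed directly from this definition; a more efficient three term recursion is derived next:

% Sketch: monic orthogonal polynomials on (-1,1) with w(x)=1 (Legendre case),
% computed from the Gram-Schmidt type definition above.
syms x
ip = @(f,g) int(f*g, x, -1, 1);        % inner product (real case)
p = sym(zeros(1,5)); p(1) = 1;         % p(1) stores p_0, p(2) stores p_1, ...
for j = 1:4
    q = x^j;
    for k = 0:j-1
        q = q - ip(x^j, p(k+1))/ip(p(k+1), p(k+1)) * p(k+1);
    end
    p(j+1) = expand(q);
end
disp(p)    % 1, x, x^2 - 1/3, x^3 - 3*x/5, x^4 - 6*x^2/7 + 3/35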
Instead we can also start with the function $x\,p_{j-1}(x) = x^j + \text{lower order terms}$:
$$ p_j(x) := x\,p_{j-1} - \sum_{k=0}^{j-1} \frac{\bigl(x\,p_{j-1}, p_k\bigr)}{\bigl(p_k, p_k\bigr)}\, p_k \qquad \text{for } j = 1,2,3,\dots $$
This also yields a polynomial $p_j(x) = x^j + \text{lower order terms}$ which is orthogonal to $P_{j-1}$. Hence it must be the same polynomial (if we have two such polynomials $p_j$ and $\tilde p_j$ we have $p_j - \tilde p_j \in P_{j-1}$ and $p_j - \tilde p_j$ is orthogonal to $P_{j-1}$).

Then we have $\bigl(x\,p_{j-1}, p_k\bigr) = \bigl(p_{j-1}, x\,p_k\bigr) = 0$ for $k < j-2$, hence we obtain the three term recursion
$$ p_j(x) := x\,p_{j-1} - \frac{\bigl(x\,p_{j-1}, p_{j-1}\bigr)}{\bigl(p_{j-1}, p_{j-1}\bigr)}\, p_{j-1} - \frac{\bigl(x\,p_{j-1}, p_{j-2}\bigr)}{\bigl(p_{j-2}, p_{j-2}\bigr)}\, p_{j-2}. $$
We have $\bigl(x\,p_{j-1}, p_{j-2}\bigr) = \bigl(p_{j-1}, x\,p_{j-2}\bigr) = \bigl(p_{j-1}, p_{j-1}\bigr)$ since $x\,p_{j-2} - p_{j-1} \in P_{j-2}$, hence
$$ p_0 = 1, \qquad p_1 = x - \delta_1, \qquad p_j = (x - \delta_j)\, p_{j-1} - \gamma_j\, p_{j-2} $$
with
$$ \delta_j := \frac{\bigl(x\,p_{j-1}, p_{j-1}\bigr)}{\bigl(p_{j-1}, p_{j-1}\bigr)}, \qquad \gamma_j := \frac{\bigl(p_{j-1}, p_{j-1}\bigr)}{\bigl(p_{j-2}, p_{j-2}\bigr)}. $$

Here are the most popular orthogonal polynomials:

Name                 (a,b)        w(x)                     delta_{j+1}                                            gamma_{j+1}
LEGENDRE             (-1,1)       1                        0                                                      j^2/(4j^2-1)
CHEBYSHEV            (-1,1)       (1-x^2)^{-1/2}           0                                                      1/2 for j=1, 1/4 for j>=2
JACOBI               (-1,1)       (1-x)^alpha (1+x)^beta   (beta^2-alpha^2)/((2j+alpha+beta)(2j+alpha+beta+2))    4j(j+alpha)(j+beta)(j+alpha+beta)/((2j+alpha+beta)^2((2j+alpha+beta)^2-1))
JACOBI, alpha=beta   (-1,1)       (1-x^2)^alpha            0                                                      j(j+2 alpha)/(4(j+alpha)^2-1)
LAGUERRE             (0,infty)    e^{-x}                   2j+1                                                   j^2
HERMITE              (-infty,infty)  e^{-x^2}              0                                                      j/2

For the Jacobi polynomials we need $\alpha, \beta > -1$.

The orthogonal polynomial $p_n \in P_n$ has oscillating behavior, see, e.g., the graph of the Legendre polynomial in the example below. We have

Theorem 1. The orthogonal polynomial $p_n(x)$ changes sign at $n$ distinct points $t_1,\dots,t_n \in (a,b)$.

Proof. Let $t_1,\dots,t_m$ be the distinct points in $(a,b)$ where $p_n(x)$ changes sign. Assume $m < n$. Then the function $q(x) := p_n(x)\,(x - t_1)\cdots(x - t_m)$ does not change its sign. We have, e.g., $q(x)w(x) > 0$ for all $x \in (a,b)$ with finitely many exceptions and hence
$$ 0 < \int_a^b p_n(x)\,(x - t_1)\cdots(x - t_m)\, w(x)\,dx = \bigl(p_n,\, (x - t_1)\cdots(x - t_m)\bigr)_{L^2_w}. $$
But this inner product must be zero since $p_n$ is orthogonal to $P_m$ (note $m \le n-1$). This is a contradiction. $\square$

We could first use the recursion formula to find $p_k(x)$ in the form $c_0 + c_1 x + \dots + c_k x^k$. Then we can use this monomial form to, e.g., evaluate $p_k(x)$ for given values $x$, or to find the zeros of $p_k$. This is usually a bad idea since we are using the monomial basis which can cause problems with numerical stability. We can often avoid using the monomial form of $p_n$, and use the recursion coefficients $\delta_1, \delta_2, \delta_3, \dots$ and $\gamma_2, \gamma_3, \dots$ directly.
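As a quick sketch, the three term recursion with the Legendre coefficients from the table ($\delta_j = 0$, $\gamma_j = (j-1)^2/(4(j-1)^2-1)$) reproduces the monic polynomials computed earlier:

% Sketch: three term recursion for the Legendre case, done symbolically.
syms x
pold = sym(1); p = x;                  % p_0 and p_1
for j = 2:4
    gam = (j-1)^2/(4*(j-1)^2-1);       % gamma_j for Legendre (table with j+1 -> j)
    pnew = expand(x*p - gam*pold);     % p_j = (x - delta_j) p_{j-1} - gamma_j p_{j-2}, delta_j = 0
    pold = p; p = pnew;
    disp(p)                            % x^2 - 1/3, x^3 - (3*x)/5, x^4 - (6*x^2)/7 + 3/35
end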
Theorem 2. The zeros of the orthogonal polynomial $p_n$ are the eigenvalues of the symmetric tridiagonal matrix
$$ A = \begin{bmatrix} \delta_1 & \sqrt{\gamma_2} & & \\ \sqrt{\gamma_2} & \delta_2 & \ddots & \\ & \ddots & \ddots & \sqrt{\gamma_n} \\ & & \sqrt{\gamma_n} & \delta_n \end{bmatrix}. $$

Proof. We rescale the orthogonal polynomials:
$$ \tilde p_0 := p_0, \qquad \tilde p_1 := \frac{p_1}{\sqrt{\gamma_2}}, \qquad \tilde p_2 := \frac{p_2}{\sqrt{\gamma_2\gamma_3}}, \qquad \dots, \qquad \tilde p_n := \frac{p_n}{\sqrt{\gamma_2\gamma_3\cdots\gamma_{n+1}}}. $$
Then the recursion formula becomes (with $\tilde p_{-1} := 0$, $\tilde p_0 := 1$)
$$ \sqrt{\gamma_{j+1}}\,\tilde p_j = (x - \delta_j)\,\tilde p_{j-1} - \sqrt{\gamma_j}\,\tilde p_{j-2} \qquad \text{for } j = 1,\dots,n $$
or
$$ \sqrt{\gamma_j}\,\tilde p_{j-2} + \delta_j\,\tilde p_{j-1} + \sqrt{\gamma_{j+1}}\,\tilde p_j = x\,\tilde p_{j-1} \qquad \text{for } j = 1,\dots,n $$
or in matrix-vector notation
$$ \begin{bmatrix} \delta_1 & \sqrt{\gamma_2} & & \\ \sqrt{\gamma_2} & \delta_2 & \ddots & \\ & \ddots & \ddots & \sqrt{\gamma_n} \\ & & \sqrt{\gamma_n} & \delta_n \end{bmatrix} \begin{bmatrix} \tilde p_0(x) \\ \vdots \\ \vdots \\ \tilde p_{n-1}(x) \end{bmatrix} + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \sqrt{\gamma_{n+1}}\,\tilde p_n(x) \end{bmatrix} = x \begin{bmatrix} \tilde p_0(x) \\ \vdots \\ \vdots \\ \tilde p_{n-1}(x) \end{bmatrix}. $$
If $x$ is a zero of $p_n$ then $\tilde p_n(x) = 0$, so the nonzero vector $\bigl(\tilde p_0(x),\dots,\tilde p_{n-1}(x)\bigr)^\top$ (its first component is $1$) is an eigenvector of the tridiagonal matrix $A$ with eigenvalue $x$. $\square$

Example: Plot the Legendre polynomial $p_n$ (here for $n = 10$) and find its zeros:

n = 10;                            % degree of the Legendre polynomial
A = sparse(n,n);                   % initialize matrix A as sparse array
x = -1:.01:1;                      % evaluate p_j at these x-values
pold = x.^0; p = x;                % initialize p_0, p_1
for j=2:n
  g = 1/(4-(j-1)^(-2));            % gamma_j
  g = g^.5;
  A(j-1,j) = g; A(j,j-1) = g;      % build tridiagonal matrix A
  pnew = x.*p - g^2*pold;          % find p_j from p_{j-1}, p_{j-2}
  pold = p; p = pnew;
end
plot(x,p); grid on; hold on
z = eig(full(A));                  % zeros of p_n are eigenvalues of A
plot(z,zeros(n,1),'ro'); hold off

[Figure: graph of the Legendre polynomial $p_n$ on $[-1,1]$; its zeros (the eigenvalues of $A$) are marked by circles on the $x$-axis.]
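A possible sanity check (a sketch that continues the workspace of the script above, reusing n and z): evaluating $p_n$ at the computed eigenvalues with the same recursion gives values at roundoff level.

% Sketch: evaluate p_n at the computed eigenvalues z with the same recursion.
pz_old = ones(size(z)); pz = z;
for j = 2:n
  g2 = 1/(4-(j-1)^(-2));           % gamma_j
  pz_new = z.*pz - g2*pz_old;
  pz_old = pz; pz = pz_new;
end
disp(max(abs(pz)))                 % tiny => the z are (numerically) the zeros of p_n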
Is it a good idea to solve an eigenvalue problem in order to compute the zeros of a polynomial? Yes!

Finding the roots of a polynomial $p(x) = c_0 + c_1 x + \dots + c_n x^n$ from given coefficients $c_0,\dots,c_n$ can be a very ill-conditioned problem (see the famous Wilkinson polynomial $p(x) = (x-1)(x-2)\cdots(x-20)$). In contrast, finding the eigenvalues of a symmetric tridiagonal $n\times n$ matrix is a very well-behaved numerical problem. There are several efficient algorithms available (QR algorithm with shift used by Matlab, divide-and-conquer, ...) which typically need $O(n^2)$ operations to compute the eigenvalues $\lambda_1,\dots,\lambda_n$. (For faster $O(n\log n)$ algorithms see e.g. [Coakley, Rokhlin 2013].)

6 Review of Hermitian matrices and matrix norms

6.1 Review: Hermitian matrices, positive definite matrices

We call a matrix $A \in \mathbb{C}^{n\times n}$ Hermitian if $A^H = A$. For a real matrix this is the same as symmetric.

Theorem 3. Assume $A \in \mathbb{C}^{n\times n}$ is Hermitian. Then the eigenvalues $\lambda_1,\dots,\lambda_n$ are real, and there exists an orthonormal basis of eigenvectors $v^{(1)},\dots,v^{(n)}$: with $V = \bigl[v^{(1)},\dots,v^{(n)}\bigr]$ we have
$$ A = V \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} V^H, \qquad V^H V = I. $$

Let $A \in \mathbb{C}^{n\times n}$ be Hermitian and $u \in \mathbb{C}^n$. Then $q := u^H A u \in \mathbb{R}$ since $\bar q = q^H = (u^H A u)^H = u^H A^H u = u^H A u = q$. We have with $v := V^H u$ that $\|v\|_2 = \|u\|_2$ and
$$ u^H A u = u^H V \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} V^H u = v^H \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} v = \lambda_1 |v_1|^2 + \dots + \lambda_n |v_n|^2. $$
The eigenvalues $\lambda_1,\dots,\lambda_n$ are real. Let $\lambda_{\min} := \min_{j=1,\dots,n} \lambda_j$ and $\lambda_{\max} := \max_{j=1,\dots,n} \lambda_j$. Hence we obtain lower and upper bounds for $u^H A u$:
$$ \lambda_{\min} \|v\|_2^2 \le \lambda_1 |v_1|^2 + \dots + \lambda_n |v_n|^2 \le \lambda_{\max} \|v\|_2^2, \qquad \lambda_{\min} \|u\|_2^2 \le u^H A u \le \lambda_{\max} \|u\|_2^2. $$

We call a Hermitian matrix $A$ positive definite if for all nonzero vectors $u$
$$ u^H A u > 0. $$
Note that a Hermitian matrix is positive definite if and only if all eigenvalues are positive.

Consider linearly independent vectors $a^{(1)},\dots,a^{(n)} \in V$. Then the Gram matrix $M$ with $M_{jk} = \bigl(a^{(k)},a^{(j)}\bigr)$ is Hermitian since $M_{kj} = \bigl(a^{(j)},a^{(k)}\bigr) = \overline{\bigl(a^{(k)},a^{(j)}\bigr)} = \overline{M_{jk}}$. It is positive definite since for $c \ne 0$
$$ c^H M c = \bigl\| c_1 a^{(1)} + \dots + c_n a^{(n)} \bigr\|^2 > 0. $$
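A small numerical sketch (random Hermitian matrix and vector, for illustration only) of the bounds $\lambda_{\min}\|u\|_2^2 \le u^H A u \le \lambda_{\max}\|u\|_2^2$:

% Sketch: verify lambda_min*||u||^2 <= u'*A*u <= lambda_max*||u||^2.
n = 6;
B = randn(n) + 1i*randn(n);
A = (B + B')/2;                     % Hermitian matrix
u = randn(n,1) + 1i*randn(n,1);
lam = real(eig(A));                 % eigenvalues (real since A is Hermitian)
q = real(u'*A*u);                   % u^H A u is real (up to roundoff)
fprintf('%g <= %g <= %g\n', min(lam)*norm(u)^2, q, max(lam)*norm(u)^2)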
6.2 Matrix norms: 2-norm, Frobenius norm

A matrix $A \in \mathbb{C}^{m\times n}$ corresponds to a linear mapping $\mathbb{C}^n \to \mathbb{C}^m$, $c \mapsto Ac$. We would like to have a bound such that for all $v \in \mathbb{C}^n$:
$$ \|Av\| \le C\,\|v\| $$
with the smallest possible constant $C$. We can define a norm as the maximal magnification factor:
$$ \|A\| = \sup_{v \in \mathbb{C}^n,\ v \ne 0} \frac{\|Av\|}{\|v\|} = \sup_{v \in \mathbb{C}^n,\ \|v\| = 1} \|Av\|. $$
Since the unit ball in $\mathbb{C}^n$ is compact there exists $v$ with $\|v\| = 1$ such that $\|Av\|$ is maximal, and we can write max instead of sup.

For the vector norms $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$ we obtain in this way matrix norms $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$. One can show that
$$ \|A\|_\infty = \max_{j=1,\dots,m} \sum_{k=1}^n |a_{jk}| \qquad \text{(maximal row sum of absolute values)} $$
$$ \|A\|_1 = \max_{k=1,\dots,n} \sum_{j=1}^m |a_{jk}| \qquad \text{(maximal column sum of absolute values)} $$
$$ \|A\|_2 = \bigl(\lambda_{\max}(A^H A)\bigr)^{1/2} \qquad \text{(maximal eigenvalue of the Gram matrix } M = A^H A\text{)} $$
The last line follows from
$$ \|Ac\|_2^2 = \bigl\|c_1 a^{(1)} + \dots + c_n a^{(n)}\bigr\|_2^2 = \sum_{j=1}^n \sum_{k=1}^n \overline{c_j}\,\bigl(a^{(k)}, a^{(j)}\bigr)\, c_k = c^H M c \le \lambda_{\max}(M)\,\|c\|_2^2. $$

We can also define the Frobenius norm (aka Hilbert-Schmidt norm)
$$ \|A\|_F := \Bigl( \bigl\|a^{(1)}\bigr\|_2^2 + \dots + \bigl\|a^{(n)}\bigr\|_2^2 \Bigr)^{1/2} = \bigl(M_{11} + M_{22} + \dots + M_{nn}\bigr)^{1/2}. $$
Note that
$$ \|Ac\|_2^2 = \sum_{j=1}^n \sum_{k=1}^n \overline{c_j}\, c_k\, \bigl(a^{(j)}, a^{(k)}\bigr) \le \sum_{j=1}^n \sum_{k=1}^n |c_j|\,|c_k|\, \bigl\|a^{(j)}\bigr\|\,\bigl\|a^{(k)}\bigr\| = \Bigl(|c_1|\,\bigl\|a^{(1)}\bigr\| + \dots + |c_n|\,\bigl\|a^{(n)}\bigr\|\Bigr)^2 \le \bigl(|c_1|^2 + \dots + |c_n|^2\bigr)\underbrace{\Bigl(\bigl\|a^{(1)}\bigr\|^2 + \dots + \bigl\|a^{(n)}\bigr\|^2\Bigr)}_{\|A\|_F^2} $$
by the Cauchy-Schwarz inequality. Hence we have
$$ \|A\|_2 \le \|A\|_F, \qquad \|Ac\|_2 \le \|A\|_F\,\|c\|_2. $$

6.3 Case $A = \bigl[a^{(1)},\dots,a^{(n)}\bigr]$ with $a^{(j)} \in V$

For a general inner product space $V$ we can consider $A = \bigl[a^{(1)},\dots,a^{(n)}\bigr]$ with $a^{(j)} \in V$. For $c \in \mathbb{C}^n$ we can consider the mapping
$$ c \mapsto Ac = c_1 a^{(1)} + \dots + c_n a^{(n)}, \qquad \mathbb{C}^n \to V $$
and define the norm
$$ \|A\| = \sup_{c \in \mathbb{C}^n,\ c \ne 0} \frac{\|Ac\|}{\|c\|_2} = \sup_{c \in \mathbb{C}^n,\ \|c\|_2 = 1} \|Ac\|. $$
Since the unit ball in $\mathbb{C}^n$ is compact there exists $v$ with $\|v\|_2 = 1$ such that $\|Av\|$ is maximal, and we can write max instead of sup. By the same argument as above we have
$$ \|A\| = \bigl(\lambda_{\max}(M)\bigr)^{1/2} $$
with the Gram matrix $M \in \mathbb{C}^{n\times n}$. With the Frobenius norm
$$ \|A\|_F := \Bigl( \bigl\|a^{(1)}\bigr\|^2 + \dots + \bigl\|a^{(n)}\bigr\|^2 \Bigr)^{1/2} $$
we have with the same arguments as above
$$ \|A\| \le \|A\|_F, \qquad \|Ac\| \le \|A\|_F\,\|c\|_2. $$
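Finally, a sketch comparing Matlab's built-in matrix norms with the formulas above (random matrix for illustration):

% Sketch: matrix norms in Matlab vs. the formulas above.
A = randn(5,3);
M = A'*A;                                  % Gram matrix
[norm(A,inf),  max(sum(abs(A),2))]         % maximal row sum
[norm(A,1),    max(sum(abs(A),1))]         % maximal column sum
[norm(A,2),    sqrt(max(eig(M)))]          % sqrt of largest eigenvalue of M
[norm(A,'fro'), sqrt(trace(M))]            % Frobenius norm = sqrt(M_11+...+M_nn)
norm(A,2) <= norm(A,'fro')                 % always true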