PART II : Least-Squares Approximation

Basic theory

Let U be an inner product space and let V be a subspace of U. For any $g \in U$, we look for a least-squares approximation of g in the subspace V,

$$ \min_{f \in V} \| f - g \|_2, $$

where $\|\cdot\|_2$ is the 2-norm induced by the inner product. Denote by $V^{\perp}$ the subspace of U that is orthogonal to V. The least-squares (LS) approximation is characterized as follows. Decompose g into two orthogonal components,

$$ g = g_1 + g_2, \qquad g_1 \in V, \quad g_2 \in V^{\perp}. $$

Then $f^{*} = g_1$ is the least-squares approximation in V to g, and the approximation error, or residual, is $r \stackrel{\text{def}}{=} g - f^{*} = g_2$.

Proof. By the decomposition of g, we have an orthogonal decomposition of $g - f$ for any $f \in V$ as well,

$$ g - f = (g_1 - f) + g_2, \qquad g_1 - f \in V, \quad g_2 \in V^{\perp}. $$

Thus,

$$ \| g - f \|_2^2 = \| g_1 - f \|_2^2 + \| g_2 \|_2^2 \;\ge\; \| g_2 \|_2^2, $$

and equality in the last inequality is attained at $f = g_1$ only.

In other words, the LS approximation is the orthogonal projection of g onto the subspace V, and the residual is the orthogonal projection of g onto the subspace $V^{\perp}$.
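As a finite-dimensional illustration of this characterization (a minimal sketch, not part of the notes; the matrix, the vector g, and the random data are assumptions made for the example), the following Python snippet projects a vector onto a subspace of R^6 spanned by the orthonormal columns of Q and checks that the residual lies in the orthogonal complement:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))     # columns span a 3-dimensional subspace V of R^6
Q, _ = np.linalg.qr(A)              # orthonormal basis of V (reduced QR)
g = rng.standard_normal(6)

f_star = Q @ (Q.T @ g)              # orthogonal projection of g onto V: the LS approximation
r = g - f_star                      # residual = projection of g onto the orthogonal complement

print(np.allclose(Q.T @ r, 0.0))    # True: the residual is orthogonal to V
```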

We note in passing that zero tolerance on approximation errors is possible only for functions already in V.

LS solution in a subspace of finite dimension

In computational practice we choose V to be a subspace of finite dimension. Let $\{f_k,\ k = 1:n\}$ be a basis of V. Then the LS approximation problem becomes a problem of coefficient determination,

$$ \min_{a} \Big\| g - \sum_{k=1}^{n} \alpha_k f_k \Big\|_2, \qquad a = [\alpha_1, \ldots, \alpha_n]^{t}. $$

We introduce a couple of approaches to the LS solution in a finite-dimensional space. The first approach is the traditional one. By the approximation properties above, the LS solution satisfies the equations

$$ \langle f_i, g \rangle = \sum_{k=1}^{n} \alpha_k \langle f_i, f_k \rangle, \qquad i = 1:n, $$

which in matrix form read

$$ G a = b, \qquad G = [\langle f_i, f_j \rangle]_{i,j=1:n}, \qquad b = [\langle f_i, g \rangle]_{i=1:n}. $$

These equations are known as the normal equations for the LS solution. The matrix G is known as the Gram matrix associated with the inner product and the basis functions. Note that we have changed the minimization problem into the problem of solving the system of normal equations. The Gram matrix is determined by the chosen inner product and the basis functions, and hence may be computed once and for all. In contrast, the right-hand-side vector also depends on the function g to be approximated. It is often required that the right-hand side be computed fast and accurately and that the system be solved fast and accurately.
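As a concrete sketch of assembling the normal equations (the monomial basis, the target $g(t) = e^t$ on [0, 1], and the helper name `normal_equations` are assumptions made for this illustration, not part of the notes):

```python
import numpy as np
from scipy.integrate import quad

def normal_equations(basis, g, a=0.0, b=1.0):
    """Assemble G[i, j] = <f_i, f_j> and rhs[i] = <f_i, g> for the inner
    product <u, v> = integral_a^b u(t) v(t) dt (weight w(t) = 1)."""
    n = len(basis)
    G = np.empty((n, n))
    rhs = np.empty(n)
    for i, fi in enumerate(basis):
        rhs[i] = quad(lambda t: fi(t) * g(t), a, b)[0]
        for j, fj in enumerate(basis):
            G[i, j] = quad(lambda t: fi(t) * fj(t), a, b)[0]
    return G, rhs

# monomial basis 1, t, t^2, t^3 and target g(t) = exp(t) on [0, 1]
basis = [lambda t, k=k: t**k for k in range(4)]
G, rhs = normal_equations(basis, np.exp)
alpha = np.linalg.solve(G, rhs)   # coefficients of the LS approximation in the monomial basis
```

In repeated use, G would be assembled once for the chosen basis and inner product; only `rhs` changes with g, matching the remark above.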

Regarding the solution of the normal equations, notice that the Gram matrix is Hermitian and positive definite. Thus the Cholesky factorization of the matrix exists,

$$ G = L L^{h}, $$

where L is a lower triangular matrix. The Cholesky factorization transforms the system into two triangular systems, which are solved easily by substitution. The arithmetic cost of the Cholesky factorization is $c\, n^3$, where c is a small constant. In the special case that the basis functions are orthogonal, the Gram matrix is diagonal, and the solution a is obtained with at most n arithmetic operations.
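A minimal sketch of this Cholesky route, using the fact that for the monomial basis $1, t, \ldots, t^{n-1}$ on [0, 1] with $w(t) = 1$ the Gram matrix is the Hilbert matrix; the target $g(t) = e^t$ is again only an illustrative assumption:

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import hilbert, cho_factor, cho_solve

n = 4
G = hilbert(n)                                   # G[i, j] = integral_0^1 t^i t^j dt = 1/(i + j + 1)
rhs = np.array([quad(lambda t, k=k: t**k * np.exp(t), 0.0, 1.0)[0] for k in range(n)])

c, low = cho_factor(G)                           # G = L L^h (Hermitian positive definite)
alpha = cho_solve((c, low), rhs)                 # two triangular solves by substitution
```

Had an orthogonal basis (e.g. the shifted Legendre polynomials on [0, 1]) been used instead, G would be diagonal and the solve would reduce to n divisions.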

We now generalize the traditional approach in two ways, which can be combined. The first way concerns the representing functions $f_k$: in practice we often have

$$ \mathrm{span}\{ f_k,\ k = 1:q \} = F, \qquad q \ge n, $$

where F denotes the approximating subspace. That is, the set $\{f_k\}$ is not necessarily linearly independent, but it contains a basis for F.

1. The Gram matrix G is not necessarily nonsingular. Nevertheless $b \in \mathrm{span}(G)$, i.e., the LS solution exists.

2. The LS solution is unique if and only if the $\{f_k\}$ are linearly independent. In the case of multiple LS solutions, we often select the solution with minimum 2-norm.

The other way is to let $\{h_i,\ i = 1:p\}$, $p \ge n$, be another set of functions in F containing a basis. Then the LS solution also satisfies the equations

$$ \langle h_i, g \rangle = \sum_{k=1}^{q} \alpha_k \langle h_i, f_k \rangle, \qquad i = 1:p. $$

The matrix $[\langle h_i, f_k \rangle]$ is not necessarily Hermitian or symmetric, and it is not necessarily square. It is nonsingular when p = q = n, and in general it has a nonsingular submatrix of order n. We may choose the $h_i$ so that the matrix $[\langle h_i, f_k \rangle]$ is triangular and easy to solve for a; specifically, the matrix is upper triangular when $h_i$ is orthogonal to $f_k$ for $k = 1 : i-1$. We may also choose $h_i$ that are bi-orthogonal to the $f_k$, so that the matrix is diagonal.

Approximation of continuous functions

Consider the inner-product space $U = C[a,b]$ with

$$ \langle f, g \rangle = \int_a^b f(t)\, g(t)\, w(t)\, dt, $$

where $w(t) \ge 0$ is a weight function. LS approximation in the following subspaces is straightforward, assuming the right-hand side of the normal equations can be computed.

Let $[a,b] = [-1,1]$ and $V = P_n[-1,1]$, the polynomials of degree at most n.

1. In the case $w(t) = 1$, the Legendre polynomials are often used as an orthogonal basis.

2. In the case $w(t) = (1 - t^2)^{-1/2}$, the Chebyshev polynomials are often used as an orthogonal basis.

Let $[a,b] = [-\pi, \pi]$ and $V = T_n[-\pi,\pi]$, the trigonometric polynomials of degree at most n. Let $g \in U$. Then g has the Fourier expansion

$$ g(t) = \frac{\alpha_0}{2} + \sum_{k=1}^{\infty} \big( \alpha_k \cos(k t) + \beta_k \sin(k t) \big). $$

It is easy to verify that the LS approximation in $T_n$ is the truncated expansion at k = n.
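For the Legendre case above, the orthogonality relation $\langle P_k, P_k \rangle = 2/(2k+1)$ gives the LS coefficients $\alpha_k = \tfrac{2k+1}{2} \int_{-1}^{1} g(t) P_k(t)\, dt$. A small sketch (the function name and the target $g = \exp$ are assumptions for the example):

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad

def legendre_ls(g, n):
    """Continuous LS approximation of g on [-1, 1] with w(t) = 1,
    returned as coefficients in the Legendre basis P_0, ..., P_n."""
    alpha = np.zeros(n + 1)
    for k in range(n + 1):
        P_k = legendre.Legendre.basis(k)                     # k-th Legendre polynomial
        inner, _ = quad(lambda t: g(t) * P_k(t), -1.0, 1.0)  # <P_k, g>
        alpha[k] = (2 * k + 1) / 2.0 * inner                 # divide by <P_k, P_k> = 2 / (2k + 1)
    return alpha

approx = legendre.Legendre(legendre_ls(np.exp, n=4))         # callable LS approximation of exp
```

The truncated Fourier expansion for the trigonometric case can be computed in the same way, with the integrals taken over $[-\pi, \pi]$.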

Least-Squares Data Modeling

In interpolation we ask the fitting curve to pass through all of the given points $(x_i, y_i)$. Such a requirement is necessary for function evaluation at non-nodal points, in which case the interpolation nodes are well designed and the interpolation at non-nodal points often uses only local references. The interpolation condition, however, may be neither necessary nor appropriate for data modeling, for several reasons. First, for a large data set, an interpolating function may involve too many basis functions to be computationally efficient to evaluate; in the case of polynomial interpolation this means that the degree of the interpolating polynomial may be very high. Second, interpolants may vary significantly with the data distribution. Third, interpolants are sensitive to errors in the data. Consequently, interpolation techniques may fail to characterize (model) the relationship between x and y.

One simple approach to alleviating these problems is to relax the interpolation condition. Instead of asking the fitting curve to pass through the given data points, we seek the fitting curve that is closest, in some sense, to the data among a family of candidate curves. Specifically, let f(x) be a member of the family F of candidate fitting functions. We allow fitting errors at the data,

$$ e_i = f(x_i) - y_i, \qquad i = 1:m, $$

and find the best fitting function, the one that minimizes the errors in the least-squares metric,

$$ \min_{f \in F} \| e \|_2, $$

where the 2-norm is induced by an inner product. This is a special case of LS approximation in discrete vector spaces.

Recall that in interpolation the fitting errors at the nodal points are required to be zero. The LS solution is not as sensitive to errors in the data, and the weight associated with the inner product may make the LS approximation less sensitive to the data distribution. By permitting and controlling the fitting errors, the number of basis functions may be significantly reduced. Such a process is known as least-squares regression in statistics, where the data are regarded as coming from random variables.

We elaborate on the LS data fitting problem and its solution. We are given data $(x_i, y_i)$, $x_i \in [a,b]$, $i = 1:m$. Let F be a family of fitting functions that is a subspace of $C[a,b]$. Then the LS curve fitting problem becomes

$$ \min_{f \in F} \| y - f_d \|_2, $$

with

$$ y = [y_1, \ldots, y_m]^t, \qquad f_d = [f(x_1), \ldots, f(x_m)]^t \in \mathbb{R}^m. $$

Notice that although the fitting function f(x) is continuous, the LS approximation is determined by comparing $f_d = [f(x_i)] \in \mathbb{R}^m$ against $y = [y_i] \in \mathbb{R}^m$. Let $\{f_k,\ k = 1:n\}$ be a basis of F. Then the LS approximation lies in the subspace spanned by the columns of

$$ B = [b_{ij}] = [f_j(x_i)] \in \mathbb{R}^{m \times n}, $$

and takes the more specific form

$$ \min_{a} \| y - B a \|_2. $$

The extreme case with zero tolerance on the errors recovers the interpolation problem $B a = y$.
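A sketch of this discrete problem with synthetic data (the nodes, the noisy $y_i$, and the cubic polynomial family are all assumed for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 25)                                     # nodes x_i, i = 1:m
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size)    # noisy data y_i

basis = [lambda t, k=k: t**k for k in range(4)]                   # fitting family F: cubic polynomials
B = np.column_stack([f_k(x) for f_k in basis])                    # B[i, j] = f_j(x_i), m-by-n

a, *_ = np.linalg.lstsq(B, y, rcond=None)                         # min_a || y - B a ||_2
fit = B @ a                                                       # fitted values f_d at the nodes
```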

Although the $f_k$ are linearly independent, the columns of B are not necessarily linearly independent; this depends on the distribution of the nodes $x_i$. With the same set of representing functions $f_k$, the matrix changes with the nodes $x_i$. While the interpolation solution may not exist, the LS solution always exists.

We proceed with the computation of the LS curve fit (a code sketch of both cases is given after item 2).

1. For the case $\langle u, v \rangle = u^t v$, the normal equations become

$$ B^t B a = B^t y. $$

When B is square and nonsingular, the normal equations can be viewed as a symmetrized, mathematically equivalent version of $B a = y$, but the condition number for the numerical solution is squared. Let Q be any matrix such that $\mathrm{span}(Q) = \mathrm{span}(B)$. Then the LS solution satisfies the generalized equations

$$ Q^t B a = Q^t y. $$

In particular, let $B = Q R P$ be a QRP factorization of B, where Q has orthogonal columns, R is upper triangular, and P is a permutation matrix. With such a matrix Q, the above system is essentially triangular. When B is square and nonsingular, one uses the QRP factorization to transform the system $B a = y$ into an equivalent triangular one, and the condition number remains the same, unlike when converting to the normal equations.

2. In general, for $\langle u, v \rangle = u^t W v$, where W is a weight matrix (symmetric positive definite), the normal equations become

$$ B^t W B a = B^t W y, $$

and the generalized equations are

$$ Q^t W B a = Q^t W y, \qquad \mathrm{span}(Q) = \mathrm{span}(B). $$

There exists a Q such that $Q^t W B$ is triangular.
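A sketch of both computational routes (all matrices and data below are synthetic assumptions, not from the notes): case 1 via a column-pivoted QR factorization, and case 2 reduced to case 1 by factoring the weight matrix as $W = M^t M$.

```python
import numpy as np
from scipy.linalg import qr, cholesky, solve_triangular

rng = np.random.default_rng(2)
B = rng.standard_normal((30, 4))                 # full column rank in this example
y = rng.standard_normal(30)

# Case 1: <u, v> = u^t v.  Column-pivoted thin QR: B[:, piv] = Q R.
Q, R, piv = qr(B, mode='economic', pivoting=True)
a = np.empty(B.shape[1])
a[piv] = solve_triangular(R, Q.T @ y)            # one triangular solve; conditioning of B, not B^t B

# Case 2: <u, v> = u^t W v with W symmetric positive definite.
# With W = M^t M (Cholesky), minimizing (y - B a)^t W (y - B a) is the plain
# LS problem for M B and M y, so case 1 applies again.
W = np.diag(rng.uniform(0.5, 2.0, 30))           # illustrative weight matrix
M = cholesky(W)                                  # upper triangular factor, W = M^t M
a_w, *_ = np.linalg.lstsq(M @ B, M @ y, rcond=None)
```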