Jim Lambers
MAT 419/519
Summer Session 2011-12
Lecture 13 Notes

These notes correspond to Section 4.1 in the text.

Least Squares Fit

One of the most fundamental problems in science and engineering is data fitting: constructing a function that, in some sense, conforms to given data points. One type of data-fitting technique is interpolation. Interpolation techniques, of any kind, construct functions that agree exactly with the data. That is, given points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$, interpolation yields a function $f(x)$ such that $f(x_i) = y_i$ for $i = 1, 2, \ldots, m$.

However, fitting the data exactly may not be the best approach to describing the data with a function. High-degree polynomial interpolation can yield oscillatory functions that behave very differently from a smooth function from which the data is obtained. Also, it may be pointless to try to fit data exactly, for if it is obtained from previous measurements or other computations, it may be erroneous. Therefore, we consider another notion of what constitutes a best fit of given data by a function.

One alternative approach to data fitting is to solve the minimax problem, which is the problem of finding a function $f(x)$ of a given form for which
$$\max_{1 \le i \le m} |f(x_i) - y_i|$$
is minimized. However, this is a very difficult problem to solve.

Another approach is to minimize the total absolute deviation of $f(x)$ from the data. That is, we seek a function $f(x)$ of a given form for which
$$\sum_{i=1}^m |f(x_i) - y_i|$$
is minimized. However, we cannot apply standard minimization techniques to this function, because, like the absolute value function that it employs, it is not differentiable.

This defect is overcome by considering the problem of finding $f(x)$ of a given form for which
$$\sum_{i=1}^m [f(x_i) - y_i]^2$$
is minimized. This is known as the least squares problem. We will first show how this problem is solved for the case where $f(x)$ is a linear function of the form $f(x) = a_1 x + a_0$, and then generalize this solution to other types of functions.
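To make the three criteria above concrete, here is a short Python sketch (my own illustration, not from the text; it assumes NumPy and a vectorized candidate function f) that evaluates each measure of deviation for given data:

    import numpy as np

    def deviations(f, x, y):
        """Return the minimax, total absolute, and sum-of-squares deviations of f from the data."""
        r = f(np.asarray(x, dtype=float)) - np.asarray(y, dtype=float)   # residuals f(x_i) - y_i
        return np.max(np.abs(r)), np.sum(np.abs(r)), np.sum(r ** 2)

The least squares criterion is the third of these quantities.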

When $f(x)$ is linear, the least squares problem is the problem of finding constants $a_0$ and $a_1$ such that the function
$$E(a_0, a_1) = \sum_{i=1}^m (a_1 x_i + a_0 - y_i)^2$$
is minimized. In order to minimize this function of $a_0$ and $a_1$, we must compute its partial derivatives with respect to $a_0$ and $a_1$. This yields
$$\frac{\partial E}{\partial a_0} = \sum_{i=1}^m 2(a_1 x_i + a_0 - y_i), \qquad \frac{\partial E}{\partial a_1} = \sum_{i=1}^m 2(a_1 x_i + a_0 - y_i)\, x_i.$$
At a minimum, both of these partial derivatives must be equal to zero. This yields the system of linear equations
$$m a_0 + \left( \sum_{i=1}^m x_i \right) a_1 = \sum_{i=1}^m y_i, \qquad \left( \sum_{i=1}^m x_i \right) a_0 + \left( \sum_{i=1}^m x_i^2 \right) a_1 = \sum_{i=1}^m x_i y_i.$$
These equations are called the normal equations. Using the formula for the inverse of a $2 \times 2$ matrix,
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
we obtain the solutions
$$a_0 = \frac{\left( \sum_{i=1}^m x_i^2 \right) \left( \sum_{i=1}^m y_i \right) - \left( \sum_{i=1}^m x_i \right) \left( \sum_{i=1}^m x_i y_i \right)}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2}, \qquad a_1 = \frac{m \sum_{i=1}^m x_i y_i - \left( \sum_{i=1}^m x_i \right) \left( \sum_{i=1}^m y_i \right)}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2}.$$
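These closed-form expressions translate directly into code. The following Python sketch (the function name fit_line and the use of NumPy are my own choices, not from the text) computes $a_0$ and $a_1$ from the four sums:

    import numpy as np

    def fit_line(x, y):
        """Least-squares line y = a1*x + a0, via the closed-form solution above."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        m = len(x)
        sx, sy = x.sum(), y.sum()                 # sum of x_i, sum of y_i
        sxx, sxy = (x * x).sum(), (x * y).sum()   # sum of x_i^2, sum of x_i*y_i
        denom = m * sxx - sx ** 2
        a0 = (sxx * sy - sx * sxy) / denom
        a1 = (m * sxy - sx * sy) / denom
        return a0, a1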

Example We wish to find the linear function $y = a_1 x + a_0$ that best approximates the data shown in Table 1, in the least-squares sense.

     i      x_i        y_i
     1    2.0774     3.3123
     2    2.3049     3.8982
     3    3.0125     4.6500
     4    4.7092     6.5576
     5    5.5016     7.5173
     6    5.8704     7.0415
     7    6.2248     7.7497
     8    8.4431    11.0451
     9    8.7594     9.8179
    10    9.3900    12.2477

Table 1: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 10$, to be fit by a linear function

Using the summations
$$\sum_{i=1}^{10} x_i = 56.2933, \quad \sum_{i=1}^{10} x_i^2 = 380.5426, \quad \sum_{i=1}^{10} y_i = 73.8373, \quad \sum_{i=1}^{10} x_i y_i = 485.9487,$$
we obtain
$$a_0 = \frac{380.5426 \cdot 73.8373 - 56.2933 \cdot 485.9487}{10 \cdot 380.5426 - 56.2933^2} = \frac{742.5703}{636.4906} = 1.1667,$$
$$a_1 = \frac{10 \cdot 485.9487 - 56.2933 \cdot 73.8373}{10 \cdot 380.5426 - 56.2933^2} = \frac{702.9438}{636.4906} = 1.1044.$$
We conclude that the linear function that best fits this data in the least-squares sense is $y = 1.1044x + 1.1667$. The data, and this function, are shown in Figure 1.

Figure 1: Data points $(x_i, y_i)$ (circles) and least-squares line (solid line)

It is interesting to note that if we define the $m \times 2$ matrix $A$, the 2-vector $a$, and the $m$-vector $y$ by
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}, \qquad a = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix},$$
then $a$ is the solution to the system of equations
$$A^T A a = A^T y.$$
These equations are the normal equations defined earlier, written in matrix-vector form. They arise from the problem of finding the vector $a$ such that $\|Aa - y\|$ is minimized, where, for any vector $u$, $\|u\|$ is the magnitude, or length, of $u$. This magnitude is equivalent to the square root of the expression we originally intended to minimize,
$$\sum_{i=1}^m (a_1 x_i + a_0 - y_i)^2,$$
but we will see that the normal equations also characterize the solution $a$, an $n$-vector, to the more general linear least squares problem of minimizing $\|Aa - y\|$ for any matrix $A$ that is $m \times n$, where $m \ge n$, and whose columns are linearly independent.
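As an illustration of this matrix-vector form, the following Python sketch (assuming NumPy; the variable names are my own) builds $A$ for the data in Table 1 and solves the normal equations directly; it should reproduce the coefficients computed above up to rounding:

    import numpy as np

    x = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016,
                  5.8704, 6.2248, 8.4431, 8.7594, 9.3900])
    y = np.array([3.3123, 3.8982, 4.6500, 6.5576, 7.5173,
                  7.0415, 7.7497, 11.0451, 9.8179, 12.2477])

    A = np.column_stack((np.ones_like(x), x))   # rows [1, x_i]
    a = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations A^T A a = A^T y
    print(a)                                    # approximately [1.1667, 1.1044], i.e. [a0, a1]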

We now consider the problem of finding a polynomial of degree $n$ that gives the best least-squares fit. As before, let $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$ be given data points that need to be approximated by a polynomial of degree $n$. We assume that $n < m - 1$, for otherwise, we can use polynomial interpolation to fit the points exactly.

Let the least-squares polynomial have the form
$$p_n(x) = \sum_{j=0}^n a_j x^j.$$
Our goal is to minimize the sum of squares of the deviations in $p_n(x)$ from each $y$-value,
$$E(a) = \sum_{i=1}^m [p_n(x_i) - y_i]^2 = \sum_{i=1}^m \left( \sum_{j=0}^n a_j x_i^j - y_i \right)^2,$$

where $a$ is a column vector of the unknown coefficients of $p_n(x)$,
$$a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}.$$
Differentiating this function with respect to each $a_k$ yields
$$\frac{\partial E}{\partial a_k} = \sum_{i=1}^m 2 \left( \sum_{j=0}^n a_j x_i^j - y_i \right) x_i^k, \qquad k = 0, 1, \ldots, n.$$
Setting each of these partial derivatives equal to zero yields the system of equations
$$\sum_{j=0}^n \left( \sum_{i=1}^m x_i^{j+k} \right) a_j = \sum_{i=1}^m x_i^k y_i, \qquad k = 0, 1, \ldots, n.$$
These are the normal equations. They are a generalization of the normal equations previously defined for the linear case, where $n = 1$. Solving this system yields the coefficients $\{a_j\}_{j=0}^n$ of the least-squares polynomial $p_n(x)$.

As in the linear case, the normal equations can be written in matrix-vector form
$$A^T A a = A^T y,$$
where
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{bmatrix}, \qquad a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$
The normal equations can be used to compute the coefficients of any linear combination of functions $\{\varphi_j(x)\}_{j=0}^n$ that best fits data in the least-squares sense, provided that these functions are linearly independent. In this general case, the entries of the matrix $A$ are given by $a_{ij} = \varphi_j(x_i)$, for $i = 1, 2, \ldots, m$ and $j = 0, 1, \ldots, n$.
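To make the construction concrete, here is a short Python sketch (my own illustration, assuming NumPy) that forms the matrix $A$ for a degree-$n$ polynomial fit and solves the normal equations; for $n = 1$ it reduces to the line-fitting code above:

    import numpy as np

    def fit_polynomial(x, y, n):
        """Coefficients [a_0, ..., a_n] of the least-squares polynomial of degree n."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Column j of A holds the basis function phi_j(x) = x^j evaluated at the data points.
        A = np.vander(x, n + 1, increasing=True)
        return np.linalg.solve(A.T @ A, A.T @ y)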

Example We wish to find the quadratic function $y = a_2 x^2 + a_1 x + a_0$ that best approximates the data shown in Table 2, in the least-squares sense.

     i      x_i        y_i
     1    2.0774     2.7212
     2    2.3049     3.7798
     3    3.0125     4.8774
     4    4.7092     6.6596
     5    5.5016    10.5966
     6    5.8704     9.8786
     7    6.2248    10.5232
     8    8.4431    23.3574
     9    8.7594    24.0510
    10    9.3900    27.4827

Table 2: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 10$, to be fit by a quadratic function

By defining
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{10} & x_{10}^2 \end{bmatrix}, \qquad a = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{10} \end{bmatrix},$$
and solving the normal equations
$$A^T A a = A^T y,$$
we obtain the coefficients
$$a_0 = 4.7681, \quad a_1 = -1.5193, \quad a_2 = 0.4251,$$
and conclude that the quadratic function that best fits this data in the least-squares sense is
$$y = 0.4251x^2 - 1.5193x + 4.7681.$$
The data, and this function, are shown in Figure 2.

Figure 2: Data points $(x_i, y_i)$ (circles) and quadratic least-squares fit (solid curve)

Least-squares fitting can also be used to fit data with functions that are not linear combinations of functions such as polynomials. Suppose we believe that given data points can best be matched to an exponential function of the form
$$y = b e^{ax},$$
where the constants $a$ and $b$ are unknown. Taking the natural logarithm of both sides of this equation yields
$$\ln y = \ln b + ax.$$
If we define $z = \ln y$ and $c = \ln b$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^m$ with an exponential function is transformed into the problem of fitting the data points $\{(x_i, z_i)\}_{i=1}^m$ with a linear function of the form $c + ax$, for unknown constants $a$ and $c$.
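A minimal Python sketch of this exponential transformation (again my own illustration, assuming NumPy) fits a line to $(x_i, \ln y_i)$ and then recovers $b = e^c$:

    import numpy as np

    def fit_exponential(x, y):
        """Least-squares fit of y = b*exp(a*x), obtained by fitting a line to (x_i, ln y_i)."""
        x = np.asarray(x, dtype=float)
        z = np.log(np.asarray(y, dtype=float))      # z_i = ln y_i
        A = np.column_stack((np.ones_like(x), x))   # model: z = c + a*x
        c, a = np.linalg.solve(A.T @ A, A.T @ z)
        return a, np.exp(c)                         # b = e^c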

Similarly, suppose the given data is believed to approximately conform to a function of the form $y = bx^a$, where the constants $a$ and $b$ are unknown. Taking the natural logarithm of both sides of this equation yields
$$\ln y = \ln b + a \ln x.$$
If we define $z = \ln y$, $c = \ln b$ and $w = \ln x$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^m$ with a constant times a power of $x$ is transformed into the problem of fitting the data points $\{(w_i, z_i)\}_{i=1}^m$ with a linear function of the form $c + aw$, for unknown constants $a$ and $c$.

Example We wish to find the exponential function $y = b e^{ax}$ that best approximates the data shown in Table 3, in the least-squares sense.

     i      x_i        y_i
     1    2.0774     1.4509
     2    2.3049     2.8462
     3    3.0125     2.1536
     4    4.7092     4.7438
     5    5.5016     7.7260

Table 3: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 5$, to be fit by an exponential function

By defining
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_5 \end{bmatrix}, \qquad c = \begin{bmatrix} c \\ a \end{bmatrix}, \qquad z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_5 \end{bmatrix},$$
where $c = \ln b$ and $z_i = \ln y_i$ for $i = 1, 2, \ldots, 5$, and solving the normal equations
$$A^T A c = A^T z,$$
we obtain the coefficients
$$a = 0.4040, \qquad b = e^c = e^{-0.2652} = 0.7670,$$

and conclude that the exponential function that best fits this data in the least-squares sense is
$$y = 0.7670 e^{0.4040x}.$$
The data, and this function, are shown in Figure 3.

Figure 3: Data points $(x_i, y_i)$ (circles) and exponential least-squares fit (solid curve)

It can be seen from the preceding discussion and examples that the normal equations can be used to solve any problem that requires finding the vector $x \in \mathbb{R}^n$ that minimizes $\|b - Ax\|$, where $b \in \mathbb{R}^m$, $m \ge n$, and $A$ is an $m \times n$ matrix with linearly independent columns, regardless of the interpretation of these columns. To see this, we define the function
$$\varphi(x) = \|b - Ax\|^2, \qquad x \in \mathbb{R}^n.$$
Then, it can be shown through differentiation that
$$\nabla \varphi(x) = 2(A^T A x - A^T b), \qquad H_\varphi(x) = 2 A^T A.$$
If $x \neq 0$, then $Ax \neq 0$, because $A$ has linearly independent columns. It follows that
$$x^T A^T A x = (Ax)^T (Ax) = \|Ax\|^2 > 0,$$
so $H_\varphi(x)$ is positive definite on $\mathbb{R}^n$. This leads to the following theorem.

Theorem Let $A$ be an $m \times n$ matrix with linearly independent columns, and let $b \in \mathbb{R}^m$. Then the vector $x$ defined by
$$x = (A^T A)^{-1} A^T b,$$
which solves the normal equations $A^T A x = A^T b$, is the strict global minimizer of $\|b - Ax\|$, $x \in \mathbb{R}^n$.
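The theorem translates directly into code. Here is a minimal sketch (assuming NumPy; the function name is my own) of the general normal-equations solver:

    import numpy as np

    def least_squares(A, b):
        """Minimizer of ||b - Ax|| for A with linearly independent columns,
        computed from the normal equations A^T A x = A^T b."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        return np.linalg.solve(A.T @ A, A.T @ b)

In practice, a QR factorization or numpy.linalg.lstsq is preferred for numerical stability, but this sketch mirrors the formula in the theorem while avoiding the explicit inverse $(A^T A)^{-1}$.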

The matrix $A^+ = (A^T A)^{-1} A^T$ is called the pseudo-inverse, or generalized inverse, of $A$. When $A$ is a square, invertible matrix, then $A^+ = A^{-1}$. Otherwise, $A^+$ is the matrix that, as closely as possible, serves as an inverse of $A$. It should be noted that the condition that $A$ has linearly independent columns is essential, so that $A^T A$ is invertible.

Exercises

1. Chapter 4, Exercise 1
2. Chapter 4, Exercise 4
3. Chapter 4, Exercise 7
4. Chapter 4, Exercise 10