Numerical Methods: Discrete Least Squares Approximation
Pavel Ludvík, <pavel.ludvik@vsb.cz>
March 30, 2016
Department of Mathematics and Descriptive Geometry, VŠB-TUO
http://homen.vsb.cz/lud0016/
Introduction: Matching a Few Parameters to a Lot of Data

Sometimes we get a lot of data, many observations, and want to fit it to a simple model.

[Figure: a set of data points shown together with two fitted curves, labeled "data", "interpolation", and "approximation".]
Why a Low Dimensional Model?

Low dimensional models (e.g. low degree polynomials) are easy to work with and are quite well behaved (high degree polynomials can be quite oscillatory).

All measurements are noisy to some degree. Often, we want to use a large number of measurements in order to "average out" random noise.

Approximation Theory looks at two problems:
1. Given a data set, find the best fit for a model (i.e., in a class of functions, find the one that best represents the data).
2. Find a simpler model approximating a given function.
Interpolation: A Bad Idea?

We can probably agree that trying to interpolate this data set:

[Figure: the data set with its interpolating polynomial and a least squares approximation; labels "data", "interpolation", "approximation".]

with a 9th degree polynomial is not the best idea in the world... So what do we do?
Defining Best Fit: the Residual

We are going to relax the requirement that the approximating function must pass through all the data points. Now we need a measurement of how well our approximation fits the data, i.e. a definition of best fit.

If $f(x_i)$ are the measured function values, and $a(x_i)$ are the values of our approximating function, we can define a function

$$r(x_i) = f(x_i) - a(x_i)$$

which measures the deviation (residual) at $x_i$. Notice that $\tilde{r} = (r(x_0), r(x_1), \ldots, r(x_n))^T$ is a vector.

Notation: from now on, $f_i = f(x_i)$, $a_i = a(x_i)$, and $r_i = r(x_i)$. Further, $\tilde{f} = (f_0, f_1, \ldots, f_n)^T$, $\tilde{a} = (a_0, a_1, \ldots, a_n)^T$, and $\tilde{r} = (r_0, r_1, \ldots, r_n)^T$.
What is the Size of the Residual?

There are many possible reasonable choices, e.g.:

- The abs-sum of the deviations: $E_1 = \sum_{i=0}^{n} |r_i|$, i.e. $E_1 = \|\tilde{r}\|_1$.
- The sum-of-squares of the deviations: $E_2 = \sqrt{\sum_{i=0}^{n} r_i^2}$, i.e. $E_2 = \|\tilde{r}\|_2$.
- The largest of the deviations: $E_\infty = \max_{0 \le i \le n} |r_i|$, i.e. $E_\infty = \|\tilde{r}\|_\infty$.

In most cases, the sum-of-squares version is the easiest to work with. (From now on we will focus on this choice...)
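A small numpy sketch of these three error measures (the measured values and approximation values below are made up for illustration):

```python
import numpy as np

# Hypothetical measured values f_i and approximating values a_i.
f = np.array([1.0, 2.0, 2.9, 4.2])
a = np.array([1.1, 1.9, 3.0, 4.0])

r = f - a                    # residual vector, r_i = f_i - a_i

E1 = np.abs(r).sum()         # abs-sum of deviations, ||r||_1
E2 = np.sqrt((r**2).sum())   # sum-of-squares measure, ||r||_2
Einf = np.abs(r).max()       # largest deviation, ||r||_inf
```

These are exactly `np.linalg.norm(r, 1)`, `np.linalg.norm(r)`, and `np.linalg.norm(r, np.inf)`.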
Discrete Least Squares Approximation

We have chosen the sum-of-squares measurement for errors. Let's find the constant that best fits the data, i.e. minimize

$$E(C) = \sum_{i=0}^{n} (f_i - C)^2.$$

If $C^*$ is a minimizer, then $E'(C^*) = 0$ (since the derivative at a max/min is zero). We have

$$E'(C) = -2 \sum_{i=0}^{n} (f_i - C) = -2 \sum_{i=0}^{n} f_i + 2(n+1)C, \qquad E''(C) = 2(n+1),$$

hence

$$C^* = \frac{1}{n+1} \sum_{i=0}^{n} f_i$$

is the constant that best fits the data; it is a minimum since $E''(C^*) = 2(n+1) > 0$. (Note: $C^*$ is the average.)
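A quick numerical check of this result, with made-up data: the minimizing constant is just the sample mean, and nudging it in either direction can only increase the error.

```python
import numpy as np

f = np.array([3.0, 5.0, 4.0, 8.0])   # hypothetical measurements f_i

E = lambda C: ((f - C)**2).sum()     # E(C) = sum of (f_i - C)^2

C_star = f.sum() / f.size            # C* = (1/(n+1)) * sum of f_i, the average

# C* is a minimizer: perturbing it either way increases E.
assert E(C_star) <= E(C_star + 0.1) and E(C_star) <= E(C_star - 0.1)
```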
Discrete Least Squares: Linear Approximation

The form of Least Squares you are most likely to see: find the linear function, $p_1(x) = a_0 + a_1 x$, that best fits the data. The error $E(a_0, a_1)$ we need to minimize is

$$E(a_0, a_1) = \sum_{i=0}^{n} [(a_0 + a_1 x_i) - f_i]^2.$$

The first partial derivatives with respect to $a_0$ and $a_1$ had better be zero at the minimum:

$$\frac{\partial}{\partial a_0} E(a_0, a_1) = 2 \sum_{i=0}^{n} [(a_0 + a_1 x_i) - f_i] = 0,$$
$$\frac{\partial}{\partial a_1} E(a_0, a_1) = 2 \sum_{i=0}^{n} x_i [(a_0 + a_1 x_i) - f_i] = 0.$$

Now, we work on these expressions to get the Normal Equations...
Linear Approximation: The Normal Equations

Dividing both conditions by 2 gives

$$\sum_{i=0}^{n} [(a_0 + a_1 x_i) - f_i] = 0, \qquad \sum_{i=0}^{n} x_i [(a_0 + a_1 x_i) - f_i] = 0,$$

and rearranging yields the Normal Equations:

$$a_0 (n+1) + a_1 \sum_{i=0}^{n} x_i = \sum_{i=0}^{n} f_i,$$
$$a_0 \sum_{i=0}^{n} x_i + a_1 \sum_{i=0}^{n} x_i^2 = \sum_{i=0}^{n} x_i f_i.$$

Since everything except $a_0$ and $a_1$ is known, this is a 2-by-2 system of equations:

$$\begin{bmatrix} (n+1) & \sum_{i=0}^{n} x_i \\ \sum_{i=0}^{n} x_i & \sum_{i=0}^{n} x_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{n} f_i \\ \sum_{i=0}^{n} x_i f_i \end{bmatrix}.$$
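A minimal sketch of assembling and solving the 2-by-2 Normal Equations in numpy (the data points are invented; they lie roughly on the line $1 + 2x$):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([1.1, 2.9, 5.2, 6.9])

# Normal equations:
# [[n+1,    sum x  ],   [a0,      [sum f,
#  [sum x,  sum x^2]] *  a1]   =   sum x*f]
M = np.array([[x.size,  x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([f.sum(), (x * f).sum()])

a0, a1 = np.linalg.solve(M, rhs)   # best-fit line p1(x) = a0 + a1*x
```

For this data the solution is $a_0 = 1.07$, $a_1 = 1.97$, which matches `np.polyfit(x, f, 1)`.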
The Quadratic Model, $p_2(x)$

For the quadratic polynomial $p_2(x) = a_0 + a_1 x + a_2 x^2$, the error is given by

$$E(a_0, a_1, a_2) = \sum_{i=0}^{n} [(a_0 + a_1 x_i + a_2 x_i^2) - f_i]^2.$$

At the minimum (best model) we must have

$$\frac{\partial}{\partial a_0} E(a_0, a_1, a_2) = 2 \sum_{i=0}^{n} [(a_0 + a_1 x_i + a_2 x_i^2) - f_i] = 0,$$
$$\frac{\partial}{\partial a_1} E(a_0, a_1, a_2) = 2 \sum_{i=0}^{n} x_i [(a_0 + a_1 x_i + a_2 x_i^2) - f_i] = 0,$$
$$\frac{\partial}{\partial a_2} E(a_0, a_1, a_2) = 2 \sum_{i=0}^{n} x_i^2 [(a_0 + a_1 x_i + a_2 x_i^2) - f_i] = 0.$$
Quadratic Model: The Normal Equations

Similarly, for the quadratic polynomial $p_2(x) = a_0 + a_1 x + a_2 x^2$ the Normal Equations are:

$$a_0 (n+1) + a_1 \sum_{i=0}^{n} x_i + a_2 \sum_{i=0}^{n} x_i^2 = \sum_{i=0}^{n} f_i,$$
$$a_0 \sum_{i=0}^{n} x_i + a_1 \sum_{i=0}^{n} x_i^2 + a_2 \sum_{i=0}^{n} x_i^3 = \sum_{i=0}^{n} x_i f_i,$$
$$a_0 \sum_{i=0}^{n} x_i^2 + a_1 \sum_{i=0}^{n} x_i^3 + a_2 \sum_{i=0}^{n} x_i^4 = \sum_{i=0}^{n} x_i^2 f_i.$$

Note: even though the model $p_2(x)$ is quadratic, the resulting (normal) equations are linear. The model is linear in its parameters $a_0$, $a_1$ and $a_2$.
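The same recipe in code for the quadratic case, using hypothetical data that roughly follows $(x-1)^2$; the normal-equation solution agrees with numpy's own least squares fit:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
f = np.array([ 4.1, 1.0, 0.1, 1.2, 3.9])

S = lambda k: (x**k).sum()       # power sums: sum over i of x_i^k

# 3-by-3 normal equations for p2(x) = a0 + a1*x + a2*x^2.
# Note S(0) = n + 1.
M = np.array([[S(0), S(1), S(2)],
              [S(1), S(2), S(3)],
              [S(2), S(3), S(4)]])
rhs = np.array([f.sum(), (x * f).sum(), (x**2 * f).sum()])

a = np.linalg.solve(M, rhs)      # coefficients [a0, a1, a2]
```

`np.polyfit(x, f, 2)` returns the same coefficients (in highest-degree-first order).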
The Normal Equations as Matrix Equations

We rewrite the Normal Equations as

$$\begin{bmatrix} (n+1) & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum f_i \\ \sum x_i f_i \\ \sum x_i^2 f_i \end{bmatrix},$$

where all sums run over $i = 0, \ldots, n$. It is not immediately obvious, but this expression can be written in the form

$$A^T A \tilde{a} = A^T \tilde{f},$$

where the matrix $A$ is very easy to write in terms of the $x_i$.
The Polynomial Equations in Matrix Form

We can express the $m$th degree polynomial, $p_m(x)$, evaluated at the points $x_i$,

$$a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m = f_i, \qquad i = 0, \ldots, n,$$

as the product of an $(n+1)$-by-$(m+1)$ matrix $A$ and the $(m+1)$-by-1 vector $\tilde{a}$, the result being the $(n+1)$-by-1 vector $\tilde{f}$ (usually $n \gg m$):

$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^m \\ 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ 1 & x_3 & x_3^2 & \cdots & x_3^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}.$$
Building a Solvable System from $A\tilde{a} = \tilde{f}$

We cannot immediately solve the linear system $A\tilde{a} = \tilde{f}$ when $A$ is a rectangular $(n+1)$-by-$(m+1)$ matrix with $m \neq n$. We can generate a solvable system by multiplying both the left- and right-hand sides by $A^T$, i.e.

$$A^T A \tilde{a} = A^T \tilde{f}.$$

Now the matrix $A^T A$ is a square $(m+1)$-by-$(m+1)$ matrix, and $A^T \tilde{f}$ is an $(m+1)$-by-1 vector. Let's take a closer look at $A^T A$ and $A^T \tilde{f}$...
Computing $A^T A$

$$A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ x_0 & x_1 & x_2 & x_3 & \cdots & x_n \\ x_0^2 & x_1^2 & x_2^2 & x_3^2 & \cdots & x_n^2 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ x_0^m & x_1^m & x_2^m & x_3^m & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^m \\ 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ 1 & x_3 & x_3^2 & \cdots & x_3^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} = \begin{bmatrix} n+1 & \sum x_i & \cdots & \sum x_i^m \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m} \end{bmatrix},$$

where all sums run over $i = 0, \ldots, n$.
Computing $A^T \tilde{f}$

$$A^T \tilde{f} = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ x_0 & x_1 & x_2 & x_3 & \cdots & x_n \\ x_0^2 & x_1^2 & x_2^2 & x_3^2 & \cdots & x_n^2 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ x_0^m & x_1^m & x_2^m & x_3^m & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix} = \begin{bmatrix} \sum f_i \\ \sum x_i f_i \\ \sum x_i^2 f_i \\ \vdots \\ \sum x_i^m f_i \end{bmatrix}.$$

Thus, we have recovered the Normal Equations...
Discrete Least Squares: A Simple, Powerful Method

Given the data set $(\tilde{x}, \tilde{f})$, where $\tilde{x} = (x_0, x_1, \ldots, x_n)^T$ and $\tilde{f} = (f_0, f_1, \ldots, f_n)^T$, we can quickly find the best polynomial fit for any specified polynomial degree!

Notation: let $\tilde{x}^j$ denote the vector $(x_0^j, x_1^j, \ldots, x_n^j)^T$.

Example: to compute the best fitting polynomial of degree 3, $p_3(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$, define the matrix whose columns are $\tilde{x}^0, \tilde{x}^1, \tilde{x}^2, \tilde{x}^3$,

$$A = \begin{bmatrix} \tilde{x}^0 & \tilde{x}^1 & \tilde{x}^2 & \tilde{x}^3 \end{bmatrix},$$

and compute

$$\tilde{a} = (A^T A)^{-1} (A^T \tilde{f})$$

(not necessarily like this).
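A sketch of this recipe in numpy, on synthetic noisy samples of a cubic (the data and seed are made up). Forming $(A^T A)^{-1}$ explicitly is exactly the "not necessarily like this" caveat: a QR/SVD-based solver such as `np.linalg.lstsq` works on $A$ directly and avoids squaring the condition number.

```python
import numpy as np

# Synthetic data: noisy samples of 1 - x + 0.5 x^2 + 2 x^3.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 20)
f = 1.0 - x + 0.5 * x**2 + 2.0 * x**3 + 0.1 * rng.standard_normal(x.size)

m = 3
A = np.column_stack([x**j for j in range(m + 1)])   # columns: 1, x, x^2, x^3

a, *_ = np.linalg.lstsq(A, f, rcond=None)           # minimizes ||A a - f||_2
```

The `lstsq` solution coincides (up to rounding) with solving the Normal Equations $A^T A \tilde{a} = A^T \tilde{f}$ directly.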
But... I Do Not Want to Fit a Polynomial!

Fitting an exponential model $g(x) = b e^{cx}$ to the given data $\tilde{d}$ is quite straightforward. First, re-cast the problem as a set of linear equations. We have

$$b e^{c x_i} = d_i, \qquad i = 0, \ldots, n.$$

Compute the natural logarithm on both sides:

$$\underbrace{\ln b}_{a_0} + \underbrace{c}_{a_1} x_i = \underbrace{\ln d_i}_{f_i}.$$

Now we can apply a polynomial least squares fit to this problem, and once we have $(a_0, a_1)$, we recover $b = e^{a_0}$ and $c = a_1$.

Note: this does not give the least squares fit to the original problem! (It gives us a pretty good estimate.)
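A sketch of the linearized fit (the data are generated exactly from $b = 2$, $c = 0.5$, so the recovered parameters should match):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = 2.0 * np.exp(0.5 * x)          # hypothetical data from b*e^(c*x), b=2, c=0.5

# Linearize: ln b + c*x_i = ln d_i, a linear least squares problem in (a0, a1).
A = np.column_stack([np.ones_like(x), x])
(a0, a1), *_ = np.linalg.lstsq(A, np.log(d), rcond=None)

b = np.exp(a0)                     # recover b = e^{a0}
c = a1                             # recover c = a1
```

On noisy data this minimizes the residual of the log-transformed equations, not of the original exponential model, which is why it is only an estimate of the true least squares fit.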
But... That Is Not a True Least Squares Fit!

In order to find the true least squares fit, we need to know how to find roots and/or minima/maxima of non-linear systems of equations. Feel free to look at Burden-Faires, Chapter 10.