Model Fitting

CURM Background Material, Fall 2014
Dr. Doreen De Leon

1 Introduction

Given a set of data points, we often want to fit a selected model or type to the data (e.g., we suspect an exponential behavior), and then choose the most appropriate model by determining which model fits best. Before discussing criteria on which to base curve-fitting decisions, we first will look at sources and types of errors.

Formulation errors. These errors result from the assumption that certain variables are negligible or from simplifications in describing interrelationships among the variables in the various submodels constructed as part of the modeling process.

Truncation errors. These errors are attributable to numerical methods used to solve a mathematical problem. An example is the use of a Taylor polynomial to approximate a function.

Round-off errors. These errors are caused by using finite-digit machines for computations. Each number can be represented in a computer only with finite precision. Thus, each arithmetic operation that is performed, each with its own round-off, contributes to an accumulated round-off error. Such an error can significantly alter the results of a model's solution.

Measurement errors. These errors are caused by imprecision in the data collection. This may result from human error in recording data or from limitations in the accuracy of measuring equipment.

2 Analytic Methods of Model Fitting

There are actually several methods for fitting curves to a set of data points, based on different criteria. We will briefly examine some of these before focusing on the most common. The majority of the material in this handout is taken directly from A First Course in Mathematical Modeling by Frank Giordano, et al.
Note that this may also be found in Section 4.1 of the textbook.

Least-Squares Criterion

Currently, the most frequently used curve-fitting criterion is the least-squares criterion. In this case, given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, we seek to minimize the sum

\sum_{i=1}^{m} [y_i - f(x_i)]^2.   (1)

In other words, we are minimizing the standard deviation of the function f(x) from the data set. One of the advantages of the least-squares method (also known as regression) is that the resulting optimization problem can be solved using only the calculus of several variables.

Minimizing the Sum of the Absolute Deviations

In this case, given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, we seek to minimize

\sum_{i=1}^{m} |y_i - f(x_i)|.

Again, in general, this presents some problems because, in order to solve, we must differentiate the above sum with respect to the parameters in f(x_i) to find the critical points. However, because of the absolute values, the derivatives are not continuous.

Chebyshev Approximation Criterion

Given some function type y = f(x) and a set of data points (x_i, y_i), i = 1, 2, ..., m, minimize the largest absolute deviation |y_i - f(x_i)| over the entire set. The difficulty with the Chebyshev criterion is that it is often hard to apply in practice, because application of the criterion results in an optimization problem that may require advanced mathematical procedures or numerical algorithms to solve.

Relating the Criteria

The geometric interpretations of the three curve-fitting criteria help provide a qualitative description comparing the criteria. Minimizing the sum of the absolute deviations tends
to treat each data point with equal weight and to average the deviations. The Chebyshev criterion gives more weight to a single point potentially having a large deviation. The least-squares criterion, in terms of weighting individual points, is somewhat in between these two criteria.

The following also may be found in Section 4.3 of the textbook.

Now, we will compare the Chebyshev and least-squares approaches. Suppose the Chebyshev criterion is applied and the resulting optimization problem solved to give the function f_1(x). The absolute deviations resulting from the fit are defined as follows:

|y_i - f_1(x_i)| = c_i,  i = 1, 2, ..., m.

Define c_max as the largest of the absolute deviations c_i. Since the parameters of f_1(x) are determined so as to minimize c_max, it is the minimal largest absolute deviation obtainable.

Suppose, instead, that the least-squares criterion is applied and the resulting optimization problem solved to yield the function f_2(x). The absolute deviations resulting from this fit are given by

|y_i - f_2(x_i)| = d_i,  i = 1, 2, ..., m.

Define d_max as the largest of the absolute deviations d_i. Clearly, d_max >= c_max. Since the sum of the squares of the d_i is the smallest such sum obtainable, we know that

d_1^2 + d_2^2 + ... + d_m^2 <= c_1^2 + c_2^2 + ... + c_m^2.

However, c_i <= c_max for i = 1, 2, ..., m. Therefore,

d_1^2 + d_2^2 + ... + d_m^2 <= m c_max^2,

or

\sqrt{(d_1^2 + d_2^2 + ... + d_m^2)/m} <= c_max.

Thus, defining

D = \sqrt{(d_1^2 + d_2^2 + ... + d_m^2)/m},

we have D <= c_max <= d_max. This gives an effective bound on the minimal largest absolute deviation c_max. So, if there is a considerable difference between D and d_max, it might be better to apply the Chebyshev criterion.

3 Obtaining a Least-Squares Fit (or Regression)

Linear Regression

Suppose a model of the form y = ax + b is expected, and it has been decided to use the data points (x_i, y_i), i = 1, 2, ..., m, to estimate a and b. Applying the least-squares criterion in equation (1), we find that we must minimize

E = \sum_{i=1}^{m} [y_i - f(x_i)]^2 = \sum_{i=1}^{m} (y_i - a x_i - b)^2.
In order to minimize E, we need to compute and set to 0 the partial derivatives of E with respect to both a and b:

∂E/∂a = -2 \sum_{i=1}^{m} (y_i - a x_i - b) x_i = 0,
∂E/∂b = -2 \sum_{i=1}^{m} (y_i - a x_i - b) = 0.

Rewriting these equations, we obtain the following system of two equations in two unknowns:

a \sum x_i^2 + b \sum x_i = \sum x_i y_i,
a \sum x_i + b m = \sum y_i,

which may easily be solved to yield

a = (m \sum x_i y_i - \sum x_i \sum y_i) / (m \sum x_i^2 - (\sum x_i)^2)   (2)

and

b = (\sum x_i^2 \sum y_i - \sum x_i y_i \sum x_i) / (m \sum x_i^2 - (\sum x_i)^2).   (3)

Maple can be used to determine the linear least-squares fit for any set of data.

Example: The following table shows the average height versus age for children. The data is copied from An Introduction to the Mathematics of Biology, with Computer Algebra Models by Yeargers, Shonkwiler, and Herod.

age          1    3    5    7    9    11   13
height (cm)  75   92   108  121  130  142  155

A plot of the data (height vs. age) reveals that the data appears to be linear in nature, so we will perform a linear regression to determine the least-squares best linear fit, letting x_i represent the age and y_i represent the height. Working out the necessary sums, we find that

\sum x_i = 49,  \sum x_i^2 = 455,  \sum y_i = 823,  \sum x_i y_i = 6485.

Applying the formulas in (2) and (3), we obtain a = 6.4643 and b = 72.3214. Therefore, the data is approximated by the curve y = 6.4643x + 72.3214.

See also the Maple example regression.mw under Linear Regression.
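The sums and formulas (2) and (3) above can be checked directly. Here is a short sketch in Python (the handout's own worked examples use Maple):

```python
# Least-squares fit of y = a*x + b to the age/height data,
# using formulas (2) and (3) of the handout.
ages = [1, 3, 5, 7, 9, 11, 13]               # x_i
heights = [75, 92, 108, 121, 130, 142, 155]  # y_i (cm)

m = len(ages)
Sx = sum(ages)                               # sum of x_i   = 49
Sxx = sum(x * x for x in ages)               # sum of x_i^2 = 455
Sy = sum(heights)                            # sum of y_i   = 823
Sxy = sum(x * y for x, y in zip(ages, heights))  # sum x_i*y_i = 6485

denom = m * Sxx - Sx ** 2
a = (m * Sxy - Sx * Sy) / denom              # formula (2)
b = (Sxx * Sy - Sxy * Sx) / denom            # formula (3)
print(round(a, 4), round(b, 4))              # 6.4643 72.3214
```

This reproduces the fitted line y = 6.4643x + 72.3214 computed in the example.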
4 Plotting the Residuals for a Least-Squares Fit

We plot the residuals on the y-axis and the independent variable on the x-axis. The deviations should be randomly distributed and contained in a reasonably small band that is consistent with the accuracy required by the model. If a residual is excessively large, the data point in question should be examined to discover the cause of the large deviation (i.e., is it an outlier, or is something else going on?). A pattern or trend in the residuals indicates that a predictable effect remains to be modeled, and the nature of the pattern gives a hint as to how to refine the model. If the residuals are increasing in absolute value, that is an indication that our model is unacceptable. We will use a Maple example to illustrate this.

5 Fitting a Power Curve

Suppose we think that the curve will be of the form y = A x^N, where we know what N is (usually unlikely). Applying the least-squares method to fit the curve to a set of data points requires minimization of

E = \sum_{i=1}^{m} [y_i - f(x_i)]^2 = \sum_{i=1}^{m} (y_i - A x_i^N)^2.

A necessary condition for optimality is that the derivative of E with respect to A equals 0, giving

dE/dA = -2 \sum_{i=1}^{m} x_i^N (y_i - A x_i^N) = 0,

so

A = \sum x_i^N y_i / \sum x_i^{2N}.

However, it is not likely that we know the value of N in advance. What if we do not know N? In other words, now assume that we seek a least-squares estimate of the form y = a x^n, where both a and n are parameters to be determined. In this case, we will first need to transform the function by taking the logarithm of both sides to obtain

ln y = ln a + n ln x.   (4)

Then, we simply transform the data by taking the natural log of both x_i and y_i, then apply linear regression to obtain ln a and n. The parameter a is then found by a = e^{ln a}.

Example: The following table shows the ideal weights for medium-built males. The data is copied from An Introduction to the Mathematics of Biology, with Computer Algebra Models by Yeargers, Shonkwiler, and Herod.
This example is worked out on the Maple worksheet, regression.mw.
Height (in)  62   63   64   65   66   67   68   69   70   71   72   73   74
Weight (lb)  128  131  135  139  142  146  150  154  158  162  167  172  177

First, we try a linear least-squares fit. As we can see from the plot of the residuals, the residual is increasing as the height increases. Therefore, this model does not seem to be acceptable. However, since we know that in many geometric solids the volume changes with the cube of the height, we will try to find a cubic fit for the data. The results from the cubic fit are much better: the residual is not large, and it decreases as the height increases. We would like to know if we can do better. So, we will try to find a fit of the form

weight = A(height - 60)^n.

Ideally, n is an integer. However, we discover that n ≈ 0.168. Since the mathematics behind our approach does not require n to be an integer, this is okay. If we wish to make n a rational number, then the nearest simple rational number to the value of n we obtained is 1/6, so we also look at the results with this value of n.
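The fit weight = A(height − 60)^n can be reproduced by the log transformation of equation (4) followed by an ordinary linear regression on the transformed data. A sketch in Python (the handout does this in Maple):

```python
import math

# Fit weight = A*(height - 60)^n by taking logarithms:
#   ln(weight) = ln(A) + n*ln(height - 60),
# then doing a linear least-squares fit of ln(weight)
# against ln(height - 60). Data from the table above.
heights = [62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
weights = [128, 131, 135, 139, 142, 146, 150, 154, 158, 162, 167, 172, 177]

X = [math.log(h - 60) for h in heights]
Y = [math.log(w) for w in weights]

m = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))

n = (m * Sxy - Sx * Sy) / (m * Sxx - Sx ** 2)  # slope = exponent n
A = math.exp((Sy - n * Sx) / m)                # A = e^{intercept}
print(n, A)  # n comes out near 0.168, i.e., roughly 1/6
```

This recovers the exponent n ≈ 0.168 quoted above, with A ≈ 108.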
6 Multiple Regression

In many real problems, there are more than two variables to be considered. For example:

(1) The pressure of a gas depends on the volume V and the temperature T: PV = kT.
(2) The relationship between the surface area of a human as a function of height and weight: SA = c(wt)^α (ht)^β.
(3) Determination of percent body fat from body mass index and skin fold: % body fat = a(BMI) + b(skinfold) + c.

Look at (2), for example. Taking the natural log of both sides gives

ln(SA) = α ln(wt) + β ln(ht) + ln c,

so we obtain an expression similar to that in (4). In general, we have a relation of the form

Y = a_1 X_1 + a_2 X_2 + ... + a_r X_r.

Suppose that we have n data points, (X_{1,i}, X_{2,i}, ..., X_{r,i}, Y_i), i = 1, 2, ..., n. Our goal is to minimize the least-squared error:

E = \sum_{i=1}^{n} (Y_i - (a_1 X_{1,i} + a_2 X_{2,i} + ... + a_r X_{r,i}))^2.

Differentiating with respect to each parameter a_j and setting the derivatives to zero gives the following system of equations:

a_1 \sum X_{1,i} X_{1,i} + ... + a_r \sum X_{r,i} X_{1,i} = \sum X_{1,i} Y_i,
...
a_1 \sum X_{1,i} X_{r,i} + ... + a_r \sum X_{r,i} X_{r,i} = \sum X_{r,i} Y_i.

Solve by writing the equations in matrix form. This is easily done in various computer algebra systems, such as Maple.

Maple code for examples is in the file regression.mw.
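As a sketch of the matrix approach: the system above is the normal equations (X^T X)a = X^T Y, which any linear-algebra package can solve. The small data set below is invented for illustration (it is not from the handout); Y was generated exactly from Y = X1 + 2*X2, so the fit recovers those coefficients.

```python
import numpy as np

# Solve the normal equations for Y = a1*X1 + ... + ar*Xr in matrix form.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])            # columns are X1 and X2
Y = np.array([5.0, 4.0, 11.0, 10.0])  # generated from Y = X1 + 2*X2

# Normal equations: (X^T X) a = X^T Y
a = np.linalg.solve(X.T @ X, X.T @ Y)
print(a)  # approximately [1. 2.]
```

In practice one would typically call a built-in least-squares routine (e.g., numpy.linalg.lstsq) rather than form X^T X explicitly, since the latter can be numerically ill-conditioned, but the normal equations mirror the derivation above.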