Least squares regression

Curve Fitting: Least Squares Regression and Interpolation

Two categories of curve fitting:
1. Linear least squares regression: determining the straight line that best fits the data points.
2. Interpolation: determining a curve that goes through a series of points.

Linear Least Squares Regression (to find the formula of a straight line that best fits data points)

The vertical distance, $e_i$, between data point $i$ and a straight line is

$$e_i = y_i - a_0 - a_1 x_i$$

The goal is to minimize the total distance, $\sum_{i=1}^{n} |e_i|$.

Instead of minimizing the sum of the distances, minimizing the sum of the squares of the distances (the residuals) serves the same purpose and is easier mathematically.

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

How can we find the values of $a_0$ and $a_1$ that minimize $S_r$?

In the case of a function $f = f(x)$, we can find the minimum by setting the derivative of $f$ with respect to $x$ equal to zero,

$$\frac{df}{dx} = 0$$

But $S_r$ is a function of two variables, $a_0$ and $a_1$, $S_r(a_0, a_1)$; therefore,

$$\frac{\partial S_r}{\partial a_0} = 0 \quad \text{and} \quad \frac{\partial S_r}{\partial a_1} = 0$$

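Evaluation of derivatives,

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0$$

and

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)\, x_i = 0$$

so that

$$\sum y_i - \sum a_0 - \sum a_1 x_i = 0 \quad \text{and} \quad \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2 = 0$$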

Rearrange,

$$n a_0 + a_1 \sum x_i = \sum y_i$$

$$a_0 \sum x_i + a_1 \sum x_i^2 = \sum y_i x_i$$

The values $n$, $\sum x_i$, $\sum y_i$, $\sum x_i^2$, and $\sum y_i x_i$ can be determined from the data.
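As a restatement only, the two rearranged equations can be collected into a single 2-by-2 linear system in the unknowns $a_0$ and $a_1$. This form is convenient when the system is handed to a linear solver:

```latex
\begin{bmatrix}
  n          & \sum x_i   \\
  \sum x_i   & \sum x_i^2
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum y_i x_i \end{bmatrix}
```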

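Solve for $a_0$ and $a_1$.

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$

where

$$\bar{x} = \frac{\sum x_i}{n}, \qquad \bar{y} = \frac{\sum y_i}{n}$$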

Example (using MATLAB for calculations)
>> x = [0 2 4 6 9 11 12 15 17 19];
>> y = [5 6 7 6 9 8 7 10 12 12];
>> sumxy = sum(x.*y)
sumxy = 911
>> sumx = sum(x);
>> sumy = sum(y);
>> sumx2 = sum(x.^2);
>> n = size(x,2)
n = 10
>> a1 = (n*sumxy - sumx*sumy) / (n*sumx2 - sumx^2)
a1 = 0.3525
>> xmean = sumx/n;
>> ymean = sumy/n;
>> a0 = ymean - a1*xmean
a0 = 4.8515
>> xx = linspace(0,20);
>> yy = a1*xx + a0;
>> plot(xx,yy,x,y,'o')
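As a cross-check on the hand-coded sums, the same line can be obtained by solving the normal equations as a matrix system with the backslash operator, or by calling polyfit with degree 1. This is a minimal sketch using the example data; the variable names A, b, a and p are illustrative choices, not part of the original example.

```matlab
% Data from the example above
x = [0 2 4 6 9 11 12 15 17 19];
y = [5 6 7 6 9 8 7 10 12 12];
n = numel(x);

% Normal equations in matrix form: A*[a0; a1] = b
A = [n,      sum(x);
     sum(x), sum(x.^2)];
b = [sum(y); sum(x.*y)];
a = A \ b;               % a(1) is the intercept a0, a(2) is the slope a1

% polyfit returns coefficients in descending powers: p = [a1 a0]
p = polyfit(x, y, 1);

fprintf('normal equations: a0 = %.4f, a1 = %.4f\n', a(1), a(2));
fprintf('polyfit:          a0 = %.4f, a1 = %.4f\n', p(2), p(1));
```

Both routes should agree with the values a1 = 0.3525 and a0 = 4.8515 to the displayed precision.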

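Non-linear least squares residuals. Some non-linear formulas for $y = y(x)$ can be linearized.

Exponential:
$$y = \alpha_1 e^{\beta_1 x} \quad \Rightarrow \quad \ln y = \beta_1 x + \ln \alpha_1$$

Power:
$$y = \alpha_2 x^{\beta_2} \quad \Rightarrow \quad \ln y = \beta_2 \ln x + \ln \alpha_2$$

Saturation growth:
$$y = \alpha_3 \frac{x}{\beta_3 + x} \quad \Rightarrow \quad \frac{1}{y} = \frac{\beta_3 + x}{\alpha_3 x} = \frac{\beta_3}{\alpha_3} \frac{1}{x} + \frac{1}{\alpha_3}$$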
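The power and saturation-growth linearizations can be sketched the same way as the exponential one: transform the data, fit a straight line with polyfit, then transform the coefficients back. The data below are hypothetical and noise-free, generated from assumed parameters (alpha2 = 3, beta2 = 2 and alpha3 = 10, beta3 = 2) purely so that the recovered values can be checked.

```matlab
x = [1 2 3 4 5];

% Power model, hypothetical data from y = 3*x^2
y2 = 3 * x.^2;
p  = polyfit(log(x), log(y2), 1);   % ln y = beta2*ln x + ln alpha2
beta2  = p(1);
alpha2 = exp(p(2));

% Saturation-growth model, hypothetical data from y = 10*x/(2 + x)
y3 = 10 * x ./ (2 + x);
q  = polyfit(1./x, 1./y3, 1);       % 1/y = (beta3/alpha3)*(1/x) + 1/alpha3
alpha3 = 1 / q(2);
beta3  = q(1) * alpha3;

fprintf('power:      alpha2 = %.4f, beta2 = %.4f\n', alpha2, beta2);
fprintf('saturation: alpha3 = %.4f, beta3 = %.4f\n', alpha3, beta3);
```

With real, noisy data the fit minimizes the residuals of the transformed variables, so the recovered parameters are generally not identical to those of a direct non-linear fit.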

Example
To find: Fit an exponential model to the data

x    0.4    0.8    1.2    1.6    2      2.3
y    800    975    1500   1950   2900   3600

Solution: Linearize the exponential equation,

$$y = \alpha_1 e^{\beta_1 x} \quad \Rightarrow \quad \ln y = \beta_1 x + \ln \alpha_1$$

Calculations using MATLAB

$$y = \alpha_1 e^{\beta_1 x} \quad \Rightarrow \quad \ln y = \beta_1 x + \ln \alpha_1$$

>> x = [.4 .8 1.2 1.6 2 2.3];
>> y = [800 975 1500 1950 2900 3600];
>> lny = log(y);
>> p = polyfit(x,lny,1);
>> beta1 = p(1)
beta1 = 0.8187
>> alpha1 = exp(p(2))
alpha1 = 546.5909
>> xx = linspace(.2,2.4);
>> yy = alpha1*exp(beta1*xx);
>> plot(xx,yy,x,y,'h')

Result: $y = 546.6\, e^{0.819x}$

Using Excel for linear regressions:
[Chart: y plotted against x (0 to 2.5) with an exponential trendline, "Expon. (y)". Trendline equation: y = 546.59e^(0.8187x), R² = 0.9933]
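The R² value reported for the trendline can be reproduced in MATLAB. The sketch below assumes that the value is the coefficient of determination of the straight-line fit to ln y versus x (the linearized problem), which is consistent with the numbers but is an assumption about how the spreadsheet computes it.

```matlab
x = [0.4 0.8 1.2 1.6 2 2.3];
y = [800 975 1500 1950 2900 3600];

lny    = log(y);
p      = polyfit(x, lny, 1);            % straight line through (x, ln y)
lnyfit = polyval(p, x);

Sr = sum((lny - lnyfit).^2);            % squared residuals about the line
St = sum((lny - mean(lny)).^2);         % squared deviations about the mean
r2 = 1 - Sr/St;                         % coefficient of determination

fprintf('R^2 = %.4f\n', r2);
```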

Interpolation

Interpolation with polynomials
If n data points are given, a polynomial of degree n - 1 can be determined that passes through all the points.
>> x = [0 1 2 3];
>> y = [0 2 -1 4];
>> p = polyfit(x,y,3)
p = 2.1667 -9.0000 8.8333 0.0000
>> xx = linspace(0,3);
>> yy = polyval(p,xx);
>> plot(x,y,'o',xx,yy)

$$y = 2.1667x^3 - 9.0000x^2 + 8.8333x$$
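With n points and degree n - 1, polyfit is no longer a least-squares fit in any practical sense: the polynomial is the unique solution of a square linear system. A minimal sketch of that system follows, using the Vandermonde matrix and taking y = [0 2 -1 4], the values that the printed coefficients reproduce at x = 0, 1, 2, 3. The names V and c are illustrative.

```matlab
x = [0 1 2 3];
y = [0 2 -1 4];

V = vander(x);          % columns are x.^3, x.^2, x.^1, x.^0
c = V \ y(:);           % coefficients in descending powers, as a column

p = polyfit(x, y, 3);   % same coefficients, as a row

disp([c.'; p]);                          % the two rows should match
disp(max(abs(polyval(p, x) - y)));       % residual at the data points, near zero
```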

If the n data points are known to come from a polynomial, then using polyfit with degree n - 1 is suitable for interpolation. If, however, the data points do not come from a polynomial function, then higher-order polynomials will not, in general, provide a good fit between the points.

Example using Runge's function

$$f(x) = \frac{1}{1 + 25x^2}$$

Data points from Runge's function

x   -1.0000  -0.7500  -0.5000  -0.2500   0        0.2500   0.5000   0.7500   1.0000
y    0.0385   0.0664   0.1379   0.3902   1.0000   0.3902   0.1379   0.0664   0.0385

Use the 9 points to create an 8th-degree polynomial and plot both Runge's function and the polynomial.
>> x = linspace(-1,1,9);
>> y = 1./(1+25*x.^2);
>> p = polyfit(x,y,8);
>> xx = linspace(-1,1);
>> yy = 1./(1+25*xx.^2);
>> yyp = polyval(p,xx);
>> plot(x,y,'o',xx,yy,'g',xx,yyp,'r')

Interpolation using splines
Splines are connected piecewise polynomials (linear, quadratic, cubic, quartic, etc.) that go from one data point to the next. In order to determine the coefficients of the polynomials, the point locations plus continuity of the 1st, 2nd, etc. derivatives are used.

Consider the splines that connect the points.
[Figure: three splines joined end to end. Spline 1 meets spline 2 at point 1-2, and spline 2 meets spline 3 at point 2-3.]
At point 1-2, the location, the slope, and the curvature (second derivative) of both spline 1 and spline 2 must match; likewise, at point 2-3, the same holds for splines 2 and 3. This information allows the coefficients of the cubic splines to be determined.
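Written out for cubic pieces, the matching conditions take the following form. The notation is an assumption introduced here: $S_k(x)$ is the cubic on the interval $[x_k, x_{k+1}]$, for $n$ data points and $k = 1, \dots, n-1$.

```latex
% Each piece passes through its two end points: 2(n-1) conditions
S_k(x_k) = y_k, \qquad S_k(x_{k+1}) = y_{k+1}, \qquad k = 1, \dots, n-1

% Slope and curvature match at each interior point: 2(n-2) conditions
S_k'(x_{k+1}) = S_{k+1}'(x_{k+1}), \qquad
S_k''(x_{k+1}) = S_{k+1}''(x_{k+1}), \qquad k = 1, \dots, n-2
```

This gives $4n - 6$ equations for the $4(n - 1)$ cubic coefficients, so two further end conditions are needed to close the system. MATLAB's spline function uses the not-a-knot end condition by default when x and y have the same length.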

Usefulness of splines
Splines are more versatile than polynomials for interpolating the regions between data points.
Example (Runge's function)
>> x = linspace(-1,1,9);
>> y = 1./(1+25*x.^2);
>> xx = linspace(-1,1);
>> yy = 1./(1+25*xx.^2);
>> yspline = spline(x,y,xx);
>> plot(x,y,'o',xx,yy,'r',xx,yspline,'g')
Note that the interpolation is considerably better than that with the 8th-degree polynomial.
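The visual comparison can be put into numbers by measuring the largest deviation of each interpolant from Runge's function on a fine grid. This is a sketch; the 401-point evaluation grid and the names f, errPoly and errSpline are arbitrary choices made for illustration.

```matlab
f  = @(t) 1 ./ (1 + 25*t.^2);      % Runge's function
x  = linspace(-1, 1, 9);           % the 9 data points
y  = f(x);
xx = linspace(-1, 1, 401);         % fine evaluation grid (arbitrary size)

p         = polyfit(x, y, 8);      % 8th-degree interpolating polynomial
errPoly   = max(abs(polyval(p, xx)   - f(xx)));
errSpline = max(abs(spline(x, y, xx) - f(xx)));

fprintf('max error, 8th-degree polynomial: %.4f\n', errPoly);
fprintf('max error, cubic spline:          %.4f\n', errSpline);
```

The polynomial's largest errors occur near the ends of the interval, where it oscillates between the data points.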

Example using splines for interpolation
Consider data for an exponential curve.

x   0        0.2500   0.5000   0.7500   1.0000
y   1.0000   1.0157   1.1331   1.5248   2.7183

Determine the y value at x = 0.9 using splines.
>> x = linspace(0,1,5);
>> y = exp(x.^3);
>> yp9 = spline(x,y,0.9)
yp9 = 2.1052
>> yactual = exp((0.9)^3)
yactual = 2.0730
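When several values are needed from the same data, the spline can be built once and evaluated later. The sketch below uses the two-argument form of spline, which returns a piecewise-polynomial structure, together with ppval; it also computes the relative error of the interpolated value. The names pp and relerr are illustrative.

```matlab
x = linspace(0, 1, 5);
y = exp(x.^3);

pp  = spline(x, y);          % piecewise-polynomial form of the cubic spline
yp9 = ppval(pp, 0.9);        % same value as spline(x, y, 0.9)

yactual = exp(0.9^3);
relerr  = abs(yp9 - yactual) / yactual;

fprintf('spline: %.4f, actual: %.4f, relative error: %.2f%%\n', ...
        yp9, yactual, 100*relerr);
```

From the values printed in the example (2.1052 versus 2.0730), the relative error is about 1.6%.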