Topic 5a Introduction to Curve Fitting & Linear Regression

Course Instructor: Dr. Raymond C. Rumpf
Office: A-337, Phone: (915) 747-6958, E-mail: rcrumpf@utep.edu

Topic 5a: Introduction to Curve Fitting & Linear Regression
EE 4386/530 Computational Methods in EE

Outline
- Introduction
- Statistics of data sets
- Best fit methods: linear regression (ugly math) and linear least squares (clean math)

Introduction

What is Curve Fitting?
Curve fitting is simply fitting an analytical equation to a set of measured data, for example an equation of the form

$f(x) = A + B\,e^{-(x-C)/D}$

Curve fitting determines the values of A, B, C, and D so that f(x) best represents the given data.

[Figure: measured data samples together with the fitted curve f(x).]

Why Fit Data to a Curve?
- Estimate data between discrete values (interpolation).
- Find a maximum or minimum.
- Derive finite-difference approximations.
- Fit measured data to an analytical equation to extract meaningful parameters.
- Remove noise from a function.
- Observe and quantify general trends.

Two Categories of Curve Fitting
Best fit: the measured data has noise, so the curve does not attempt to intercept every point. Methods include linear regression (ugly math), linear least squares (clean math), and nonlinear regression (moderate math).
Exact fit: the data samples are assumed to be exact and the curve is forced to pass through each one, e.g. fitting to polynomials.

Statistics of Data Sets

Arithmetic Mean
If we had to come up with a single number that best represents an entire set of data, the arithmetic mean would probably be it. For M data samples $f_1, f_2, \ldots, f_M$,

$f_{\mathrm{avg}} = \frac{1}{M} \sum_{m=1}^{M} f_m$

Geometric Mean
The geometric mean is defined as

$f_{\mathrm{g}} = \left( \prod_{m=1}^{M} f_m \right)^{1/M}$

The arithmetic mean tends to suppress the significance of outlying data samples. With the geometric mean, even a single small value among many large values can dominate the mean. This is useful in optimizations where multiple parameters must be maximized at the same time and it is not acceptable for any one of them to be low.

Variance & Standard Deviation
Standard deviation: the standard deviation is a measure of the spread of the data about the mean. It is convenient because it shares the same units as the data.

$\sigma = \sqrt{ \frac{1}{M} \sum_{m=1}^{M} \left( f_m - f_{\mathrm{avg}} \right)^2 }$

Variance: variance is used more commonly in calculations, but carries the same information as the standard deviation.

$v = \sigma^2 = \frac{1}{M} \sum_{m=1}^{M} \left( f_m - f_{\mathrm{avg}} \right)^2$

Coefficient of Variation
The coefficient of variation (CV) is the standard deviation normalized to the mean. Think of it as a relative standard deviation.

$\mathrm{CV} = \frac{\sigma}{f_{\mathrm{avg}}}$
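These definitions translate directly into code. A minimal Python/NumPy sketch (the sample values are made up for illustration):

```python
import numpy as np

f = np.array([2.0, 3.5, 4.1, 5.2, 3.8])     # hypothetical data samples
M = f.size

f_avg = f.sum() / M                 # arithmetic mean
f_geo = f.prod() ** (1.0 / M)       # geometric mean
v     = np.sum((f - f_avg)**2) / M  # variance (1/M convention, as above)
sigma = np.sqrt(v)                  # standard deviation
cv    = sigma / f_avg               # coefficient of variation

print(f_avg, f_geo, sigma, v, cv)
```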

Linear Regression (Best Fit, Ugly Math)

Goal of Linear Regression
The goal of linear regression is to fit a straight line

$y = a_0 + a_1 x$

to a set of measured data $(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)$ that has noise.

Statement of Problem
Given the measured data points, we write the equation of the line for each point:

$y_1 = a_0 + a_1 x_1 + e_1$
$y_2 = a_0 + a_1 x_2 + e_2$
$\quad\vdots$
$y_M = a_0 + a_1 x_M + e_M$

To make these equations correct, we must introduce an error term $e_m$ called the residual. We wish to determine values of $a_0$ and $a_1$ such that the residual terms $e_m$ are as small as possible.

Criteria for Best Fit
We must define a single quantity that tells us how well the line fits our set of data.

Guess #1: sum of residuals, $E = \sum_{m=1}^{M} e_m$. This does not work because negative and positive residuals can cancel and mislead the criterion into reporting that there is no error.
Guess #2: sum of magnitudes of residuals, $E = \sum_{m=1}^{M} \left| e_m \right|$. This does not work because it does not lead to a unique best fit.
Guess #3: sum of squares of residuals, $E = \sum_{m=1}^{M} e_m^2$. This works and leads to a unique solution.

Equation for Criterion
Our line equation for the mth sample is

$y_m = a_0 + a_1 x_m + e_m$

Solving this for the residual gives

$e_m = y_m - \left( a_0 + a_1 x_m \right)$

where $y_m$ is our measured value of y and $a_0 + a_1 x_m$ is the value of y on our line at the point $x_m$. From this, we can write our criterion as

$E = \sum_{m=1}^{M} e_m^2 = \sum_{m=1}^{M} \left( y_{\mathrm{measured},m} - y_{\mathrm{line},m} \right)^2 = \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right)^2$
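A tiny numerical check (with made-up residuals) shows why Guesses #1 and #2 were rejected in favor of the squared criterion:

```python
import numpy as np

# hypothetical residuals of a poor fit: large errors that cancel in pairs
e = np.array([2.0, -2.0, 1.5, -1.5])

print(np.sum(e))          # 0.0  -> Guess #1 wrongly reports no error at all
print(np.sum(np.abs(e)))  # 7.0  -> Guess #2 sees the error, but its minimizer need not be unique
print(np.sum(e**2))       # 12.5 -> Guess #3: a smooth criterion with a unique minimum
```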

Least Squares Fit
We wish to minimize the error criterion E. We can identify the minimum as the point where the first-order derivatives are zero:

$\frac{\partial E}{\partial a_0} = 0 \quad \text{and} \quad \frac{\partial E}{\partial a_1} = 0$

We seek the values of $a_0$ and $a_1$ that satisfy these equations. This approach solves the problem by least squares (we are minimizing the squares of the residuals).

The Fun Math (Step 1)
Differentiate E with respect to each of the unknowns:

$\frac{\partial E}{\partial a_0} = \frac{\partial}{\partial a_0} \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right)^2 = -2 \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right)$

$\frac{\partial E}{\partial a_1} = \frac{\partial}{\partial a_1} \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right)^2 = -2 \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right) x_m$

The Fun Math (Step 2)
We set the derivatives to zero to find the minimum of E:

$\frac{\partial E}{\partial a_0} = 0: \quad \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right) = 0 \;\Rightarrow\; \sum y_m - M a_0 - a_1 \sum x_m = 0$

$\frac{\partial E}{\partial a_1} = 0: \quad \sum_{m=1}^{M} \left( y_m - a_0 - a_1 x_m \right) x_m = 0 \;\Rightarrow\; \sum y_m x_m - a_0 \sum x_m - a_1 \sum x_m^2 = 0$

The Fun Math (Step 3)
We write these as two simultaneous equations in $a_0$ and $a_1$, called the normal equations:

$M a_0 + \left( \sum x_m \right) a_1 = \sum y_m$

$\left( \sum x_m \right) a_0 + \left( \sum x_m^2 \right) a_1 = \sum x_m y_m$

The Fun Math (Step 4)
The normal equations are solved simultaneously and the solution is

$a_1 = \frac{M \sum x_m y_m - \sum x_m \sum y_m}{M \sum x_m^2 - \left( \sum x_m \right)^2} \qquad a_0 = y_{\mathrm{avg}} - a_1 x_{\mathrm{avg}}$

Yikes! There has to be an easier way!
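Before moving on to the cleaner formulation, note that the ugly-math result is still only a few lines of code. A direct transcription into Python/NumPy (a sketch; the sample data are made up):

```python
import numpy as np

def linear_regression(x, y):
    """Fit y = a0 + a1*x using the closed-form normal-equation solution."""
    M = len(x)
    a1 = (M * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
         (M * np.sum(x**2) - np.sum(x)**2)
    a0 = np.mean(y) - a1 * np.mean(x)
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(linear_regression(x, y))   # intercept a0 and slope a1
```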

Linear Least Squares (Best Fit, Clean Math)

Statement of Problem
We wish to fit a set of measured data points to a curve containing N + 1 terms:

$f = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_N z_N = \sum_{n=0}^{N} a_n z_n$

Here f is the measured value, the $z_n$ are the parameters from which f is evaluated, and the $a_n$ are the coefficients for the curve fit.

Formulation of Matrix Equation
We start by writing the function for each of our measurements, including the residual term:

$f_1 = a_0 z_{0,1} + a_1 z_{1,1} + \cdots + a_N z_{N,1} + e_1$
$f_2 = a_0 z_{0,2} + a_1 z_{1,2} + \cdots + a_N z_{N,2} + e_2$
$\quad\vdots$
$f_M = a_0 z_{0,M} + a_1 z_{1,M} + \cdots + a_N z_{N,M} + e_M$

This large set of equations is put into matrix form,

$\begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_M \end{bmatrix} = \begin{bmatrix} z_{0,1} & z_{1,1} & \cdots & z_{N,1} \\ z_{0,2} & z_{1,2} & \cdots & z_{N,2} \\ z_{0,3} & z_{1,3} & \cdots & z_{N,3} \\ \vdots & \vdots & & \vdots \\ z_{0,M} & z_{1,M} & \cdots & z_{N,M} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_M \end{bmatrix}$

or, compactly, $\mathbf{f} = \mathbf{Z} \mathbf{a} + \mathbf{e}$.
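As a concrete instance of what the columns of Z contain (my example, not from the slides): fitting a quadratic $y = a_0 + a_1 x + a_2 x^2$ corresponds to choosing $z_0 = 1$, $z_1 = x$, and $z_2 = x^2$, so each row of Z holds those values evaluated at one sample point:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])               # sample coordinates (made up)
Z = np.column_stack([np.ones_like(x), x, x**2])  # columns: z0 = 1, z1 = x, z2 = x^2
print(Z)                                         # one row per measurement
```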

Formulation of Solution by Least Squares (1 of 4)
Step 1: Solve the matrix equation for e:

$\mathbf{f} = \mathbf{Z}\mathbf{a} + \mathbf{e} \;\Rightarrow\; \mathbf{e} = \mathbf{f} - \mathbf{Z}\mathbf{a}$

Step 2: Calculate the error criterion E from e:

$E = e_1^2 + e_2^2 + e_3^2 + \cdots + e_M^2 = \mathbf{e}^{\mathrm{T}} \mathbf{e}$

Step 3: Substitute the equation for e from Step 1 into the equation for E from Step 2:

$E = \mathbf{e}^{\mathrm{T}} \mathbf{e} = \left( \mathbf{f} - \mathbf{Z}\mathbf{a} \right)^{\mathrm{T}} \left( \mathbf{f} - \mathbf{Z}\mathbf{a} \right)$

Formulation of Solution by Least Squares (2 of 4)
Step 4: The new matrix equation is algebraically manipulated as follows to make it easier to find its first-order derivative:

$E = \left( \mathbf{f} - \mathbf{Z}\mathbf{a} \right)^{\mathrm{T}} \left( \mathbf{f} - \mathbf{Z}\mathbf{a} \right)$   (original equation)
$\phantom{E} = \left( \mathbf{f}^{\mathrm{T}} - \mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}} \right) \left( \mathbf{f} - \mathbf{Z}\mathbf{a} \right)$   (distribute the transpose)
$\phantom{E} = \mathbf{f}^{\mathrm{T}}\mathbf{f} - \mathbf{f}^{\mathrm{T}}\mathbf{Z}\mathbf{a} - \mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{f} + \mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a}$   (expand)
$\phantom{E} = \mathbf{f}^{\mathrm{T}}\mathbf{f} - 2\,\mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{f} + \mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a}$   (combine terms: $\mathbf{f}^{\mathrm{T}}\mathbf{Z}\mathbf{a}$ and $\mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{f}$ are scalars and transposes of each other, so they are equal)

Formulation of Solution by Least Squares (3 of 4)
Step 5: We differentiate E with respect to a. We wish to determine the a that minimizes E, which we can do using the first-derivative rule:

$\frac{\partial E}{\partial \mathbf{a}} = \frac{\partial}{\partial \mathbf{a}} \left[ \mathbf{f}^{\mathrm{T}}\mathbf{f} - 2\,\mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{f} + \mathbf{a}^{\mathrm{T}}\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a} \right]$   (substitute in the expression for E)
$\phantom{\frac{\partial E}{\partial \mathbf{a}}} = -2\,\mathbf{Z}^{\mathrm{T}}\mathbf{f} + 2\,\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a}$   ($\mathbf{f}^{\mathrm{T}}\mathbf{f}$ is not a function of a; finish the differentiation)

Formulation of Solution by Least Squares (4 of 4)
Step 6: We find the value of a that makes the derivative equal to zero:

$\frac{\partial E}{\partial \mathbf{a}} = -2\,\mathbf{Z}^{\mathrm{T}}\mathbf{f} + 2\,\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a} = 0 \;\Rightarrow\; \mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a} = \mathbf{Z}^{\mathrm{T}}\mathbf{f} \;\Rightarrow\; \mathbf{a} = \left( \mathbf{Z}^{\mathrm{T}}\mathbf{Z} \right)^{-1} \mathbf{Z}^{\mathrm{T}}\mathbf{f}$

Observe that $\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\mathbf{a} = \mathbf{Z}^{\mathrm{T}}\mathbf{f}$ is just the original equation $\mathbf{f} = \mathbf{Z}\mathbf{a}$ premultiplied by $\mathbf{Z}^{\mathrm{T}}$.
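The Step 5 derivative is easy to sanity-check numerically. A short sketch (random matrices and my own variable names) comparing the analytic gradient against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))   # overdetermined system: 6 equations, 3 unknowns
f = rng.normal(size=6)
a = rng.normal(size=3)

E = lambda a: (f - Z @ a) @ (f - Z @ a)          # E = e^T e
grad_analytic = -2 * Z.T @ f + 2 * Z.T @ Z @ a   # result of Step 5

h, i = 1e-6, 0                                   # check component i = 0
da = np.zeros(3)
da[i] = h
grad_numeric = (E(a + da) - E(a - da)) / (2 * h)
print(grad_analytic[i], grad_numeric)            # should agree closely
```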

DO NOT SIMPLIFY FURTHER!
If we were to simplify our least squares equation, we would get

$\mathbf{a} = \left( \mathbf{Z}^{\mathrm{T}}\mathbf{Z} \right)^{-1} \mathbf{Z}^{\mathrm{T}}\mathbf{f} = \mathbf{Z}^{-1} \left( \mathbf{Z}^{\mathrm{T}} \right)^{-1} \mathbf{Z}^{\mathrm{T}}\mathbf{f} = \mathbf{Z}^{-1}\mathbf{f}$   (using $\left( \mathbf{Z}^{\mathrm{T}} \right)^{-1} \mathbf{Z}^{\mathrm{T}} = \mathbf{I}$)

This is just our original equation again ($\mathbf{f} = \mathbf{Z}\mathbf{a}$) without the least-squares approach incorporated. The simplification is not even legitimate: Z has more rows than columns, so $\mathbf{Z}^{-1}$ does not exist, while $\mathbf{Z}^{\mathrm{T}}\mathbf{Z}$ is square and (for full-rank Z) invertible.

Visualizing Least Squares (1 of 3)
We are initially given a matrix equation with more equations than unknowns.

Visualizing Least Squares (2 of 3)
We premultiply by the transpose of A.

Visualizing Least Squares (3 of 3)
The matrix equation reduces to the same number of equations as unknowns, which is solvable by many standard algorithms.

Least Squares Algorithm
Step 1: Construct the matrices. Z is essentially just a matrix of the coordinates of the data points; f is a column vector of the measurements:

$\mathbf{Z} = \begin{bmatrix} z_{0,1} & z_{1,1} & \cdots & z_{N,1} \\ z_{0,2} & z_{1,2} & \cdots & z_{N,2} \\ z_{0,3} & z_{1,3} & \cdots & z_{N,3} \\ \vdots & \vdots & & \vdots \\ z_{0,M} & z_{1,M} & \cdots & z_{N,M} \end{bmatrix} \qquad \mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_M \end{bmatrix}$

Step 2: Solve for the unknown coefficients:

$\mathbf{a} = \left( \mathbf{Z}^{\mathrm{T}}\mathbf{Z} \right)^{-1} \mathbf{Z}^{\mathrm{T}}\mathbf{f}$

Step 3: Extract the coefficients from $\mathbf{a} = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & a_N \end{bmatrix}^{\mathrm{T}}$.

Least Squares for Solving Ax = b
Suppose we wish to solve Ax = b, but we have more equations than we have unknowns. We must solve this as a best fit because a perfect fit is impossible in the presence of noise. We apply least squares by premultiplying by $\mathbf{A}^{\mathrm{T}}$:

$\mathbf{A}\mathbf{x} = \mathbf{b} \;\Rightarrow\; \mathbf{A}^{\mathrm{T}}\mathbf{A}\mathbf{x} = \mathbf{A}^{\mathrm{T}}\mathbf{b} \;\Rightarrow\; \mathbf{x} = \left( \mathbf{A}^{\mathrm{T}}\mathbf{A} \right)^{-1} \mathbf{A}^{\mathrm{T}}\mathbf{b}$
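In code, both the Z-based algorithm and the Ax = b variant reduce to one short function. A minimal Python/NumPy sketch (the names are mine):

```python
import numpy as np

def least_squares(Z, f):
    """Solve the normal equations (Z^T Z) a = Z^T f for the coefficients a."""
    return np.linalg.solve(Z.T @ Z, Z.T @ f)   # avoids forming the explicit inverse
```

In practice, np.linalg.lstsq(Z, f) computes the same best-fit solution through a more numerically robust factorization.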

Example (1 of 3)
Let's fit a line $y = mx + b$ to the following set of points:

x: 3.01, 4.98, 6.91, 8.76
y: 0.16, 1.13, 1.57, 2.35

Example (2 of 3)
Step 1: Build the matrices. Writing $y_m = m x_m + b$ for each point gives

$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ x_4 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} \quad\Rightarrow\quad \mathbf{f} = \begin{bmatrix} 0.16 \\ 1.13 \\ 1.57 \\ 2.35 \end{bmatrix} \qquad \mathbf{Z} = \begin{bmatrix} 3.01 & 1 \\ 4.98 & 1 \\ 6.91 & 1 \\ 8.76 & 1 \end{bmatrix}$

Step 2: Solve by least squares. With some practice, you will be able to write the matrices directly from the measured data.

$\mathbf{Z}^{\mathrm{T}}\mathbf{Z} \begin{bmatrix} m \\ b \end{bmatrix} = \mathbf{Z}^{\mathrm{T}}\mathbf{f} \;\Rightarrow\; \begin{bmatrix} 158.3462 & 23.6600 \\ 23.6600 & 4.0000 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 37.5437 \\ 5.2100 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 0.3656 \\ -0.8602 \end{bmatrix}$

Example (3 of 3)
Step 3: Extract the coefficients:

$\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 0.3656 \\ -0.8602 \end{bmatrix} \;\Rightarrow\; m = 0.3656, \quad b = -0.8602$

Step 4: Plot the result:

$y = mx + b = 0.3656\,x - 0.8602$

[Figure: the fitted line plotted through the four measured data points.]
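The example can be reproduced in a few lines of Python/NumPy (a sketch using the data above):

```python
import numpy as np

x = np.array([3.01, 4.98, 6.91, 8.76])     # measured x-coordinates
y = np.array([0.16, 1.13, 1.57, 2.35])     # measured y-values

Z = np.column_stack([x, np.ones_like(x)])  # columns [x, 1] for y = m*x + b
m, b = np.linalg.solve(Z.T @ Z, Z.T @ y)   # normal equations
print(m, b)                                # approx. 0.3656 and -0.8602
```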