REGRESSION (Physics 1210 Notes, Partial Modified Appendix A)

HOW TO PERFORM A LINEAR REGRESSION

Consider the following data points and their graph (Table 1 and Figure 1):

  X     Y
  0     1
  1     3
  2     5
  3     7
  4     9
  5    11

Table 1: Example Data

[Figure 1: Graph of Example Data (Y versus X)]

In this example, the points are on a perfect straight line. The formula of a general straight line is

Y = a*X + b

where a is the slope of the line and b is the intercept of that line with the y-axis. In this example, it is easy to verify that a = 2 and b = 1.

In general, with the data points you obtain in your experiments, finding a and b is not so easy. We want to use a computer to calculate a and b for us. For this, we use the regression function of Excel. When you are in Excel, type in your data points as shown in Table 1. Now, we want to do a linear regression on these data points, which should give us the values of a and b. To do this, look in the menu for [Tools], then select [Data Analysis] and finally select [Regression]. You are then faced with a dialogue box "Regression". For "Input Y Range", select the Y column of your data. For "Input X Range", select the X column of your data, then click on "OK". After a few seconds, you will see a new Excel sheet with an overkill of numbers called the "Summary Output". Of this Summary Output, the only part you need is:

                Coefficients   Standard Error   t Stat
  Intercept     1              0                65535
  X Variable 1  2              0                65535

From this, you can read the coefficient values for a and b as follows:

b = intercept = 1
a = x-variable = 2

which is what we expected. The equation for the line in this case would be:

Y = aX + b = 2X + 1

The standard errors in a and b are zero here because the points are on a perfect straight line. In general, this will not be the case, because experiments are not perfect, unfortunately. For example, if you were to use the following data points (they are the same points as before, except for the last one) and do a linear regression on them, you will get:

  X     Y
  0     1
  1     3
  2     5
  3     7
  4     9
  5    15

Table 2: Not-so-perfect Example Data

                Coefficients   Standard Error   t Stat
  Intercept     0.238095238    0.99886557       0.238366
  X Variable 1  2.57142857     0.32991444       7.794229

Table 3: Regression Results of Not-so-perfect Data

And you now have standard errors, which are not zero. You would quote your results at the 95% confidence level as:

b = intercept = 0.238 ± 0.999
a = x-variable = 2.5714 ± 0.3299

Of course, you must decide for yourself each time how many decimals are realistic and what the unit is. Linear regression is a very useful tool, and you will need it frequently during this course. In your report, DO NOT include the "regression-summary" Excel produces. Instead, when you do a linear regression on your data, all you have to give is the equation of the line (including errors) Excel calculated, and state that the calculation was a linear regression.

  Velocity (m/s)   Measured Force (N)
  0                4.9
  1.38             4.8
  1.99             4.6
  [.]              4.6
  [.5]             4.5
  3.06             4.4
  3.77             4.2
  4.09             4.1
  4.65             3.8
  [5.5]            3.6
  [6.]             3.0
  [7.]             2.6
  7.88             2.3
  8.53             2.0
  9.79             1.1
  [0.3]            0.8
  10.93            0.5
  [.]              0.2
  11.37            0.0

Table 4. Free Fall Data
(Entries in square brackets lost one or more digits in the source and are reproduced as they survive.)

HOW TO PERFORM A NONLINEAR TRENDLINE

Besides a linear trendline, Excel can fit logarithmic, polynomial (of arbitrary order), power, or exponential functions to data. For the data presented in Table 4, it appears that a quadratic relationship should produce an excellent fit. Figure 2 substantiates this in that the quadratic trendline has an $r^2$ of 0.9986, as compared to a value of 0.96 for the linear fit when the intercept value is set to 4.90 (see the Curve Fitting.xls example). Higher-order polynomials may be used, but any increase in $r^2$ that is obtained by this increased complexity is rather superficial.

[Figure 2. Velocity vs. Force (force plotted against velocity in m/s)
  Quadratic trendline (shown): F = [-0.07]V^2 - [0.78]V + 4.977, R^2 = 0.9986
  Linear trendline: F = -0.3683V + 4.9, R^2 = 0.96]

HOW TO USE THE NONLINEAR OPTIMIZING SOLVER

If we start over on this problem and apply some basic dynamics to the free fall problem, the summation of forces in this case must be equal to the gravitational body force (mg) in the downward direction plus a drag force in the upward direction that is some unknown function of velocity. Therefore theory implies that the force versus velocity relationship must have the following general form:

$$F(V) = mg - \mathrm{Drag} \tag{1}$$

but it does not supply any information about how the drag varies with velocity. Our own personal experience indicates that the drag force increases with velocity, and extensive experimental testing over the years has shown that power laws can frequently be used to correlate velocity-drag data over limited velocity ranges. If this is assumed to be the case here, then

$$F = mg - aV^{b} \tag{2}$$

Theory and some empirical insight have therefore been combined to obtain a possible functional form between velocity and force in terms of two arbitrary constants (a, b). This form is based upon the physics of the phenomenon and not just blind curve fitting, as was done in the linear and quadratic (Figure 2) curve fit examples. The values of a and b that give the best fit with the experimental data can be determined through the use of the Excel nonlinear optimizing solver.

The first requirement of using the nonlinear optimizing solver is the development of a regression function that you want to optimize in terms of minimizing or maximizing its value or obtaining a specified value.
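
Before building that regression function, it can help to see what a least squares linear regression actually computes. The following is a minimal sketch in Python, not part of the Excel workflow, assuming ordinary least squares with the usual standard-error formulas (n - 2 degrees of freedom). The function name linear_regression is our own, and the data are the not-so-perfect points of Table 2, taken here as X = 0, 1, 2, 3, 4, 5 and Y = 1, 3, 5, 7, 9, 15.

```python
import math


def linear_regression(xs, ys):
    """Ordinary least squares fit of y = a*x + b.

    Returns the slope a, the intercept b, and their standard errors.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    a = sxy / sxx              # slope
    b = y_mean - a * x_mean    # intercept

    # Residual variance, using n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)

    se_a = math.sqrt(s2 / sxx)
    se_b = math.sqrt(s2 * sum(x * x for x in xs) / (n * sxx))
    return a, b, se_a, se_b


# Table 2 points as read here (assumption stated in the lead-in).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 15]

a, b, se_a, se_b = linear_regression(xs, ys)
print(f"a = {a:.4f} +/- {se_a:.4f}")
print(f"b = {b:.4f} +/- {se_b:.4f}")
```

The printed slope and intercept, with their standard errors, can be compared with the Coefficients and Standard Error columns of Table 3.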
The trendlines that are presented in the previous two curve fits are based upon least squares regression, in which the following regression function is minimized:

$$\sum_{i=1}^{n} \left(F_i - \hat{F}_i\right)^2 \tag{3}$$

where $F_i$ is the measured force and $\hat{F}_i$ is the corresponding predicted value in the data set that contains n values. In this case Equation 2 would be substituted for $\hat{F}_i$ (that is, $\hat{F}_i = mg - aV_i^{b}$). Instead of doing this, let's minimize $1 - r^2$. That is,

$$1 - r^2 = \frac{\sum_{i=1}^{n} \left(F_i - \hat{F}_i\right)^2}{\sum_{i=1}^{n} \left(F_i - \bar{F}\right)^2} \tag{4}$$

where $\bar{F}$ is the mean force of the experimental data set. Excel provides a nonlinear optimizing solver for minimizing functions such as Equation 4. However, the problem must be prepared properly to obtain an appropriate solution.

Table 5 presents a copy of the spreadsheet (see file Curve Fitting.xls for the actual spreadsheet) that was used to determine a and b. This table contains six columns:

column 1 is the independent variable (velocity);
column 2 is the measured variable (acceleration $a_i$);
column 3 is the dependent variable (force $F_i$) calculated from the measured variable $a_i$;
column 4 is the predicted dependent variable (the force $\hat{F}_i$ calculated from Equation 2);
column 5 is the square of the difference between columns 3 and 4; and
column 6 is the square of the difference between column 3 and the average force, which is calculated at the end of column 3.

Columns 5 and 6 are then summed, and these sums are used to calculate the $r^2$ value for a guess set of coefficients (a, b). For instance, the guess of (1, 1) produces a very poor $r^2$ value of -5.88.

  a = 0.08535 N/(m/s)^b      g = 9.8 m/s^2
  b = 1.6635                 m = 0.5 kg

  Velocity   Accel.     Measured   Predicted*   (Fi - Fi_pred)^2   (Fav - Fi)^2
  (m/s)      (m/s^2)    Fi (N)     Fi (N)       (N^2)              (N^2)
  0          9.8        4.9        4.9          0.00E+00           3.93
  1.38       9.6        4.8        4.8          [.08E-03]          3.54
  1.99       9.2        4.6        4.6          [.04E-03]          2.83
  [.]        9.1        4.6        4.6          [.3E-03]           2.66
  [.5]       9.0        4.5        4.5          3.75E-05           2.50
  3.06       8.8        4.4        4.4          [.7E-03]           2.20
  3.77       8.3        4.2        [4.]         [6.6E-04]          1.52
  4.09       8.1        4.1        4.0          [.39E-03]          1.28
  4.65       7.5        3.8        3.8          [.67E-03]          0.69
  [5.5]      7.1        3.6        3.4          [.4E-0]            0.40
  [6.]       6.0        3.0        [3.]         [.5E-0]            0.01
  [7.]       5.2        2.6        2.6          [.79E-04]          0.10
  7.88       4.5        2.3        2.3          [8.3E-05]          0.45
  8.53       3.9        2.0        1.9          3.97E-03           0.94
  9.79       2.1        1.1        1.1          [3.7E-03]          3.49
  [0.3]      1.5        0.8        0.8          [4.8E-04]          4.70
  10.93      0.9        0.5        0.3          [.0E-0]            6.09
  [.]        0.3        0.2        [0.]         [.34E-05]          7.66
  11.37      0.0        0.0        0.0          [.64E-03]          8.51

  Fav = 2.9                          Sum =      5.80E-02           53.50

  R^2 = 0.9989164 = 1 - SUM(Fi - Fi_pred)^2 / SUM(Fav - Fi)^2
  * Predicted Fi = Force(m,g,V,a,b); see Module Force(m,g,V,a,b)

Table 5. Excel Table Used to Perform Nonlinear Regression
(Entries in square brackets lost one or more digits in the source and are reproduced as they survive.)

Excel uses an iterative approach to solve the nonlinear regression problem once it has an initial guess set to start this iterative process. In this case, the program will systematically vary a and b to determine the local gradient of $r^2$ and thereby determine how the (a, b) set should be varied to maximize $r^2$. In order to use the solver tool, the tool must be loaded into Excel.
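
The spreadsheet layout described above can also be sketched outside Excel. The Python below is an illustration only and does not reproduce Solver's algorithm. It relies on the fact that, for a fixed exponent b, Equation 2 is linear in a, so the best a follows directly from least squares; it then scans b on a grid to minimize the ratio of Equation 4 (equivalently, to maximize r^2). The function names, the scan range for b, and the grid spacing are our assumptions. The data are the twelve Table 5 rows whose velocities are fully legible, with force taken as m times the tabulated acceleration.

```python
M = 0.5   # mass in kg, as listed with Table 5
G = 9.8   # gravitational acceleration in m/s^2

# (velocity in m/s, measured acceleration in m/s^2) for the legible rows only.
ROWS = [(0.0, 9.8), (1.38, 9.6), (1.99, 9.2), (3.06, 8.8), (3.77, 8.3),
        (4.09, 8.1), (4.65, 7.5), (7.88, 4.5), (8.53, 3.9), (9.79, 2.1),
        (10.93, 0.9), (11.37, 0.0)]
VS = [v for v, _ in ROWS]
FS = [M * acc for _, acc in ROWS]   # column 3: measured force F_i = m * a_i


def predicted_force(v, a, b):
    """Column 4: Equation 2, F = m*g - a*V**b."""
    return M * G - a * v ** b


def one_minus_r2(a, b, vs=VS, fs=FS):
    """Equation 4: sum of column 5 divided by sum of column 6."""
    f_mean = sum(fs) / len(fs)
    ss_res = sum((f - predicted_force(v, a, b)) ** 2 for v, f in zip(vs, fs))
    ss_tot = sum((f_mean - f) ** 2 for f in fs)
    return ss_res / ss_tot


def best_a_for(b, vs=VS, fs=FS):
    """Least squares value of a for a fixed exponent b (model is linear in a)."""
    num = sum(v ** b * (M * G - f) for v, f in zip(vs, fs))
    den = sum((v ** b) ** 2 for v in vs)
    return num / den


def fit_power_law(b_min=0.5, b_max=3.0, steps=2500):
    """Scan b on a grid; return (1 - r^2, a, b) at the best grid point."""
    best = None
    for k in range(steps + 1):
        b = b_min + (b_max - b_min) * k / steps
        a = best_a_for(b)
        score = one_minus_r2(a, b)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best


score, a_fit, b_fit = fit_power_law()
print(f"a = {a_fit:.4f}, b = {b_fit:.3f}, r^2 = {1 - score:.4f}")
print(f"r^2 for the guess (1, 1): {1 - one_minus_r2(1.0, 1.0):.2f}")
```

Because only a subset of the rows is used, the fitted a and b need not match the spreadsheet values exactly, and the r^2 printed for the guess (1, 1) will differ from the value quoted for the full data set.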
The solver can be loaded by:
(1) Click on Tools in the main menu bar
(2) Click on Solver in the pull-down menu
If Solver is not an option, then
(a) Click on Add-Ins in the pull-down menu
(b) Click on Solver Add-In in the Add-Ins dialog box (the check box must be checked)
(c) Click OK
(d) Click Solver

The Solver dialog box is now visible. The first menu item is the target cell, which is $1 - r^2$ in this case. The second item delineates what action is to be performed on the target cell. In this example we wish to minimize the target cell. The third item specifies which cells may have their values varied to accomplish the objective, which in this case are the cells containing the guess values of the regression parameters a and b. Note that named cells can be utilized in specifying the cell locations of the target cell and the adjustable cells. As an option, you can set numerical constraints on the adjustable cells. A little thought about the physics of this problem indicates that both a and b are positive, and these constraints may be added. In some problems you may wish to change the default Precision and Tolerance values by first clicking the Options button. Now click OK, and Excel will attempt to find the optimum solution and replace the guess values of the regression parameters with the optimum values.

Table 5 indicates that the combined theoretical/empirical correlation,

$$F = 4.9 - 0.085\,V^{1.663} \tag{5}$$

produces an $r^2$ of 0.9989, which is slightly better than the quadratic. This correlation is also simpler than the quadratic fit and it is more physically significant. Instead of basing the curve fit on $r^2$, try using the least squares regression method to compute the coefficients and compare your results. This example also illustrates the use of a function module. To see it, click Tools, Macro and Visual Basic Editor.

One word of caution: nonlinear functions often have more than one solution, and a given guess set may produce a local solution (in this case, a local minimum) instead of a global solution. Highly nonlinear problems may also require a fairly accurate initial guess to obtain a global solution or any solution. You may have to resort to plots to produce an accurate initial guess. See Nonlinear Regression.xls for another example.

Reference

Physics 1210: Engineering Physics. Lab Manual, Appendix A, University of Wyoming Physics and Astronomy, Spring, 1999.