Ordinary Least Squares (OLS): Simple Linear Regression (SLR) Assessment: Goodness of Fit & Precision

Size: px

Start display at page:

Download "Ordinary Least Squares (OLS): Simple Linear Regression (SLR) Assessment: Goodness of Fit & Precision"

Damon Boone
5 years ago
Views:

Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Assessment: Goodness of Ft & Precson How close? Goodness of Ft and Inference Brng on the ANOVA Table!

1 Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Assessment: Goodness of Ft & Precson How close? Goodness of Ft and Inference Brng on the ANOVA Table! (SST, SSE and ) Goodness of Ft I: Mean Squared Error (MSE) and Root MSE (RMSE) Goodness of Ft II: R-squared Assessment of Goodness of Ft Examples n Excel and Stata Comparng SLR Models Usng Goodness of Ft Metrcs Inference/Precson: Standard Errors and t Stats How close? Goodness of Ft v Inference/Precson After we have derved the OLS parameter estmates, 0 ˆβ and ˆβ, the queston always arses: How well dd we do? How close are the estmated coeffcents to the true parameters, β 0 and β? We'll have several answers None wll be entrely satsfactory though they wll be nformatve, nonetheless Qualty of the Overall Model Goodness of Ft: Goodness of ft metrcs tell us somethng about the qualty of the overall model, about how well the predcteds ft the actuals They may not tell us as much as we'd lke to know about how precsely we've estmated the true parameters But f we have a lot of data and the goodness of ft metrcs look good, then we should feel pretty good about our estmated coeffcents, even though there s always some probablty that they are way off a GOF Metrc I: MSE (Mean Squared Error) MSE s almost sort of lke an average squared resdual I say almost sort of lke because nstead of takng an average (and dvdng by n), we dvde the sum of the squared resduals by n- (The choce reflects an nterest n unbasedness) b GOF Metrc II: R (Coeffcent of Determnaton) There are two equvalent ways to thnk about R One nterpretaton s that t measures the proporton of the varaton n the y's (the actuals) explaned by the ys ˆ ' (the predcteds) Alternatvely, t captures the magntude of the correlaton between the y's and the ys, ˆ ' the actuals and the predcteds It wll turn out that 0 R, and so f we have R close to we say that goodness of ft s hgh, and f t's close to 0, goodness of ft s low In contrast, t won t always be so obvous whether the MSE's are large or small n magntude

2 SLR Assessment: Goodness of Ft & Inference v4 3 Qualty of the Indvdual Parameter Estmates Precson/Inference: At the end of ths secton, we wll brefly dscuss the concepts of standard errors and t stats, and show how they can be used to say somethng about precson of estmaton of the unknown parameters A more formal treatment of ths approach wll have to wat, but there are some useful rules of thumb for assessng precson, whch we can ntroduce at ths juncture Later we wll make some dstrbutonal assumptons whch wll enable us to do nference construct Confdence Intervals and run Hypothess Tests Whle those nferental tools won t wth certanty answer the queston How Close?, they wll gve us probablstc assessments as to how close our estmated coeffcents are to the true unknown parameter values Those probablstc assessments wll nvolve levels of confdence for confdence ntervals and sgnfcance levels for hypothess testng Stay tuned! 4 But before turnng to R we frst need to ntroduce some ANOVA (Analyss of Varance) termnology and results Brng on the ANOVA (SST, SSE and ) 5 Some defntons whch wll be useful n assessng the MSE/RMSE and metrcs: a SST: Total Sum of Squares ( y R goodness of ft Ths the sum squared devatons of the actual values of the dependent varable from ther mean SST Snce Syy = ( y =, SST = ( n ) S yy ( n ) tmes the varance of n n the actuals b SSE: Explaned Sum of Squares = ( yˆ Ths s the sum squared devatons of the predcted values of the dependent varable from the mean of the actual values If there s a constant term n the model, the mean of the actuals s also the mean of the predcteds, y = yˆ In ths most common case: SSE = ( yˆ ) ( ˆ ˆ y = y and so SSE = ( yˆ = ( n ) S yy ˆˆ ( n ) tmes the varance of the predcteds c : Resdual Sum of Squares uˆ = ( y yˆ ) Ths s the sum squared resduals, the squared dfferences between the actual and predcted values of the dependent varable Everyone doesn t always use the same termnology for these concepts In Stata regresson output, SST s SS Total, s SS Resdual, and SSE s SS Model And some authors flp the defntons of SSE and

SLR Assessment: Goodness of Ft & Inference v4 Snce u ˆ = 0, the resduals by constructon have mean 0, and so Suu ˆˆ = or n put dfferently, = ( n ) S uu ˆˆ ( n ) tmes the varance of the resduals 6 To

3 SLR Assessment: Goodness of Ft & Inference v4 Snce u ˆ = 0, the resduals by constructon have mean 0, and so Suu ˆˆ = or n put dfferently, = ( n ) S uu ˆˆ ( n ) tmes the varance of the resduals 6 To summarze: yy, (n-) tmes the varance of the actuals ( ˆ ) ( ) yy ˆˆ, (n-) tmes the varance of the predcteds ˆ ( ˆ ) ( ) uu ˆˆ, (n-) tmes the varance of the resduals SST = ( y = ( n ) S SSE = y y = n S = u = y y = n S 7 Result: SST = SSE +, f there s a constant term n the model a Or dvdng through by (n-), we have SST = SSE + n n n, or Syy = Syy ˆˆ + Suu ˆˆ b In words: The sample varance of the actuals s the sum of the sample varances of the predcteds and of the resduals c What drves ths result? Snce we have a constant term n the regresson, the mean of the predcted values s the same as the means of the actuals or put dfferently: ŷ = y 8 You shouldn t be too surprsed by ths result Earler we showed that OLS effectvely decomposed the y's nto two uncorrelated parts, predcted and resduals And snce y ˆ ˆ = y + u and ˆ ρ yu ˆˆ = 0, the sample varance of the actuals wll be the sum of the sample varances of the predcteds and of the resduals whch s exactly the result above, S = S + S So perhaps you saw ths comng yy yy ˆˆ uu ˆˆ 9 Ths result does not necessarly hold f there s no constant (ntercept) term n the model But do not fear! There are lots of good reasons for ncludng a constant term n your model In Proof: The trck s to add and subtract y ˆ nsde the expresson and to then smplfy: ( y ) ( ˆ ˆ y = y y + y = ( y ˆ ) ( ˆ ) ( ˆ )( ˆ y + y y + y y y = + SSE + ( y ˆ )( ˆ y y So we just need to prove that ( y ˆ )( ˆ y y = uˆ ( yˆ = 0, the sample covarance of the predcted values and the resduals s zero But snce y y ˆ β ˆ β x ( ˆ β ˆ β x) ˆ β ( x x) and ths s ˆ = =, uˆ ˆ ˆ ˆ ( y = β u ( x x) 0, snce u ˆ ( x x ) = 0 3

4 SLR Assessment: Goodness of Ft & Inference v4 fact, general practce s to always nclude a constant term n your model unless you have a specfc reason not to do so Goodness of Ft I: Mean Squared Error (MSE/RMSE) 0 MSE provdes one measure of how close your predcted values are to the actuals: a MSE = 3 measured n squared unts of the dependent varable n To put the metrc n the same unts as the y's, we take the square root of the MSE ths gves us Root Mean Squared Error (RMSE), whch s sometmes called the standard error of the regresson Ths metrc s sort of lke an average devaton of predcted from actuals but not qute, gve the specfcs of the calculaton and for reason prevously dscussed a RMSE = MSE = n measured n unts of the dependent varable Sometmes we also look at, Mean Absolute Error (MAE), a goodness of ft metrc closely related to RMSE: a MAE = y ˆ y, where y ˆ y s the absolute value of the th resdual n b MAE's are not typcally ncluded n standard regresson package results but they can usually be easly obtaned 3 One of the challenges n workng wth MSEs, RMSEs and MAEs s nterpretng magntudes On ther face, t's not obvous whether these metrcs are small or large n magntude So you'll need to brng other nformaton to bear n formng an opnon as to how well your model has ft the data 4 Our alternatve metrc, the Coeffcent of Determnaton ( R ), provdes more readly nterpreted results Goodness of Ft II: R-squared 5 Our second goodness of ft metrc, the Coeffcent of Determnaton, s defned by: R = SST a So long as there s a constant term n the model (so the mean predcted value s the same as the mean actual value), = SST SSE, and so R SSE ( yˆ = = = SST SST ( y 3 As you'll see later n the semester, we dvde by (n-) rather than n to acheve unbasedness 4

5 SLR Assessment: Goodness of Ft & Inference v4 b Then R SSE ( yˆ / ( n ) Sample Var( predcted) S = = = = SST ( y ) / ( ) SampleVar( actual) S y n c By constructon, 0 R (f there s a constant term n the model) hgher values mean that you've done a better job explanng the varaton n the actuals Don t get too excted f R s close to, or too depressed f t s close to 0 Dong good econometrcs s way more than just maxmzng R If your model does not have a constant term then ths last formula need not be the case Further: If your model does not have a constant/ntercept term then you should not pay too much, f any, attenton to R 6 Interpretaton I: Rato of Varances Gven the results above, R-squared s the rato of the Sample Varance of the predcted to the Sample Varance of the actuals the percent of the varaton of the actuals explaned by the model 7 Interpretaton II: Correlaton between predcted and actuals R s also the square of the sample correlaton between the ndependent and dependent varables, as well as the sample correlaton between the actuals and predcteds: ρ = ρ = xy yy ˆ R a Ths s an mportant result so here's a quck proof: We know that ˆ Sxy Sy β = = ρxy S S xx x Snce y ˆ ˆ ˆ = β0 + βx, the sample varance of the predcted values wll be defned by: S ( ˆ ˆ ˆ ˆ β + β x β β x) Sxx n 0 0 ˆ yy ˆˆ = = β But then R ˆ S yy ˆˆ β S S xx y S S xx yy Sxx = = = ρxy = ρxy = ρxy Syy Syy Sx Syy Sxx Syy or put dfferently: Snce SSE = ρ SST (see followng), xy yy ˆˆ yy SSE ρ xy = = R SST ( ) ( ˆ ˆ ˆ ˆ ) ˆ ˆ β0 β ( β0 β ) β ( ) And so SSE = y y = + x + x = x x S SST SSE = ρ x x = ρ x x = ρ SST y xy ( ) xy ( ) xy Sx ( x x) 5

SLR Assessment: Goodness of Ft & Inference v4 v And snce ρ xy = ρ (the correlaton of the x's and y's s the same as the correlaton yy ˆ between the predcted and the actuals), we have the desred

6 SLR Assessment: Goodness of Ft & Inference v4 v And snce ρ xy = ρ (the correlaton of the x's and y's s the same as the correlaton yy ˆ between the predcted and the actuals), we have the desred result: R = ρ = ρ xy yy ˆ b When we move to MLR models, wth multple explanatory varables, we lose the connecton between R and ρ xy but the connecton to the correlaton between predcted and actuals wll carry forward ( ρ = R for MLR models as well) ŷy Assessment of Goodness of Ft 8 The two goodness of ft metrcs (R-squared and MSE/RMSE) tell you somethng about how well your model captures/explans the varaton n the dependent varable y They alone, however, do not tell you how well you ve estmated the unknown parameter values β0 and β In some cases, R-squared wll be hgh and MSE/RMSE wll be low, and your parameter estmates wll be qute poor and vce-versa a Example: Suppose you have a sample of sze two Wth just two data ponts, R = and MSE = 0 and you have n all lkelhood come up wth mserable estmates of the unknown parameter values 9 Here are a couple examples, wth just fve observatons randomly generated usng a true relatonshp gven by the sold red lne and the dashed black lne shows you the OLS estmated SLR relatonshp for the gven dataset In both cases, the R s above5, and the estmated relatonshp s all wrong So n matters too! 0 nobs Matters Too! We wll see later that the qualty of the parameter estmates depends on R-squared (or MSE/RMSE) and the number of observatons n the dataset And so f R- squared s hgh and MSE/RMSE s low, and you have lots of data, then you have probably done a pretty good job estmatng the unknown parameter values nobs Matters Too! 6

SLR Assessment: Goodness of Ft & Inference v4 Examples n Excel and Stata Contnung wth the bodyfat example n Excel Generate the predcted, y ˆ ˆ ˆ = β0 + βx, and resduals, uˆ ˆ ( ˆ ˆ = y y = y β0 + βx)

7 SLR Assessment: Goodness of Ft & Inference v4 Examples n Excel and Stata Contnung wth the bodyfat example n Excel Generate the predcted, y ˆ ˆ ˆ = β0 + βx, and resduals, uˆ ˆ ( ˆ ˆ = y y = y β0 + βx) Generate s by squarng the resduals and summng those (use the SUMSQ() functon to save a step) Use the COUNT() functon to count your observatons, and generate MSE = and n RMSE = MSE To generate SSEs, demean the predcteds, and compute the sum squared of those, agan usng SUMSQ() And use SUMSQ() to compute SST usng the demeaned Brozek observatons Once you have all of these, you can verfy that + SSE = SST And wth SSE and SST, dvde by n- to generate the sample varances of the explaned and the actuals You can now compute 3 4 R =, SST SSE R =, SST R four ways: Sample Var( predcted) S yy ˆˆ R = SampleVar( actual) = S, and R = ρ = ρ xy yy ˆ Here s what your results mght look lke: yy I have posted bodyfat example xlsx to llustrate 7

8 SLR Assessment: Goodness of Ft & Inference v4 Runnng the Regresson n Excel When you run the regresson n Excel, you ll get the followng SUMMARY OUTPUT Regresson Statstcs Multple R 0636 R Square Adjusted R Square Standard Error 635 Observatons 5 ANOVA df SS MS F Sgnfcance F Regresson 5,669 5, E-7 Resdual 50 9, Total 5 5,0790 Coeffcents Standard Error t Stat P-value Lower 95% Upper 95% Intercept (9995) 389 (48) 39776E-05 (47004) (5899) wgt E You can fnd SSE,, SST, MSE, RMSE and R-squared n there you just need to know where to look The SS s are all n the SS column, wth Regresson for SSE, Resdual for, and Total for SST MSE can be found n the MS column, row Resdual R squared s reported under Regresson Statstcs, and what Excel calls the Standard Error of the regresson, we call RMSE So the statstcs are all there you just need to know where to look Runnng the Regresson n Stata reg brozek wgt Source SS df MS Number of obs = F(, 50) = 506 Model Prob > F = Resdual R-squared = Adj R-squared = Total Root MSE = brozek Coef Std Err t P> t [95% Conf Interval] wgt _cons predct bfathat (opton xb assumed 8

9 SLR Assessment: Goodness of Ft & Inference v4 Agan, you can fnd SSE,, SST, MSE, RMSE and R-squared n there you just need to know where to look The SS s are agan n column SS, but Stata now puts the SSEs n the Model row MSE s are agan n column MS and row Resdual And R-squared and Root MSE (RMSE) are n the regresson stats n the upper rght corner We agan fnd that + SSE = SST: d and R-squared s ndeed those correlatons squared: corr Brozek bfathat wgt (obs=5) Brozek bfathat wgt Brozek 0000 bfathat wgt d 63^ Comparng SLR Models Usng Goodness of Ft Metrcs For the appled econometrcan, the journey s as mportant as the fnal destnaton And there's plenty of scence and art along the way Each regresson analyss tells you somethng and leads to the next analyss Ultmately, you typcally converge on your preferred model but there was plenty of learnng along the way And that learnng defntely nformed your analyss As part of the learnng process, econometrcans are always comparng results across models, and makng decsons about how to move forward We'll have a lot more to say about that later, but gven that we are n the mdst of Goodness of Ft metrcs, why not say a few words about how to use those metrcs to compare models? You can use R and MSE/RMSE to compare the performance of dfferent SLR models but only to a lmted extent And you must be careful! If the dfferent models all have the same LHS data (so the y's are the same n the dfferent models both n terms of number and n terms of values), then the SSTs and Syy ' s wll be the same across the models, and you can compare R 's and MSE/RMSE's Under these condtons the R ' s and the MSE/RMSE's wll move n opposte drectons, snce: R > R > < < MSE < MSE SST SST n n 9

10 SLR Assessment: Goodness of Ft & Inference v4 So under these condtons, models wth hgher R 's (and lower MSE/RMSE's) do a better job of fttng the data, and n that sense are preferable But: If the y's are not the same across the dfferent models, then R 's and MSE/RMSE's are not drectly comparable and won t tell you much unless you make some adjustments Here are some examples usng the bodyfat dataset Example : Predctng Brozek wth four dfferent SLR Models Here are the results from four SLR models, where Brozek s the common LHS varable and hgt, wgt, abd, and BMI are the canddate RHS varables () () (3) (4) Brozek Brozek Brozek Brozek hgt -089 (-4) wgt 06*** (7) abd 0585*** (3) BMI 547*** (679) _cons 37*** -9995*** -350*** -04*** (344) (-48) (-49) (-86) N R-sq rss rmse t statstcs n parentheses * p<005, ** p<00, *** p<000 The syntax for the esttab output was: esttab, r scalar(rss rmse) compress The optons n the esttab statement: r: dsplays R rss: dsplays s rmse: dsplays RMSE and compress compresses the output so t s not as wde and fts better on the page 0

11 SLR Assessment: Goodness of Ft & Inference v4 Notce that R ncreases as you go from hgt (0008), to wgt (0376), to abd (066) and then decreases wth BMI (0530) And as advertsed, RMSE moves n exactly the opposte drecton Lookng across the four models, abd (wast sze) has most explanatory power (hghest R 's and lowest MSE/RMSE's), BMI s n second place, wgt s a bt behnd BMI and hgt trals the feld by a hefty margn Example : Takng ln's and mxng and matchng In ths example take ln's of Brozek and abd and run four models, mxng and matchng In Models () and () Brozek s frst regressed on abd, and then on lnabd; n Models (3) and (4) ths s repeated wth lnbrozek now the dependent varable Here are the results: esttab, r scalar(rss rmse) compress () () (3) (4) Brozek Brozek lnbrozek lnbrozek abd 0585*** 00337*** (3) (785) lnabd 56*** 399*** (83) (899) _cons -350*** -348*** *** (-49) (-) (-59) (-536) N R-sq rss rmse t statstcs n parentheses * p<005, ** p<00, *** p<000 (Note that 's have been manually added to the table you'll learn why below) It s temptng to say that Model () s the best because t has the hghest R or maybe you thnk that Model (4) s the best because t has the lowest MSE/RMSE Perhaps the dfferent recommendatons should be your frst clue that R 's and MSE/RMSE's mght not under these crcumstances tell you the best model The R 's and MSE/RMSE's n Models () and () are comparable to one another (snce they have the same LHS varable) and the R 's and MSE/RMSE's n Models (3) and (4) are also comparable to one another (they also have the same LHS varable) But you cannot, wthout addtonal computatons, compare the frst two Models to the last two Models, because they have dfferent LHS varables

SLR Assessment: Goodness of Ft & Inference v4 So Model () performs better than (), and (4) does better than (3) but don t you dare try to compare () and (4) wthout addtonal computatons And besdes, f

12 SLR Assessment: Goodness of Ft & Inference v4 So Model () performs better than (), and (4) does better than (3) but don t you dare try to compare () and (4) wthout addtonal computatons And besdes, f you tred to do that, you'd pck () on the bass of hgher R 's or maybe (4) on the bass of lower MSE/RMSE's Comparablty across models wth dfferng LHS varables s clearly an ssue Now you see why the 's are n the results table? Inference/Precson: Standard Errors and t Stats As mentoned above, more formal use of the standard error to assess precson of parameter estmaton awats the dscusson of nference, confdence ntervals and hypothess testng However there are some useful rules of thumb for assessng precson usng standard errors We now turn to those Standard Errors: Standard errors (se's)provde us wth a measure of precson n the estmaton of the unknown parameters Knowng the se alone however s typcally not very helpful, snce t s often dffcult to know whether as partcular standard error s small or large We wll crcumvent ths shortcomng by creatng a t stat, whch effectvely standardzes the standard error, and gves us a metrc that s more readly nterpretable Before turnng to the t stat, let's frst defne the standard error In OLS/SLR models the standard error assocated wth the slope estmate s defned by: 4 RMSE RMSE se = ˆ β ( x x) = x S n Perhaps not surprsngly, the standard error s: ncreasng n RMSE (reported standard errors wll be smaller wth models that do a better job of fttng the data), decreasng n n (more observatons wll lead to smaller reported standard errors), and decreasng n the varance of x (ths s perhaps less ntutve, but ncreased varance n your RHS varable s a good thng and wll lead to a smaller reported standard errors) 4 The proof of ths s very straghtforward and wll come later We almost never pay attenton to the standard error of the constant term

13 SLR Assessment: Goodness of Ft & Inference v4 t-stat: Comparng the standard error to the estmated coeffcent, ˆβ, often tells us somethng about how relably we've estmated the unknown slope parameter, β Before assessng relablty, though, we'll need to defne one more term, the t stat: t ˆ β ˆ = β se ˆ β The absolute value of t stat tells you the magntude of the estmated slope coeffcent, ˆβ, measured n unts of standard errors Once you know the t stat, you can apply some general rules of thumb to assess precson of estmaton In general, the larger the t stat, the greater the lkely precson (as you'll see later, n also matters n assessng precson) so you should take comfort seeng hgh t stats, and fret over low ones In terms of ranges and emotons, and assumng a szable n: f t > or so then you have lkely done a pretty good job of estmatng the unknown slope parameter, β, f t < sh then you have lkely done a not so good job of estmatng β, and and for n-between magntudes of t whle the results aren't as strong as you mght lke, there's hope and reason to beleve that wth further work your model wll be somethng to brag about So defntely no reason to lose hope! Connecton between t and the R There's a connecton between the measure of precson, R measure of goodness of ft, as well as SSE and : R SSE t = ˆ ( n ) ( n ) β R = Who knew that the Goodness-of-Ft and precson metrcs were connected? ˆ β ˆ β ( x ˆ x) β ( x x) Proof: t ˆ = = = β se MSE / ( n ) ˆ SSE = ˆ β ( x x) β We know from the proof of SSE SSE / SST R And so t = ( n ) ( n ) ( n ) = β / SST = R ˆ xy t ˆ β, and ρ = R that These equatons make t clear that precson n estmaton s a functon of both R, how well the model fts the data, as well and the number of observatons, n It may not be so obvous, but ths expresson s ncreasng n n and R And so deally, both n and R are large 3

14 SLR Assessment: Goodness of Ft & Inference v4 Note that snce SSE + = SST, the t stat wll depend on how the SSTs are dvded between SSEs and s, snce t ˆ β wll be proportonal to SSE, for gven n The hgher the SSE/ rato, the greater the magntude of the t stat Example: Model () from above reg Brozek abd Source SS df MS Number of obs = F(, 50) = Model Prob > F = Resdual R-squared = Adj R-squared = Total Root MSE = Brozek Coef Std Err t P> t [95% Conf Interval] abd _cons The reported t stat for the abd varable s 3 Applyng the formulas, we have: R 66 t = ˆ ( n ) β R = 338 = and so t ˆ β = = 8 SSE t = ˆ ( n ) β = = and so t ˆ β = = Importance of n and R : If you have hgh R but low n, or hgh n (lots of observatons) but poor ft (low R ), then t's lkely that your slope estmate s not so precse But a healthy R together wth lots of observatons means that that you have lkely done a nce job estmatng the unknown parameter, β So: low n and low (low n and hgh hgh n and hgh R : bad news get back to work R ) or (hgh n and low R : well done! R ) stll not so great 4

Ordinary Least Squares (OLS): Simple Linear Regression (SLR) Assessment: Goodness of Fit & Precision

Ordinary Least Squares (OLS): Simple Linear Regression (SLR) Assessment: Goodness of Fit & Precision Ordnary Least quares (OL): mple Lnear Regresson (LR) Assessment: Goodness of Ft & Precson How close? Goodness of Ft v. Inference Goodness of Ft I: Mean quared Error (ME) and Root ME (RME) Brng on the ANOVA