Ordinary Least Squares (OLS): Multiple Linear Regression (MLR) Assessment I: What's New? & Goodness-of-Fit

Curtis Marsh

- Introduction
- OLS: A Quick Comparison of SLR and MLR Assessment
- Not much that's new!
- MLR Goodness-of-Fit: Adjusted R-squared and RMSE (MSE)
- Comparing MLR Models I: Using Goodness of Fit Metrics

Introduction

When we looked earlier at assessment in the context of SLR models, we focused on two types of measures of performance: goodness-of-fit and precision/inference metrics. The goodness-of-fit metrics told you something about how well the predicteds from your model fit the actuals. The precision/inference metrics said something about how precisely the slope parameter was estimated.

SLR: Goodness-of-Fit
- Mean Squared Error (MSE): $MSE = \frac{SSR}{n-2}$, an average squared residual, sort of.
- Root Mean Squared Error (RMSE): $RMSE = \sqrt{MSE}$, sort of an average residual, but more like a square root of an average squared residual, sort of.
- Coefficient of Determination ($R^2$): $R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = \hat\rho_{xy}^2 = \hat\rho_{\hat y y}^2$. This is the proportion of the variance of the actuals explained by the predicteds, as well as the correlation (squared) between the predicteds and the actuals.

SLR: Precision/Inference
- Standard Error $se(\hat\beta)$: $se(\hat\beta) = \sqrt{\frac{MSE}{\sum_i (x_i - \bar x)^2}} = \frac{RMSE}{S_x\sqrt{n-1}}$, where $S_x$ is the sample standard deviation of $x$.
- t stat $t_{\hat\beta}$: $t_{\hat\beta} = \frac{\hat\beta}{se(\hat\beta)}$, with $|t_{\hat\beta}| = \sqrt{n-2}\,\sqrt{\frac{SSE}{SSR}} = \sqrt{n-2}\,\sqrt{\frac{R^2}{1-R^2}}$.

The MSE, RMSE and Standard Error metrics are not in standardized units, which makes their magnitudes difficult to interpret. But $R^2$ and $t_{\hat\beta}$ are both standardized to some extent. That makes them perhaps more useful in assessing the performance of the model.
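Here is a minimal Python sketch of these SLR formulas. It uses only the standard library and a small made-up dataset; the data and the function name `slr_metrics` are illustrative assumptions, not from the notes. The sketch computes each metric directly so you can check the identities above: $SST = SSE + SSR$, $R^2 = \hat\rho_{xy}^2$, and the two expressions for $se(\hat\beta)$ and $|t_{\hat\beta}|$.

```python
import math


def slr_metrics(x, y):
    """SLR goodness-of-fit and precision metrics for an OLS fit of y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)                     # sum of (x_i - xbar)^2
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)                     # total sum of squares

    b1 = sxy / sxx                                            # OLS slope
    b0 = ybar - b1 * xbar                                     # OLS intercept
    yhat = [b0 + b1 * a for a in x]

    ssr = sum((b - f) ** 2 for b, f in zip(y, yhat))          # residual sum of squares
    sse = sum((f - ybar) ** 2 for f in yhat)                  # explained sum of squares
    mse = ssr / (n - 2)                                       # SLR dofs = n - 2
    se_b1 = math.sqrt(mse / sxx)
    return {
        "n": n, "b1": b1, "SST": sst, "SSE": sse, "SSR": ssr,
        "MSE": mse, "RMSE": math.sqrt(mse), "R2": sse / sst,
        "rho_xy": sxy / math.sqrt(sxx * sst),
        "S_x": math.sqrt(sxx / (n - 1)),                      # sample std dev of x
        "se_b1": se_b1, "t_b1": b1 / se_b1,
    }


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.1, 5.2, 5.8, 8.4, 8.9, 11.2, 11.6]
m = slr_metrics(x, y)
for key in ("R2", "rho_xy", "RMSE", "se_b1", "t_b1"):
    print(f"{key:>7}: {m[key]:.4f}")
```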


Rough guides for the two standardized metrics:
- $R^2$: $0 \le R^2 \le 1$; closer to one is better; closer to zero, not so much.
- $|t_{\hat\beta}|$: above 2-ish: nice precision; less than 1-ish: not so nice; and between 1 and 2: there's hope.

Moving to MLR models: We continue to turn to these assessment metrics when we move to MLR models, and the formulas change by not very much, as you'll see below. Of the metrics, however, $R^2$ proves to be far less useful when assessing the performance of MLR models. We address that shortcoming with a new Goodness-of-Fit metric, adjusted $R^2$ (sometimes adj $R^2$, or $\bar R^2$).

Some Intuition: The shortcoming of $R^2$ in the MLR world. When additional explanatory variables are added to an MLR model, SSRs will typically decrease, or at worst stay the same. Assuming no changes to the dependent variable, SSRs can never increase (and $R^2$ can never decrease) with additional explanatory variables in the model. And so in the context of the $R^2$ metric, RHS variables get credit for just showing up, irrespective of their explanatory power.

Consider an MLR model with min SSRs of $SSR_0$. If you add an RHS variable, then one option in minimizing SSRs is to keep the coefficient of that new variable equal to zero. But when you minimize SSRs with that restriction, you are solving the previous min SSRs problem. So the minimum SSRs, with the new coefficient restricted to zero, is $SSR_0$. With the additional explanatory variable, then, you can never do worse than $SSR_0$ in minimizing SSRs. You can probably do better once you drop the restriction of that zero coefficient.

Suppose that, when minimizing SSRs for the new model, the new variable has a coefficient of zero. Then SSRs will remain at $SSR_0$, the new variable has added nothing (no explanatory content) to the model, and $R^2$ is unchanged. Alternatively, suppose the new coefficient is non-zero when minimizing SSRs. Then SSRs will necessarily have decreased, so long as the new variable is not perfectly collinear with the other RHS variables in the model, and $R^2$ increases.

When new explanatory variables are added to a model, their coefficients will typically be non-zero and $R^2$ will typically increase. So don't be impressed if $R^2$ increases when new RHS variables are added to the MLR analysis; that's entirely to be expected. Certainly McKayla Maroney is not impressed!
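You can check the "credit for just showing up" argument numerically. The sketch below is standard-library Python; the helper names `ols_ssr` and `r_squared` and the simulated data are illustrative assumptions. It fits OLS by solving the normal equations. It then adds a regressor that is pure noise, generated independently of y. That regressor's estimated coefficient is essentially never exactly zero, so SSR falls and $R^2$ rises anyway.

```python
import random


def ols_ssr(y, columns):
    """SSR from an OLS fit of y on a constant plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[i] - sum(b * xv for b, xv in zip(beta, X[i])) for i in range(n)]
    return sum(e * e for e in resid)


def r_squared(y, columns):
    """R^2 = 1 - SSR/SST for a model that includes a constant."""
    ybar = sum(y) / len(y)
    sst = sum((v - ybar) ** 2 for v in y)
    return 1 - ols_ssr(y, columns) / sst


rng = random.Random(42)
n = 50
x1 = [rng.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * a + rng.gauss(0, 1) for a in x1]
junk = [rng.gauss(0, 1) for _ in range(n)]        # generated independently of y

ssr_small, ssr_big = ols_ssr(y, [x1]), ols_ssr(y, [x1, junk])
r2_small, r2_big = r_squared(y, [x1]), r_squared(y, [x1, junk])
print(f"SSR: {ssr_small:.3f} -> {ssr_big:.3f}")
print(f"R^2: {r2_small:.4f} -> {r2_big:.4f}")
```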

Here's an application, which illustrates the point and tests your understanding.

MLR Application: Correlations provide a lower bound on MLR $R^2$. Suppose you are considering an MLR box office revenues analysis, with explanatory variables wk1, wk2 and wk3. Here are the correlations of the variables in the model:

. corr rtotgross wk1 wk2 wk3
(obs=7,730)
[Correlation matrix of rtotgross, wk1, wk2 and wk3]

Notice that the largest correlation between an RHS variable and rtotgross is .9474 (wk3). Then, as shown below, the $R^2$ in the full model must be no less than $.9474^2 = .898$, and most likely will be greater. So the correlations (squared) provide a lower bound on the MLR $R^2$. Put differently: you can often get a pretty good sense of $R^2$ in an MLR model just by looking at the correlations (squared) amongst the variables.

If you understand the previous comment about RHS variables getting credit for just showing up, you'll understand why I claim that the full MLR model will have an $R^2$ of at least $.9474^2 = .898$. Here's why. To get to the MLR model, let's start with the SLR model in which rtotgross has been regressed on wk3. I pick wk3 because, of the three RHS variables, it has the highest correlation with rtotgross. Since $R^2 = \hat\rho_{xy}^2$ for SLR models, we know that the $R^2$ for this SLR model will be $.9474^2 = .898$. See Model (1) below. From there, add the other wk variables and build to Model (3):

[Regression table, dependent variable rtotgross, columns (1) to (3): Model (1) includes wk3 only, Model (2) adds a second wk variable, and Model (3) includes all three. Every wk coefficient is marked ***, with t statistics in parentheses. N, R-sq and SSR are also reported.]

As predicted, the $R^2$'s increase moving left to right, since the coefficients for the new variables are non-zero. $R^2$ increases from .898 in Model (1) to about .90 in Model (2), and increases again in Model (3). Also as predicted, SSRs are decreasing. So we can combine simple pairwise correlations with the fact that $R^2$ never decreases when additional RHS variables are added to a model. Together they place a lower bound on $R^2$ for the final MLR model. Put differently, the simple correlations alone tell you that the final MLR model will have a very high $R^2$, close to 1.

The following table compares the various Goodness-of-Fit concepts/definitions/formulas in SLR and MLR models. (Note that I assume that there is always a constant term in the SLR and MLR models.)

OLS: A Quick Comparison of SLR and MLR Assessment

Goodness of Fit, SLR vs. MLR:
- Sum of squares. SLR: $SST = SSE + SSR$; MLR: $SST = SSE + SSR$
- $R^2$ (coefficient of determination, w/ intercept term). SLR and MLR: $R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = \frac{\mathrm{SampleVar(predicted)}}{\mathrm{SampleVar(actual)}}$
- Degrees of freedom (dofs). SLR: $n - 2$; MLR: $n - k - 1$
- MSE. SLR: $MSE = \frac{SSR}{dofs} = \frac{SSR}{n-2}$; MLR: $MSE = \frac{SSR}{dofs} = \frac{SSR}{n-k-1}$
- RMSE. SLR and MLR: $RMSE = \sqrt{MSE}$
- Adjusted $R^2$. MLR: $\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{MSE}{S_y^2}$, where $S_y^2 = \frac{SST}{n-1}$ is the sample variance of the actuals
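Back to the box office application: you can check on simulated data the claim that the largest squared pairwise correlation is a floor for the full-model $R^2$. In this sketch the names wk1, wk2, wk3 and rtotgross mirror the example, but the data are made up. `ols_ssr`, `r_squared` and `corr` are illustrative helpers (OLS via the normal equations, standard library only). The sketch checks two things. First, the SLR $R^2$ equals the squared correlation. Second, the full-model $R^2$ is at least the largest squared correlation.

```python
import math
import random


def ols_ssr(y, columns):
    """SSR from an OLS fit of y on a constant plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):                      # Gauss-Jordan with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[i] - sum(b * xv for b, xv in zip(beta, X[i])) for i in range(n)]
    return sum(e * e for e in resid)


def r_squared(y, columns):
    ybar = sum(y) / len(y)
    return 1 - ols_ssr(y, columns) / sum((v - ybar) ** 2 for v in y)


def corr(a, b):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((u - abar) * (v - bbar) for u, v in zip(a, b))
    saa = sum((u - abar) ** 2 for u in a)
    sbb = sum((v - bbar) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Simulated, correlated regressors (hypothetical data)
rng = random.Random(7)
n = 200
wk1 = [rng.uniform(0, 10) for _ in range(n)]
wk2 = [0.6 * a + rng.gauss(0, 1) for a in wk1]
wk3 = [0.5 * b + rng.gauss(0, 1) for b in wk2]
rtotgross = [1 + 0.8 * a + 1.2 * b + 2.0 * c + rng.gauss(0, 2)
             for a, b, c in zip(wk1, wk2, wk3)]

regressors = {"wk1": wk1, "wk2": wk2, "wk3": wk3}
rho2 = {name: corr(rtotgross, col) ** 2 for name, col in regressors.items()}
best = max(rho2, key=rho2.get)                       # regressor with largest rho^2
r2_slr = r_squared(rtotgross, [regressors[best]])
r2_full = r_squared(rtotgross, list(regressors.values()))
print(f"largest squared correlation ({best}): {rho2[best]:.4f}")
print(f"SLR R^2 on {best}: {r2_slr:.4f}")
print(f"full-model R^2: {r2_full:.4f}")
```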

As you can see, there are a few differences between SLR and MLR models, but not many!
1. The definitions of SSE, SSR and SST are the same for SLR and MLR models. So are the definition of $R^2$ and the fact that $SST = SSE + SSR$ (since there is a constant term in the model).
2. In the MSE calculation, we now divide by $n-k-1$, the degrees of freedom (dofs) in the MLR model. This reflects an interest in unbiasedness, to be discussed later. This is in fact consistent with the SLR metric, since there were $n-2$ dofs in those models (the $k = 1$ case).
3. We have a new metric for MLR models, Adjusted $R^2$, discussed in more detail below. As discussed above, this will not give new RHS variables goodness-of-fit credit merely for showing up. For Adjusted $R^2$ to increase, new RHS variables have to impress by reducing SSRs by more than some trivial amount.

MLR Goodness of Fit: Adjusted R-squared

As discussed above, suppose you are adding explanatory variables to an MLR model without changing the y's or the number of observations. SSRs will then always decrease (and R-sq will always increase) unless the estimated MLR coefficient for the new variable is exactly 0. The other exception is when the new variable is perfectly collinear with the RHS variables already in the model. So nobody should be impressed if $R^2$ increases when additional explanatory variables are brought into the analysis. The question should be: By how much did $R^2$ increase? If $R^2$ increased a lot, then you should be impressed; but if it increased by not so much, then maybe you'll want to hold your applause.

Adjusted $R^2$ is an attempt to adjust the coefficient of determination for this shortcoming. You'll discover that smallish decreases in SSRs will not generate a higher adj $R^2$, but larger decreases will. What counts as small or large depends in part on how many additional variables were added to the model.

1. Adj $R^2$ is often (and rather opaquely) defined as $\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$. Recall that we sometimes refer to $n-k-1$ as the (number of) degrees of freedom (dofs) in the model.
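2. Since $1 - R^2 = \frac{SSR}{SST}$, a more easily interpreted expression for Adj $R^2$ is $\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{SSR/dofs}{SST/(n-1)}$. This looks a lot like the definition $R^2 = 1 - \frac{SSR}{SST}$, with a dofs adjustment.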
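The comparison-table metrics and both expressions for adjusted $R^2$ can be collected in one small function. This is a sketch that assumes you already have SSR, SST, n and k (the number of RHS variables, excluding the constant) from some OLS fit. `fit_metrics` is a hypothetical name, and the inputs below are made up. Setting k = 1 gives the SLR case with dofs = n - 2. The last input also shows adjusted $R^2$ going negative when $R^2$ is small relative to the number of regressors.

```python
import math


def fit_metrics(ssr, sst, n, k):
    """Comparison-table metrics for an OLS model with a constant and k RHS variables."""
    dofs = n - k - 1                            # n - 2 when k = 1 (the SLR case)
    r2 = 1 - ssr / sst                          # = SSE/SST, since SST = SSE + SSR
    mse = ssr / dofs
    return {
        "dofs": dofs,
        "R2": r2,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # the "opaque" definition: 1 - (1 - R^2)(n - 1)/(n - k - 1)
        "adjR2_opaque": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        # the dofs-based form: 1 - MSE / S_y^2, with S_y^2 = SST/(n - 1)
        "adjR2_dofs": 1 - mse / (sst / (n - 1)),
    }


# Hypothetical inputs (SST = 100 throughout)
for ssr, n, k in [(40.0, 30, 1), (25.0, 50, 4), (98.0, 20, 5)]:
    m = fit_metrics(ssr, 100.0, n, k)
    print(f"n={n:>2} k={k} dofs={m['dofs']:>2} R2={m['R2']:.3f} "
          f"adj={m['adjR2_opaque']:.4f} / {m['adjR2_dofs']:.4f}")
```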

Note that $\frac{n-1}{n-k-1} > 1$ for $k > 0$. So $\bar R^2 < R^2$ whenever $R^2 < 1$, and the gap $R^2 - \bar R^2 = (1 - R^2)\frac{k}{n-k-1}$ grows with $k$ for a given $R^2$. And so adjusted $R^2$ is always bounded above by $R^2$. Adjusted $R^2$ can be negative, though this rarely happens in practice. If you see that, you have a really really really bad model! Time to find a new profession!

Interpretation of Adj $R^2$: It's all about the rates of change of SSRs and dofs. We can rewrite the previous expression for Adjusted $R^2$ as $\bar R^2 = 1 - \frac{n-1}{SST}\left[\frac{SSR}{dofs}\right]$. As you add explanatory variables to the model, only the terms in the square brackets (SSR and dofs) are changing, and both are typically declining. So whether $\bar R^2$ increases or decreases depends on the relative (percentage) rates of change of SSRs and dofs:
- If the decline in SSRs is faster than the decline in dofs, then $\frac{SSR}{dofs}$ will decline and $\bar R^2$ will increase with the additional explanatory variables.
- But if the decline in SSRs is slower than the decline in dofs, then $\frac{SSR}{dofs}$ will increase, and $\bar R^2$ will decrease.
So for Adjusted $R^2$ to increase, SSRs must be dropping faster than dofs.

$\bar R^2$ and RMSE (MSE)

Since $1 - \bar R^2 = \frac{SSR/(n-k-1)}{SST/(n-1)} = \frac{MSE}{S_y^2}$, adjusted $R^2$ and MSE (and hence RMSE) always move in opposite directions when $S_y^2$ is fixed. So suppose you are adding (or subtracting) RHS variables to (or from) an MLR model without changing $S_y^2$. You should then expect to see $\bar R^2$ and RMSE moving in exactly opposite directions. Accordingly, the two goodness-of-fit metrics are effectively redundant, in the sense that knowing the movement patterns of one tells you the movements of the other. An important difference, however, is how easy they are to evaluate. We don't necessarily have a good sense of when RMSEs (MSEs) are small or large, but we do know that $\bar R^2 \le 1$, so we typically have an easier time evaluating magnitudes of $\bar R^2$. Note, however, that $R^2$ and $\bar R^2$ do not necessarily move in the same direction. So RMSEs and $R^2$ will not necessarily move in opposite directions. In SLR models with the same dependent variable, by contrast, RMSE and $R^2$ always moved in opposite directions.
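To make the rates-of-change condition concrete, here is a sketch with hypothetical SSR values (not from any dataset in these notes). Regressors are added one at a time to a model with n = 30 and SST = 100. At each step the sketch compares the proportional decline in SSR with the proportional decline in dofs. It confirms that adjusted $R^2$ rises exactly when the SSR decline is the larger of the two. RMSE always moves the other way, and $R^2$ keeps rising throughout.

```python
import math

n, sst = 30, 100.0
# Hypothetical SSRs for models with k = 1, 2, 3 RHS variables
path = [(1, 40.0), (2, 30.0), (3, 29.5)]


def adj_r2(ssr, k):
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))


def rmse(ssr, k):
    return math.sqrt(ssr / (n - k - 1))


steps = []
for (k0, s0), (k1, s1) in zip(path, path[1:]):
    ssr_drop = (s0 - s1) / s0                  # proportional decline in SSR
    dof_drop = (k1 - k0) / (n - k0 - 1)        # proportional decline in dofs
    steps.append({
        "k": (k0, k1),
        "ssr_faster": ssr_drop > dof_drop,
        "adj_up": adj_r2(s1, k1) > adj_r2(s0, k0),
        "rmse_down": rmse(s1, k1) < rmse(s0, k0),
        "r2_up": (1 - s1 / sst) > (1 - s0 / sst),
    })

for s in steps:
    print(s)
```

In the second step, SSR falls by under 2% while dofs fall by about 3.7%. So adjusted $R^2$ falls and RMSE rises even though $R^2$ still goes up.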

Comparing MLR Models I: Using Goodness of Fit Metrics

To illustrate Goodness-of-Fit metrics in action, here's an example using the bodyfat dataset. In Model (1), the Brozek measure of bodyfat has been regressed on hgt and wgt.

. esttab, r2 ar2 scalar(rmse) compress

[Regression table, dependent variable Brozek, columns (1) to (4). Model (1) includes hgt and wgt; Model (2) adds abd; Model (3) adds hip; Model (4) adds chest. Rows report coefficients on hgt, wgt, abd, hip, chest and _cons, followed by N, R-sq, adj. R-sq and rmse; abd is positive and marked *** in Models (2) to (4). t statistics in parentheses; * p<0.05, ** p<0.01, *** p<0.001.]

Note the esttab options: r2 ($R^2$), ar2 ($\bar R^2$), and rmse (RMSE). In Model (2), abd has been added to Model (1): R-sq and adj. R-sq both increase, while RMSE declines. In Model (3), hip has been added in, and R-sq continues to increase, as it almost always will. Now, however, adj. R-sq declines and RMSE increases. As always, adj R-sq and RMSE are moving in opposite directions. In going to Model (4), with chest added to the model, R-sq continues to (slightly) increase, while adj R-sq again declines and RMSE again increases.
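Because $1 - \bar R^2 = MSE/S_y^2$, and $S_y^2$ is fixed across models with the same dependent variable and sample, the two rankings must agree. The model with the highest adjusted $R^2$ is always the one with the lowest RMSE. Here is a sketch with hypothetical SSRs, chosen only to mimic the pattern described above. They are not taken from the bodyfat output, and n and SST are also made up.

```python
import math

n, sst = 250, 15000.0                      # hypothetical sample size and SST
# model label -> (k = number of RHS variables, SSR); hypothetical values
models = {"(1)": (2, 6000.0), "(2)": (3, 4800.0), "(3)": (4, 4790.0), "(4)": (5, 4785.0)}


def adj_r2(k, ssr):
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))


def rmse(k, ssr):
    return math.sqrt(ssr / (n - k - 1))


r2 = {m: 1 - ssr / sst for m, (k, ssr) in models.items()}
best_by_adj = max(models, key=lambda m: adj_r2(*models[m]))
best_by_rmse = min(models, key=lambda m: rmse(*models[m]))
print("R^2 by model:", {m: round(v, 4) for m, v in r2.items()})
print("best by adj R^2:", best_by_adj, " best by RMSE:", best_by_rmse)
```

$R^2$ rises with every added variable here. Both criteria nonetheless pick Model (2), because the SSR reductions after Model (2) are too small to offset the lost dofs.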


Recall that with SLR models, we could use R-sq to compare the performance of different models having the same dependent variable. In the MLR world, we often use adj $R^2$'s to compare models, so long as the dependent variables are the same. That said, I'd be the last to suggest that you only look at adj $R^2$. Applying this criterion to the previous set of four MLR models, Model (2) is the best performer, since it has the highest adj R-sq and the lowest RMSE. But all of the Models tell you something, so don't ignore them just because their performance stats aren't as impressive!

Comparing the performance of MLR models is as much art as science, and in truth we typically look at a number of different aspects/properties of the model. But certainly adj R-sq and RMSE are in the conversation. We'll return to this topic later. There we'll focus on the different criteria at play in assessing the performance of the three types of econometric models discussed in the Introduction:
- Forecasting models: focus on out-of-sample forecasting, and don't over-fit the data.
- Behavioral models: the challenging art form.
- Favorite coefficient models: focus on the favorite coefficient, and don't worry about the other aspects of the model. The one exception is making sure that you really have included every possible relevant explanatory variable, so that you have minimized the possibility of omitted variable impact/bias.
