Multiple Regression Analysis: Estimation

Simple linear regression model: an intercept and one explanatory variable (regressor)

    Y_i = β_0 + β_1 X_1i + u_i,  i = 1, 2, …, n

Multiple linear regression model: an intercept and many explanatory variables (regressors)

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + … + β_k X_ki + u_i,  i = 1, 2, …, n

The OLS estimation method is exactly the same as in the simple linear regression model. The OLS estimators are derived by minimizing the sum of squared residuals (SSR) with respect to the parameters:

    min_{β_0, β_1, …, β_k} SSR = Σ_{i=1}^n (Y_i − β_0 − β_1 X_1i − β_2 X_2i − … − β_k X_ki)²

The predicted values of Y_i and the regression residuals are computed by

    Ŷ_i = β̂_0 + β̂_1 X_1i + β̂_2 X_2i + … + β̂_k X_ki
    û_i = Y_i − Ŷ_i

and the estimator of the variance of the error term is

    σ̂_u² = SSR / (n − (k+1))

Note that the number of coefficients, k+1, is subtracted from the sample size n in the denominator.

Interpretation of the marginal effect: if all explanatory variables are different, then β_j represents the marginal effect of X_j, holding all other explanatory variables constant.

Example (school test score data):
    read_scr: average reading score
    math_scr: average math score
    testscr: average test score (= (read_scr + math_scr)/2)
    str: student-teacher ratio (enrl_tot/teachers)
    el_pct: percent of English learners
    meal_pct: percent qualifying for reduced-price lunch

Holding meal_pct and el_pct constant, a one-unit decrease in str increases testscr by 0.998, math_scr by 0.811, and read_scr by 1.186. Note that the effect on testscr is the average of the effects on math_scr and read_scr. The effect of el_pct is greater on read_scr than on math_scr; in particular, its effect on math_scr is not statistically significant.
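The SSR minimization above has the closed-form solution β̂ = (X′X)⁻¹X′Y. A minimal numpy sketch with simulated data (all variable names and true coefficient values here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
u = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + u           # true betas: 1.0, 2.0, -0.5

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), X1, X2])

# OLS: minimizing the SSR leads to the normal equations X'X b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ beta_hat                        # predicted values
u_hat = Y - Y_hat                           # residuals
SSR = u_hat @ u_hat
sigma2_hat = SSR / (n - (k + 1))            # estimator of var(u), df = n-(k+1)

print(beta_hat, sigma2_hat)
```

The first-order conditions of the minimization imply X′û = 0 exactly, which is a useful numerical sanity check on any OLS implementation.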
. reg testscr str meal_pct el_pct

      Source |         SS    df          MS       F(3, 416)     = 476.31
       Model | 117811.296     3  39270.4322       Prob > F      = 0.0000
    Residual |  34298.298   416  82.4478318       R-squared     = 0.7745
                                                  Adj R-squared = 0.7729
       Total | 152109.594   419  363.030058       Root MSE      = 9.0801

     testscr |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
         str | -.9983097    .2387543  -4.18    0.000   -1.467625   -.5289946
    meal_pct | -.5473454    .0215988 -25.34    0.000   -.5898019    -.504889
      el_pct | -.1215735    .0323173  -3.76    0.000    -.185099    -.058048
       _cons |    700.15    4.685686 149.42    0.000    690.9394    709.3605

. reg math_scr str meal_pct el_pct

      Source |         SS    df          MS       F(3, 416)     = 302.25
       Model | 101022.729     3   33674.243       Prob > F      = 0.0000
    Residual | 46347.9982   416  111.413457       R-squared     = 0.6855
                                                  Adj R-squared = 0.6832
       Total | 147370.727   419  351.720112       Root MSE      = 10.555

    math_scr |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
         str |  -.810676     .277543  -2.92    0.004   -1.356238   -.2651144
    meal_pct | -.5432493    .0251079 -21.64    0.000   -.5926034   -.4938952
      el_pct | -.0412726    .0375676  -1.10    0.273   -.1151187    .0325735
       _cons |  694.2015    5.446938 127.45    0.000    683.4946    704.9085

. reg read_scr str meal_pct el_pct

      Source |         SS    df          MS       F(3, 416)     = 583.27
       Model | 136874.052     3  45624.6839       Prob > F      = 0.0000
    Residual | 32540.6021   416  78.2226013       R-squared     = 0.8079
                                                  Adj R-squared = 0.8065
       Total | 169414.654   419  404.330916       Root MSE      = 8.8444

    read_scr |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
         str | -1.185943     .232556  -5.10    0.000   -1.643075    -.728812
    meal_pct | -.5514416    .0210381 -26.21    0.000   -.5927959   -.5100873
      el_pct | -.2018744    .0314783  -6.41    0.000   -.2637508   -.1399981
       _cons |  706.0984    4.564043 154.71    0.000     697.127    715.0699
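Because testscr is defined as (read_scr + math_scr)/2 and the OLS estimator β̂ = (X′X)⁻¹X′Y is linear in Y, the coefficients from the testscr regression must be exactly the averages of the corresponding coefficients from the math_scr and read_scr regressions, e.g. −0.998 = (−0.811 + (−1.186))/2 up to rounding. A quick check of this linearity property with simulated data (names and coefficient values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])

# Two outcomes and their average, playing the roles of math_scr,
# read_scr, and testscr
y_math = X @ np.array([650.0, -0.8, -0.5, -0.04]) + rng.normal(size=n)
y_read = X @ np.array([700.0, -1.2, -0.55, -0.2]) + rng.normal(size=n)
y_avg = (y_math + y_read) / 2

ols = lambda y: np.linalg.solve(X.T @ X, X.T @ y)
b_math, b_read, b_avg = ols(y_math), ols(y_read), ols(y_avg)

# Linearity of OLS in y: coefficients on the averaged outcome equal
# the average of the two coefficient vectors
print(np.allclose(b_avg, (b_math + b_read) / 2))
```

This holds for any regressor matrix, which is why the pattern appears exactly in the three regressions above.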
Models with Interaction Terms

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 (X_1i × X_2i) + u_i

The product (X_1 × X_2) is called the interaction term.

(a) The marginal effect of X_1 is β_1 + β_3 X_2i. A higher value of X_2 increases the marginal effect of X_1 if β_3 > 0, and decreases it if β_3 < 0. Similar statements can be made for the marginal effect of X_2.

(b) Example: effect of class attendance rate on final exam scores

    score = 2.01 − 0.0067 atndrte − 1.63 priGPA − 0.128 ACT + 0.296 priGPA² + 0.0045 ACT² + 0.0056 (priGPA × atndrte)
    n = 680, R² = 0.229, R̄² = 0.222

where score is the standardized outcome on a final exam and atndrte is the percentage of classes attended. A standardized outcome z is defined as z = (y − ȳ)/σ_y; an increase in z by one unit means that y increases from the mean by one standard deviation σ_y.

The coefficient on the attendance rate is negative (−0.0067), but this does not mean attendance has a negative effect on test scores. The correct marginal effect is −0.0067 + 0.0056 priGPA, which indicates that the effect of the attendance rate varies with the student's prior GPA. For a few values of prior GPA, the marginal effects of the attendance rate are:

    Prior GPA = 2: −0.0067 + 0.0056 × 2 = 0.0045
    Prior GPA = 3: −0.0067 + 0.0056 × 3 = 0.0101
    Prior GPA = 4: −0.0067 + 0.0056 × 4 = 0.0157

Therefore, for a student whose prior GPA is 3, an increase in the attendance rate of 10 percentage points raises the test score by 0.101 standard deviations above the class mean.

A Measure of Goodness-of-fit

We introduced two ideas about the measure of goodness of fit.
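The marginal-effect arithmetic above can be checked directly (coefficient values taken from the fitted equation in the example):

```python
# Marginal effect of attendance rate from the interaction model:
# d(score)/d(atndrte) = -0.0067 + 0.0056 * priGPA
b_atndrte, b_inter = -0.0067, 0.0056

for gpa in (2, 3, 4):
    me = b_atndrte + b_inter * gpa
    print(gpa, round(me, 4))

# Effect of a 10-percentage-point rise in attendance for priGPA = 3
print(round(10 * (b_atndrte + b_inter * 3), 3))
```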
The first idea was to compare the SSR without explanatory variables and with explanatory variables:

Restricted model: Y_i = β_0 + u_i, with

    SSR_r = Σ_{i=1}^n (Y_i − Ȳ)²

Unrestricted model: Y_i = β_0 + β_1 X_1i + β_2 X_2i + … + β_k X_ki + u_i, with

    SSR_u = Σ_{i=1}^n [Y_i − (β̂_0 + β̂_1 X_1i + β̂_2 X_2i + … + β̂_k X_ki)]²

    R² = (SSR_r − SSR_u) / SSR_r = 1 − SSR_u / SSR_r

SSR_r is called the total sum of squares (TSS), and ESS = SSR_r − SSR_u is called the explained sum of squares (ESS). Thus TSS = ESS + SSR.

The second idea is the sample estimate of the squared correlation between the observed dependent variable and its predicted value:

    R² = [corr(Y_i, Ŷ_i)]² = [cov(Y_i, Ŷ_i)]² / (var(Y_i) var(Ŷ_i))

Statistical Properties of the OLS Estimator in the Multiple Regression Model

If the assumptions we introduced in the single-regressor model are satisfied, the OLS estimators of the coefficients are unbiased. And

    σ̂_u² = SSR / (n − (k+1))

is also an unbiased estimator of σ_u². What are the new issues in the multiple regression model?

(a) Interpretation of the marginal effect
If all explanatory variables are different, then β_j represents the marginal effect of X_ji on Y_i, holding all other explanatory variables constant. However, if a variable enters the regression equation more than once in a different form, such as

    Y_i = β_0 + β_1 X_1i + β_2 X²_1i + u_i

then the marginal effect of X_1 is β_1 + 2β_2 X_1i.

(b) Measure of goodness of fit: adjusted R², denoted R̄²
We can use the R² as defined before as a measure of goodness of fit. A higher R² means the set of regressors explains the variation of the dependent variable better. However, R² is problematic for comparing models, because it increases as we add more and more explanatory variables even if the additional variables do not belong in the equation. This is due to the nature of the least squares method. Consider two regression models:

    (1) Y_i = β_0 + β_1 X_1i + u_i
    (2) Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i

Suppose that model (1) is the true model, i.e., X_2i has nothing to do with the dependent variable and does not belong in the equation. Nevertheless, model (2) will always give an R² at least as high as model (1). This is because model (1) is a restricted version of model (2) with the restriction β_2 = 0, and the minimum SSR of a restricted model is never lower than the minimum SSR of the unrestricted model.
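The claim that adding even an irrelevant regressor cannot lower R² can be illustrated with a small simulation, in which an x2 that does not appear in the true model is appended to regression (1) (variable names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # irrelevant: not in the true model
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r2(y, regressors):
    """R-squared of an OLS regression of y on a constant + regressors."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - ssr / tss

r2_small = r2(y, [x1])        # model (1)
r2_full = r2(y, [x1, x2])     # model (2): adds the irrelevant x2
print(r2_small, r2_full)      # r2_full can never fall below r2_small
```

Mechanically, the unrestricted fit can always reproduce the restricted fit by setting the extra coefficient to zero, so its minimized SSR is weakly smaller and its R² weakly larger.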
If one wishes to choose a model based on the value of R², he will end up adding all kinds of regressors, relevant or irrelevant. To avoid this effect, we use an adjusted R², denoted R̄². It is computed by adjusting for the different degrees of freedom in SSR_u and TSS:

    R̄² = 1 − [SSR_u / (n − k − 1)] / [SSR_r / (n − 1)] = 1 − [SSR / (n − k − 1)] / [TSS / (n − 1)]

As we add more explanatory variables, both SSR and n − k − 1 decrease. Hence, unless SSR is reduced by a sufficient amount, SSR/(n − k − 1) may increase and thereby reduce R̄².

(c) Multicollinearity
This term refers to correlation among the explanatory variables. If some of the explanatory variables are perfectly correlated, the OLS estimators are not uniquely defined: there are infinitely many parameter estimates that minimize the SSR. This case is called perfect multicollinearity. Consider a case in which X_2i = cX_1i for a constant c:

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i
        = β_0 + β_1 X_1i + β_2 (cX_1i) + u_i
        = β_0 + (β_1 + cβ_2) X_1i + u_i

We can estimate (β_1 + cβ_2), but there are infinitely many ways to split it between β_1 and β_2. A similar argument applies when more than two variables are involved in a linear relationship. Perfect multicollinearity is rare in practice, but it may arise from a mistake in model specification; we will see such a case later.

If the collinearity is not perfect and yet there is high correlation among the explanatory variables, the precision of the OLS estimators suffers: the variances of the OLS estimators generally increase as the correlation among the explanatory variables increases.

Effects of Omitted Relevant Variables and of Including Irrelevant Variables

In practice, we never know for sure whether our model specification is correct. We may have omitted a variable or included unnecessary variables. What are the effects of such mistakes on the OLS estimators?

Effects of Including Irrelevant Variables
True relationship (only God knows): Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i
Specified model: Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 X_3i + u_i

The true value of β_3 is zero. Without knowing that, we estimate it along with the other parameters.
Effects: The OLS estimators of the overspecified model are unbiased, but their precision falls (their variances increase).

Effects of Omitting Relevant Variables
True relationship: Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i
Estimation model: Y_i = β_0 + β_1 X_1i + u_i

Effects: The OLS estimator of β_1 is biased in general. The direction of the bias depends on the relationship between the included and omitted variables, and on the sign of the coefficient of the omitted variable. To see this, write the stochastic relationship between the two regressors as X_2i = ρX_1i + v_i, where ρ can be positive, negative, or zero. Substituting this into the true equation, we get

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i
        = β_0 + β_1 X_1i + β_2 (ρX_1i + v_i) + u_i
        = β_0 + (β_1 + β_2 ρ) X_1i + (β_2 v_i + u_i)
        = β_0 + α_1 X_1i + w_i

The estimate of β_1 in the estimation equation is in fact an estimate of α_1 = β_1 + β_2 ρ. Now, what is the direction of the bias of α_1 as an estimator of β_1? If β_2 ρ > 0, the bias is positive (α_1 > β_1); if β_2 ρ < 0, the bias is negative (α_1 < β_1). Thus, the direction of the bias depends on the signs of β_2 and ρ.

Example: return to education
True relationship: ln(wage) = β_0 + β_1 edu + β_2 ability + u_i
Estimation model: ln(wage) = β_0 + β_1 edu + u_i
Estimated equation: ln(wage) = 0.584 + 0.083 edu,  n = 526, R² = 0.186

(a) Correlation between education and ability? Most likely positive.
(b) Sign of β_2? Higher ability means higher productivity, so ability has a positive effect on the wage rate.
(c) Conclusion: The estimated return to education, 0.083, is likely biased upward. That is, 0.083 is an overestimate of the true effect of education on log wage.
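The decomposition α_1 = β_1 + β_2ρ holds exactly in-sample: the short-regression slope equals the long-regression slope on X_1 plus β̂_2 times the fitted slope from regressing X_2 on X_1. A simulated check (names and parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)                     # rho = 0.7 > 0
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)     # beta2 = 1.5 > 0

def coefs(y, *xs):
    """OLS coefficients of y on a constant + the given regressors."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b0, b1, b2 = coefs(y, x1, x2)   # long regression (true model)
a0, a1 = coefs(y, x1)           # short regression: x2 omitted
d0, d1 = coefs(x2, x1)          # auxiliary regression: x2 on x1 (slope ~ rho)

# Omitted-variable identity: a1 = b1 + b2 * d1, exactly in-sample
print(a1, b1 + b2 * d1)
```

Here β_2 > 0 and ρ > 0, so the short-regression estimate overshoots the true β_1 = 2, matching the positive-bias case in the text.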
. reg math_scr str

      Source |         SS    df          MS       F(1, 418)     =  16.62
       Model | 5635.63501     1  5635.63501       Prob > F      = 0.0001
    Residual | 141735.092   418  339.079168       R-squared     = 0.0382
                                                  Adj R-squared = 0.0359
       Total | 147370.727   419  351.720112       Root MSE      = 18.414

    math_scr |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
         str | -1.938592    .4755165  -4.08    0.000   -2.873294   -1.003891
       _cons |  691.4174    9.382469  73.69    0.000    672.9747    709.8601

. reg math_scr str el_pct

      Source |         SS    df          MS       F(2, 417)     = 103.43
       Model | 48865.2706     2  24432.6353       Prob > F      = 0.0000
    Residual | 98505.4565   417  236.224116       R-squared     = 0.3316
                                                  Adj R-squared = 0.3284
       Total | 147370.727   419  351.720112       Root MSE      = 15.37

    math_scr |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
         str | -.9128919    .4040738  -2.26    0.024   -1.707167   -.1186164
      el_pct | -.5655231    .0418044 -13.53    0.000   -.6476966   -.4833495
       _cons |  680.1895    7.875067  86.37    0.000    664.7097    695.6692
. reg str el_pct

      Source |         SS    df          MS       F(1, 418)     =  15.25
       Model | 52.7997914     1  52.7997914       Prob > F      = 0.0001
    Residual | 1446.78107   418  3.46119873       R-squared     = 0.0352
                                                  Adj R-squared = 0.0329
       Total | 1499.58086   419  3.57895193       Root MSE      = 1.8604

         str |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
      el_pct |    .019413    .0049704   3.91    0.000    .0096429     .029183
       _cons |   19.33432    .1199307 161.21    0.000    19.09858    19.57006
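The three outputs can be tied together through the omitted-variable identity: the short-regression str coefficient should equal the long-regression str coefficient plus the el_pct coefficient times the slope of el_pct on str. The auxiliary output above runs the reverse regression (str on el_pct), but since the two simple-regression slopes multiply to r² (the reported R-squared, 0.0352), the needed slope can be recovered as 0.0352/0.019413. The arithmetic, up to rounding in the reported figures:

```python
# Figures taken from the three Stata outputs above
b_short = -1.938592          # math_scr on str: coefficient on str
b_long = -0.9128919          # math_scr on str + el_pct: coefficient on str
b_el = -0.5655231            # coefficient on el_pct in the long regression

# Slope of el_pct on str, recovered from the reversed auxiliary
# regression via slope(el_pct|str) * slope(str|el_pct) = r^2
delta = 0.0352 / 0.019413    # approx 1.813

# Omitted-variable identity: b_short = b_long + b_el * delta
print(b_long + b_el * delta)
```

The computed value agrees with the reported −1.938592 to roughly three decimal places; the small gap comes from rounding in the published R-squared and coefficients.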