Lab 1 Key

Regression Analysis: wage versus yrsed, ex

wage = - 4.78 + 1.46 yrsed + 0.126 ex

Predictor      Coef   SE Coef      T      P
Constant      -4.78     2.146  -2.23  0.026
yrsed        1.4623    0.1503   9.73  0.000
ex          0.12635   0.02739   4.61  0.000

S = 8.98510   R-Sq = 11.9%   R-Sq(adj) = 11.7%

Source           DF       SS      MS      F      P
Regression        2   8233.2  4116.6  50.99  0.000
Residual Error  754  60870.8    80.7
Total           756  69104.0

Regression Analysis: wage versus yrsed, ex, fe, feex

wage = - 4.66 + 1.49 yrsed + 0.155 ex - 1.03 fe - 0.0478 feex

Predictor      Coef   SE Coef      T      P
Constant     -4.657     2.184  -2.13  0.033
yrsed        1.4850    0.1499   9.91  0.000
ex          0.15522   0.03771   4.12  0.000
fe           -1.031     1.311  -0.79  0.432
feex       -0.04778   0.05348  -0.89  0.372

S = 8.93383   R-Sq = 13.1%   R-Sq(adj) = 12.7%

Source           DF       SS      MS      F      P
Regression        4   9084.5  2271.1  28.46  0.000
Residual Error  752  60019.6    79.8
Total           756  69104.0

The estimated coefficients for fe and feex are interpreted as follows. For fe: a woman earns a starting salary (ex = 0) that is $1.03 less than a man's, holding education constant. Reviewing the calculated t statistic, we would conclude that fe does not have a statistically significant effect on wage. For feex: a woman's wage rises by about $0.05 per hour less than a man's for each additional year of experience, holding education constant. This effect is also not statistically significant according to a t test. But we should conduct an F test for joint significance before removing these variables from our model. As an aside, we should note that the adjusted R-square did increase after adding these two variables (from 11.7% to 12.7%).

F_calc = [(ESS_UR - ESS_R) / 2] / [RSS_UR / (n - K - 1)]
       = [(9084.5 - 8233.2) / 2] / [60019.6 / 752]
       = 425.65 / 79.8
       = 5.33
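The same F statistic can be computed in a few lines. Here is a minimal Python sketch using the sums of squares reported in the two Minitab runs above (the variable names are mine, not Minitab's):

```python
# F test for the joint significance of fe and feex, built from the
# explained (ESS) and residual (RSS) sums of squares reported above.
ESS_R = 8233.2       # explained SS, restricted model (yrsed and ex only)
ESS_UR = 9084.5      # explained SS, unrestricted model (adds fe and feex)
RSS_UR = 60019.6     # residual SS, unrestricted model
n, K, q = 757, 4, 2  # observations, slopes in unrestricted model, restrictions

F_calc = ((ESS_UR - ESS_R) / q) / (RSS_UR / (n - K - 1))
print(round(F_calc, 2))  # about 5.33
```

Compare the result with an F critical value with (2, 752) degrees of freedom.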
This calculated F statistic is statistically significant (greater than any of the usual critical values). The two variables jointly affect wages, or at least one of them does.

Regression Analysis: wage versus yrsed, ex, exsq, fe, feex

wage = - 6.47 + 1.42 yrsed + 0.493 ex - 0.00756 exsq - 1.15 fe - 0.0431 feex

Predictor        Coef    SE Coef      T      P
Constant       -6.468      2.203  -2.94  0.003
yrsed          1.4214     0.1491   9.53  0.000
ex            0.49257    0.08883   5.55  0.000
exsq       -0.0075580   0.001806  -4.18  0.000
fe             -1.148      1.297  -0.89  0.376
feex          -0.0431    0.05292  -0.81  0.416

S = 8.83732   R-Sq = 15.1%   R-Sq(adj) = 14.6%

Source           DF       SS      MS      F      P
Regression        5  10452.2  2090.4  26.77  0.000
Residual Error  751  58651.8    78.1
Total           756  69104.0

We generated fitted values for wages using Minitab's Storage option and then plotted them. Why such a scatter? Shouldn't we see a clear hill shape or two? Answer: we did not hold education constant, so we are looking at different functions for different levels of education.

[Figure: Scatterplot of Wagehat1 vs ex]
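The quadratic specification and the stored fitted values can be reproduced outside Minitab. A minimal Python sketch with synthetic data (the coefficients and sample below are invented for illustration, not the lab's wage data):

```python
import numpy as np

# Fit wage = b0 + b1*yrsed + b2*ex + b3*ex^2 by OLS and store fitted values,
# mimicking Minitab's Storage option. Data are synthetic, for illustration only.
rng = np.random.default_rng(0)
n = 500
yrsed = rng.integers(8, 18, n).astype(float)
ex = rng.uniform(0, 50, n)
wage = -6.5 + 1.4 * yrsed + 0.49 * ex - 0.0076 * ex**2 + rng.normal(0, 8, n)

X = np.column_stack([np.ones(n), yrsed, ex, ex**2])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
wagehat = X @ beta  # scattered when plotted against ex, because yrsed varies

# Holding education at 12 years collapses the scatter to a single hill.
X12 = np.column_stack([np.ones(n), np.full(n, 12.0), ex, ex**2])
wagehat12 = X12 @ beta
```

Plotting `wagehat` against `ex` reproduces the cloud above; plotting `wagehat12` against `ex` traces a single quadratic curve, exactly as the next section shows.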
We then held education constant at 12 years, a high school diploma. Then our only variables are experience (on the horizontal axis) and the dummy variable for female. Plotting the new fitted values for wage versus experience gives the following graph. We can see the expected quadratic effect of experience on wage for males and females. You should be able to determine the partial effects of experience on wage for females and males, and you should be able to use those expressions to determine the exact value of experience that gives the top of each hill.

[Figure: Scatterplot of Wagehat vs ex]

Another possible problem we considered is the distribution of wages, and hence of the disturbances. Wage is a random variable because of the disturbances. If wage looks normally distributed, then the disturbances are likely normal. If wage does not look normal, then we might have a problem with the disturbances. That's a bad thing because in our Classical Regression Model we assume that the disturbances are normally distributed. Without normal disturbances, we actually lose our ability to do inference. On the following page, I've placed the histogram showing the distribution of wages (clearly right skewed) and the distribution of the errors, or residuals, from the quadratic model we just estimated. We can see that the graph of the residuals also looks right skewed. (Houston, we have a problem.)
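The "top of each hill" follows from the partial effect of experience. A small Python check using the coefficients estimated above (recall feex is the fe × ex interaction, so the linear experience term differs for women):

```python
# Partial effect of experience: d(wage)/d(ex) = b_ex + 2*b_exsq*ex,
# plus b_feex for women. Setting it to zero locates the top of each hill.
b_ex = 0.49257
b_exsq = -0.0075580
b_feex = -0.0431

ex_star_male = -b_ex / (2 * b_exsq)               # about 32.6 years
ex_star_female = -(b_ex + b_feex) / (2 * b_exsq)  # about 29.7 years
print(round(ex_star_male, 1), round(ex_star_female, 1))
```

So the male wage profile peaks around 32.6 years of experience and the female profile around 29.7 years, matching the two hills in the plot.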
[Figure: Histogram of wage with fitted normal curve; Mean = 16.64, StDev = 9.561, N = 757]

[Figure: Histogram of residuals (response is wage)]

Luckily, there is a possible solution, and the distribution of the residuals, while skewed right, is not that bad. The solution is to take the natural logarithm of wages and use that as the dependent variable in estimating wages. We calculated the natural log of wages (in Minitab's Calc, Calculator: LN(wage)). We then estimated the model shown below. The histogram of residuals from this model looks better, nearly normal.
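The effect of the log transformation on skewness can be illustrated with synthetic right-skewed data (a lognormal stand-in for the wage sample; the parameters are invented):

```python
import numpy as np

# Right-skewed "wages": taking logs pulls in the long right tail,
# which is why the residuals from the lnwage model look closer to normal.
rng = np.random.default_rng(1)
wage = rng.lognormal(mean=2.6, sigma=0.5, size=757)

def skewness(x):
    z = (x - x.mean()) / x.std()
    return float((z**3).mean())

lnwage = np.log(wage)
print(skewness(wage) > skewness(lnwage))  # True: logging reduces the skew
```

This is only a heuristic check; the lab's actual evidence is the residual histogram from the lnwage regression below.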
Regression Analysis: lnwage versus yrsed, ex, exsq, fe, feex

lnwage = 1.47 + 0.0698 yrsed + 0.0321 ex - 0.000505 exsq - 0.0696 fe - 0.00295 feex

Predictor          Coef     SE Coef       T      P
Constant         1.4745      0.1154   12.78  0.000
yrsed          0.069785    0.007808    8.94  0.000
ex             0.032132    0.004653    6.91  0.000
exsq        -0.00050492  0.00009460   -5.34  0.000
fe             -0.06962     0.06794   -1.02  0.306
feex          -0.002945    0.002772   -1.06  0.288

S = 0.462887   R-Sq = 16.3%   R-Sq(adj) = 15.7%

Source           DF        SS      MS      F      P
Regression        5   31.2520  6.2504  29.17  0.000
Residual Error  751  160.9123  0.2143
Total           756  192.1643

[Figure: Histogram of residuals (response is lnwage)]

You should note that the interpretation of parameters here requires that you multiply by 100 to interpret them as percent changes in wage. For example, a one-year increase in education causes about a 7 percent increase in wage, holding experience constant. Experience is still quadratic, so you need to calculate partial effects that depend upon the level of experience and differ for females. But once you get a numeric value, you then multiply by 100 to interpret it as the percent change in wage given a one-year increase in experience. The dummy variable needs special attention. To interpret the effect of a dummy as a percentage change in wage, we do the following:
100 × [exp(δ̂) − 1] = 100 × [exp(−0.0696) − 1] = −6.72%

Thus, women earn starting wages that are 6.72% lower than men's, holding education constant.

Demand for Roses: We then estimated a nonlinear model for the demand for roses. We assumed that the Cobb-Douglas model represents the demand for roses:

Sales_t = exp(β0) × Prose_t^β1 × Pcarn_t^β2 × Dinc_t^β3 × exp(u_t)

Taking the natural log of both sides of this expression gives the following model, which is linear in the parameters:

ln Sales_t = β0 + β1 ln Prose_t + β2 ln Pcarn_t + β3 ln Dinc_t + u_t

To estimate, we need to take the natural log of Sales, Prose, Pcarn, and Dinc using Minitab's natural log function (e.g., LN(Sales)). The results are:

Regression Analysis: lnsales versus lnprose, lnpcarn, lndinc

lnsales = 6.29 - 1.86 lnprose + 1.45 lnpcarn + 0.56 lndinc

Predictor     Coef  SE Coef      T      P
Constant     6.288    4.875   1.29  0.221
lnprose    -1.8562   0.3438  -5.40  0.000
lnpcarn     1.4541   0.5724   2.54  0.026
lndinc      0.5596   0.9211   0.61  0.555

S = 0.175868   R-Sq = 73.7%   R-Sq(adj) = 67.2%

For this model, all estimated slope parameters are interpreted as elasticities. For example, a 1% increase in the price of roses causes a 1.86% decrease in the sales of roses, holding the price of carnations and disposable income constant. We also note that two of the three estimates are statistically significant (different from zero), and that the income elasticity, despite being insignificant, is at least positive as expected.

What does this demand function look like? Let's plot the predicted sales versus the price of roses, holding the price of carnations and disposable income constant at their means. To do that, we use the calculator to first predict lnsales using our sample regression function (just set lnpcarn and lndinc equal to their sample means). In the first graph, I plotted lnsaleshat (predicted values in natural log form) versus the natural log of Prose. The relationship looks linear, confirming that the natural log transformation allows us to estimate a nonlinear function using OLS. To see the original form of the model, a nonlinear relationship between Sales and the price of roses, I transformed the predicted sales back to the original units by exponentiating lnsaleshat (in Calc, Calculator, use exp(lnsaleshat)). The final graph below shows the nonlinear demand function we estimated using this approach. While the slope changes along the curve, the elasticity is constant.

[Figure: Scatterplot of lnsaleshat vs lnprose]

[Figure: Scatterplot of SalesHat vs Prose]
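The back-transformation step can be sketched in Python. The coefficients are those estimated above, but the sample means of lnpcarn and lndinc are not reported in the output, so the values below are hypothetical placeholders:

```python
import numpy as np

# Predict lnsales along a grid of rose prices, holding the other regressors
# at (assumed) sample means, then exponentiate to recover the demand curve.
b0, b_prose, b_pcarn, b_dinc = 6.288, -1.8562, 1.4541, 0.5596
lnpcarn_bar, lndinc_bar = 0.6, 2.8  # hypothetical sample means, for illustration

prose = np.linspace(2.0, 4.5, 50)
lnsaleshat = b0 + b_prose * np.log(prose) + b_pcarn * lnpcarn_bar + b_dinc * lndinc_bar
saleshat = np.exp(lnsaleshat)  # back to the original units

# The slope changes along the curve, but the elasticity is constant (= b_prose):
elasticity = np.gradient(saleshat, prose) * prose / saleshat
```

The numerical elasticity at the interior grid points stays at about −1.86 everywhere, which is the constant-elasticity property of the Cobb-Douglas form.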