Econ Econometric Review

Contents

1 Linear Regression Analysis
The Mincer Wage Equation
Data
Econometric Model
Estimation
Diagnostics - Goodness of Fit
Inference - Hypothesis Testing
Reporting the results
Interpretation of the Estimates
Multivariate Regression Analysis
Diagnostics - Goodness of Fit
Interpretation of the Estimates
Choosing the Functional Form
Potential Problems
Multicollinearity
Omitted Variables Bias
Heteroscedasticity

1 Linear Regression Analysis

1.1 The Mincer Wage Equation

Our first exercise in empirical analysis focuses on the determinants of wages in a cross-section of individuals, that is, observations on individuals at a specific point in time.

A complete wage equation model would include the following human capital variables

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i \tag{1}$$

where the term $u_i$ contains factors such as ability, quality of education, family background and other factors influencing a person's wage.

For some specific purposes, we will also include gender and union status.

We may think of the relationship between wages and their determinants, including institutions and industrial characteristics, as the wage structure.

Let's suppose to begin with that we are interested in the effect of education, $\beta_1$, measured in years of schooling, on wages

$$wages_i = \beta_0 + \beta_1 educ_i + u_i \tag{2}$$

Data

The Labour Force Survey selects individuals (close to) randomly and asks them about their wage ($Y_i$), education and other characteristics ($X_i$).

These data $\{(X_i, Y_i) : i = 1, \ldots, n\}$ constitute our random sample (A2) of size $n$ from the population.

A scatter plot of wages and education level indicates a positive relationship.

Figure 1: Wages and Years of Schooling (scatter plot of rwage against schooling)

So do the average wages by education level.

Table 1: Average hourly wages by education level

Education Level            Years of Schooling    Wages
All workers
to 8 years
Some secondary
Grade 11 to
Some post secondary
Post secondary diploma
University: bachelors
Graduate degree

But we may want to know by how much wages increase when schooling increases by one year.

1.3 Econometric Model

The (population) regression function

$$E(wages_i \mid educ_i) = \beta_0 + \beta_1 educ_i \tag{3}$$

describes wages conditional on a level of schooling as a linear (A1) function of the parameters, under the zero conditional mean (A3) assumption $E(u_i \mid educ_i) = 0$.

For any given value of schooling, the distribution of wages is centered about $E(wages \mid schooling)$.

Figure 2: $E(wages \mid schooling)$ as a linear function of schooling (rwage against schooling)

Note that $E(u_i \mid educ_i) = 0$ implies, by the law of iterated expectations, that $E(u_i) = 0$ and that $Cov(u_i, educ_i) = E(u_i\, educ_i) = 0$.

This means that $u_i$ has a zero mean and is uncorrelated with educ, which may be far-fetched in this case.

Another typical assumption (A5) is that $Var(u_i \mid educ_i) = \sigma^2$ is constant, a property called homoskedasticity. But it appears problematic here! We will see later how to test for it.

1.4 Estimation

The objective is to obtain an estimate, called $\hat\beta_1$, of the unknown parameter $\beta_1$ from the data sample.

Letting $Y_i$ denote wages and $X_i$ denote education, we can write the model as

$$Y_i = \beta_0 + \beta_1 X_i + u_i \tag{4}$$

Either through the method of moments, which substitutes sample averages into the moment conditions

$$E(u_i) = E[Y_i - \beta_0 - \beta_1 X_i] = 0$$
$$E(u_i X_i) = E[(Y_i - \beta_0 - \beta_1 X_i) X_i] = 0$$

or with the ordinary least squares estimator, which minimizes the sum of squared errors, $SS = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$, we obtain the same estimator

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} \tag{5}$$

which works provided that $\sum_{i=1}^{n} (X_i - \bar X)^2 > 0$, that is, that there is enough sampling variation (A4).

At the same time, OLS is sensitive to outliers, so we do not want too much variation. This can be written as finite fourth moments: $0 < E(X_i^4) < \infty$ and $0 < E(Y_i^4) < \infty$.

The estimator (5) makes sense since the population parameter is

$$\beta_1 = \frac{Cov(Y_i, X_i)}{Var(X_i)}$$

when $E(u_i) = 0$ and $Cov(u_i, X_i) = 0$, and $\beta_0 = E(Y_i) - E(X_i)\beta_1$ will be estimated by

$$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i - \hat\beta_1 \frac{1}{n}\sum_{i=1}^{n} X_i \tag{6}$$

Predicted wages are obtained from the sample regression function

$$\hat Y = \hat\beta_0 + \hat\beta_1 X \tag{7}$$

The residual $\hat u_i$ is an estimate of the error term $u_i$; it is the difference between the sample point and the fitted line (sample regression function): $\hat u_i = Y_i - \hat Y_i$.
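The estimator formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `ols_simple` and the small data set are hypothetical (they are not the Labour Force Survey sample), and the course itself uses STATA for the computation.

```python
def ols_simple(x, y):
    """Simple-regression OLS: slope as in equation (5), intercept as in (6)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no sampling variation in x, so (A4) fails")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample: years of schooling and hourly wages (not the LFS data).
schooling = [8, 10, 12, 12, 14, 16, 18]
wage = [9.0, 11.5, 14.0, 15.0, 17.5, 21.0, 24.5]

b0, b1 = ols_simple(schooling, wage)
fitted = [b0 + b1 * s for s in schooling]            # sample regression function
residuals = [w - f for w, f in zip(wage, fitted)]    # u_hat_i = Y_i - Y_hat_i
print(b0, b1)
```

By construction the residuals satisfy the sample versions of the two moment conditions: they sum to zero and are uncorrelated with schooling.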

Thus, intuitively, OLS fits a line through the sample points such that the sum of the squared vertical distances between actual and predicted wages, that is, the squared residuals, is as small as possible.

Under assumptions (A1)-(A4), our OLS estimates are unbiased, that is, $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_0) = \beta_0$.

Adding assumption (A5), the OLS estimator is BLUE in the sense that it is the minimum variance linear unbiased estimator (Gauss-Markov Theorem).

In practice, the computer software does the computation for us.

. regress rwage schooling

      Source |       SS       df       MS         Number of obs =
-------------+------------------------------      F(  1,  9718) =
       Model |                                    Prob > F      =
    Residual |                                    R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                                    Root MSE      =

       rwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
   schooling |
       _cons |

. predict prwage
(option xb assumed; fitted values)

. predict reswage, residuals

Diagnostics - Goodness of Fit

The STATA output gives many measures of whether our regression model fits the data well:

Model/explained: $SSE = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$
Residual: $SSR = \sum_{i=1}^{n} \hat u_i^2$
Total: $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$

and the $R^2$, which is the ratio of the explained variation to the total variation:

$$R^2 = SSE/SST = 1 - SSR/SST$$

The $R^2$ can also be shown to equal the squared correlation coefficient between the actual $Y_i$ and the fitted values $\hat Y_i$.

The adjusted $R^2$ takes into account the number of explanatory variables:

$$R^2_a = 1 - (1 - R^2)\frac{n - c}{n - k}$$

where $k$ is the number of estimated coefficients in the model (including the constant) and $c = 1$ if there is a constant.

Here, an $R^2$ of 0.15 means that 15% of the variation in wages across individuals is explained by their education level.

This means that 85% of the variation in wages remains unexplained! We will want to add more variables!
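As a rough illustration of these goodness-of-fit measures, the sketch below computes SSE, SSR, SST, $R^2$ and the adjusted $R^2$ for a simple regression with a constant. The helper `fit_stats` and the data are hypothetical, and $k$ is assumed to count the constant.

```python
def fit_stats(x, y, k=2):
    """Return SSE, SSR, SST, R2 and adjusted R2 for a simple regression.

    Assumption: k counts the estimated coefficients including the constant,
    and a constant is always included (c = 1).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * a for a in x]
    sse = sum((f - y_bar) ** 2 for f in fitted)           # model/explained
    ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))    # residual
    sst = sum((b - y_bar) ** 2 for b in y)                # total
    r2 = sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)
    return sse, ssr, sst, r2, r2_adj


# Hypothetical sample (not the LFS data).
schooling = [8, 10, 12, 12, 14, 16, 18]
wage = [9.0, 11.5, 14.0, 15.0, 17.5, 21.0, 24.5]
sse, ssr, sst, r2, r2_adj = fit_stats(schooling, wage)
print(r2, r2_adj)
```

With a constant in the model, SSE and SSR add up to SST, which is why the two expressions for $R^2$ coincide.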

Yet $R^2$ values are typically very low in cross-sectional data, so by these standards this regression is pretty good. But the $R^2$ is not the only way to judge the success of a model.

Also reported are

$$\text{Root MSE} = s = \sqrt{SSR/(n - k)} \quad \text{and} \quad F = \frac{SSE}{(k - c)\, s^2}$$

The F-statistic is used to test whether a group of variables should be included in the model.

Inference - Hypothesis Testing

The success of a model also depends on whether the variables included in the model belong there, that is, are statistically significant.

Under the assumption (A6) that the $u_i$ are normally distributed with zero mean and variance $\sigma^2$, $u \sim Normal(0, \sigma^2)$, the estimates $\hat\beta$ will also be normally distributed, and

$$(\hat\beta - \beta)/se(\hat\beta) \sim t_{DF}$$

will follow the Student-t distribution, where $DF = n - k - 1$, the degrees of freedom in the model, is equal to the number of observations minus the number of variables minus 1 for the constant.

We can use the t statistic reported by STATA to test the null hypothesis

$$H_0 : \beta = 0 \quad \text{against} \quad H_1 : \beta \neq 0$$

If the absolute value of the t statistic is greater than the critical value corresponding to our degrees of freedom and the desired level of the test (5% or 1%), we can reject the null.

The rule of thumb is: if $|t| \geq 2.0$, then reject $H_0 : \beta = 0$ at the 5% significance level. For more robustness, sometimes we prefer even higher values.

But we do not have to look up the critical values in a table, since STATA gives us the p-value corresponding to our t statistic.

If $p \leq 0.01$, then the relationship is significant at the 1% level.
If $p \leq 0.05$, then the relationship is significant at the 5% level.
If $p \leq 0.10$, then the relationship is significant at the 10% level.

Here, with a t-statistic of 41.59, we can say that schooling is a very significant factor explaining the variation in wages.

It is all very good to know that the coefficient of schooling is different from zero, but we would also like to know how precisely it is estimated.

The confidence interval tells us that, under the classical OLS assumptions (A1)-(A6), intervals constructed as

$$\hat\beta \pm \alpha \, se(\hat\beta)$$

contain the true parameter in 95% of samples, where $\alpha$ is the 97.5th percentile of the $t_{n-k-1}$ distribution.

If the degrees of freedom $DF = n - k - 1 > 120$, the $t_{n-k-1}$ distribution is close enough to the normal to use the 97.5th percentile of the standard normal, and the confidence interval will be

$$[\hat\beta - 1.96\, se(\hat\beta),\ \hat\beta + 1.96\, se(\hat\beta)]$$

Thus the rule of thumb: a coefficient is not significant if its magnitude is less than twice its standard error.

In our example, the 95% confidence interval for the coefficient of schooling runs from 1.47 to 1.61, a range of only 0.14; this is almost too precise!
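The t statistic and the large-sample confidence interval can be reproduced by hand. The sketch below uses the coefficient of 1.54 and the t statistic of 41.59 quoted in these notes; the standard error is not quoted directly, so it is backed out from those two rounded numbers and should be read as an approximation. The function name is hypothetical.

```python
def t_stat_and_ci(beta_hat, se, crit=1.96):
    """Return the t statistic for H0: beta = 0 and the interval beta_hat +/- crit*se."""
    t = beta_hat / se
    return t, (beta_hat - crit * se, beta_hat + crit * se)


beta_hat = 1.54             # schooling coefficient quoted in the notes
se = beta_hat / 41.59       # assumption: standard error implied by the reported t
t, (lower, upper) = t_stat_and_ci(beta_hat, se)
significant_at_5pct = abs(t) >= 1.96
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the bounds agree with the interval from 1.47 to 1.61 discussed above.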

Reporting the results

The results from a STATA output are reported in a table that typically contains

estimated coefficients
standard errors of the coefficients
number of observations
$R^2$ or $R^2_a$

In some instances, it may be worthwhile to report other statistics. We will discuss these issues when we cover the readings.

The custom command outreg, used after the regress command, handles the formatting of the output. (See the course web site on how to install custom commands.)

. outreg schooling using tablem1, replace bdec(3) se 3aster title("Wage Regression") ctitle("(1)")

The file tablem1.out can then be opened in Excel (by right-clicking on it).

Wage Regression
                    (1)
schooling
                    [0.002]***
Constant
                    [0.027]***
Observations        9720
R-squared           0.14
Standard errors in brackets
* significant at 10%; ** significant at 5%; *** significant at 1%

Interpretation of the Estimates

In general, the $\beta$ parameters measure the marginal effect of increasing $X$ by one unit on the predicted wages $\hat Y$.

In our example, $\Delta\widehat{wage} = \hat\beta_1 \Delta educ$ tells us that the wage value of one additional year of schooling in this sample is $1.54.

But in this simple regression, we cannot claim to have found a causal relationship, so we should be cautious in our interpretation.

The value of $\hat\beta_0$ says that a person with zero years of schooling has a negative predicted wage, which is silly. This occurs because no one in our sample has less than 8 years of schooling.

For a person with eight years of schooling, the predicted wage is $\widehat{wage} = \hat\beta_0 + 8\hat\beta_1 = 9.90$, which is above the minimum wage.

If this person completes high school (4 more years), our model predicts that the wage would be higher by 4 × $1.54 = $6.16 per hour, that is, $16.06!

This is more than the average wage of $15.80 for high school graduates in Table 1, which may make us question the linearity in our functional form assumption.

Indeed, it is more common to estimate the following log-linear model

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i \tag{8}$$

where $\log(\cdot)$ denotes the natural logarithm. Since wages tend to be lognormal, this reduces the problem of heteroscedasticity.

This is equivalent to writing $wages_i = \exp(\beta_0 + \beta_1 educ_i + u_i)$, which is consistent with the increasing returns to education that we found in Table 1.

In this case the interpretation of $\beta_1$ is

$$\%\Delta\widehat{wage} \approx (100\, \hat\beta_1)\, \Delta educ$$

that is, multiplying $\hat\beta_1$ by 100 gives us the percentage change in predicted wage given an additional year of schooling.

We run the log wage regression by first taking the log of the dependent variable.

. regress lrwage schooling

      Source |       SS       df       MS         Number of obs =
-------------+------------------------------      F(  1,  9718) =
       Model |                                    Prob > F      =
    Residual |                                    R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                                    Root MSE      =

      lrwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
   schooling |
       _cons |

The coefficient on schooling has a percentage interpretation when it is multiplied by 100. That is, predicted wages increase by 8.2 percent for every additional year of education.

In the human capital interpretation of the wage equation, this means that the rate of return of one year of schooling is 8.2%, not bad!

This easy interpretation of the rate of return of schooling is one of the reasons why the log wage specification is the preferred one.

The intercept is again not very meaningful, since it gives the predicted log(wages) when schooling = 0.
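The "multiply by 100" reading is an approximation that works well for small coefficients. As a quick check, the sketch below compares it with the exact percentage change implied by the exponential form of the model, $100\,(e^{\beta_1 \Delta educ} - 1)$, using the 0.082 coefficient quoted above. The function name is hypothetical.

```python
import math


def pct_effect(beta, delta_x=1.0):
    """Approximate and exact percentage change in predicted wage in a log-linear model."""
    approx = 100 * beta * delta_x                     # the 100 * beta rule
    exact = 100 * (math.exp(beta * delta_x) - 1)      # from wage = exp(b0 + b1*educ)
    return approx, exact


approx, exact = pct_effect(0.082)        # one more year of schooling
approx4, exact4 = pct_effect(0.082, 4)   # four more years: the gap widens
print(approx, exact)
```

For one year the two numbers are close (8.2 versus roughly 8.5 percent); the gap grows with the size of the change in schooling.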

The log-linear model imposes a constant percentage effect of schooling on wages.

Another important model is the log-log model, which is a constant elasticity model. It would be more meaningful if we had some measure of output, as in

$$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$$

In this case, the interpretation of $\beta_1$ is the estimated elasticity of salary with respect to sales:

$$\%\Delta salary = \beta_1\, \%\Delta sales \quad\Longrightarrow\quad \beta_1 = \frac{\%\Delta salary}{\%\Delta sales}$$

Multivariate Regression Analysis

We have already improved our wage equation model by using log(wages); now we would like to add more variables, in particular labour market experience.

We can also use a more flexible functional form by adding higher-order (polynomial) terms in the explanatory variables.

For example, here a quadratic in experience can capture diminishing returns to on-the-job training:

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i \tag{9}$$

In the US equivalent of the Canadian Labour Force Survey, the Current Population Survey (CPS), data on years of schooling and age in years are available, but not the number of years of actual labour market experience.

So a potential experience variable is constructed as

$$exper = age - educ - 6$$

and the regression results for the US CPS (for October 1997) are

. regress lwage educ exper exp2 [weight=weight]
(analytic weights assumed)
(sum of wgt is e+07)

      Source |       SS       df       MS         Number of obs =
-------------+------------------------------      F(  3, 11889) =
       Model |                                    Prob > F      =
    Residual |                                    R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                                    Root MSE      =

       lwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
        educ |
       exper |
        exp2 |
       _cons |

. test exper=exp2=0

 ( 1)  exper - exp2 = 0
 ( 2)  exper = 0

       F(  2, 11889) =
            Prob > F =

Diagnostics - Goodness of Fit

As before, we can use the t-statistic to determine whether each variable is statistically significant individually.

But we can also use the F-statistic to test the significance of the whole model, that is, to test the null hypothesis that all the slope coefficients are jointly zero:

$$H_0 : \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_1 : H_0 \text{ is not true}$$

Here, we would overwhelmingly reject $H_0$.

The F-statistic can also be used to test a restricted model against an unrestricted model. The model $\log(wages_i) = \beta_0 + \beta_1 educ_i$ can be seen as a restricted version of the model with experience, where $H_0 : \beta_2 = \beta_3 = 0$.

We can test this hypothesis using the R-squared form of the F-statistic

$$F \equiv \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$$

where $q$ is the number of exclusion restrictions and $n - k - 1 = DF_{ur}$.

. regress lwage educ [weight=weight]
(analytic weights assumed)
(sum of wgt is e+07)

      Source |       SS       df       MS         Number of obs =
-------------+------------------------------      F(  1, 11891) =
       Model |                                    Prob > F      =
    Residual |                                    R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                                    Root MSE      = .49209

       lwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
        educ |
       _cons |

We get $F = [(R^2_{ur} - R^2_r)/(1 - R^2_{ur})](11889/2)$, which is greater than the critical value $F_{2,11889} = 3.00$, so we reject $H_0$.
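The R-squared form of the F-statistic is easy to compute once both regressions have been run. The sketch below is a minimal illustration; the two $R^2$ values are hypothetical placeholders (not the values from the STATA output), while $q = 2$, the degrees of freedom 11889 and the critical value 3.00 are taken from the text.

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """R-squared form of the F statistic for q exclusion restrictions."""
    if not 0 <= r2_r <= r2_ur < 1:
        raise ValueError("expected 0 <= R2_r <= R2_ur < 1")
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


# Hypothetical R-squared values, chosen only to show the arithmetic.
f = f_stat_r2(r2_ur=0.30, r2_r=0.25, q=2, df_ur=11889)
reject_h0 = f > 3.00    # 5% critical value of F(2, 11889) quoted in the notes
print(f, reject_h0)
```

Because the degrees of freedom are so large, even a modest gain in $R^2$ from adding the experience terms produces a very large F statistic.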

Interpretation of the Estimates

The general model

$$\hat Y_i = \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \cdots + \hat\beta_k X_k \tag{10}$$

can be written in terms of changes

$$\Delta\hat Y = \hat\beta_1 \Delta X_1 + \hat\beta_2 \Delta X_2 + \hat\beta_3 \Delta X_3 + \cdots + \hat\beta_k \Delta X_k \tag{11}$$

The coefficient on the variable $X_k$ measures the change in $\hat Y_i$ due to a one-unit increase in $X_k$, holding all the other explanatory variables fixed (the so-called ceteris paribus assumption): $\Delta\hat Y = \hat\beta_k \Delta X_k$.

These effects are sometimes called marginal or partial effects.

In our example, since the dependent variable is log(wages), the interpretation of $\hat\beta_1 = 0.108$, the coefficient of educ, is a 10.8 percent increase in predicted wages for every additional year of education.

Since $\hat\beta_2 = 0.04 > 0$ and $\hat\beta_3 < 0$, there is a concave relationship between log wages and experience.

With the experience variable, the ceteris paribus assumption does not work directly: when we increase exper, exp2 increases as well, so we have to compute the partial effect

$$\frac{\Delta \log(wages)}{\Delta exper} \approx \frac{\partial \log(wages)}{\partial exper} = \hat\beta_2 + 2\hat\beta_3\, exper \tag{12}$$

. sum exper

    Variable |     Obs      Mean   Std. Dev.     Min     Max
-------------+----------------------------------------------
       exper |

. scalar experbar=r(mean)
. di experbar
. lincom exper+2*exp2*experbar

 ( 1)  exper + (2 × experbar) exp2 = 0

       lwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
         (1) |

Here, at the average experience level of 18.67 years, this gives $\hat\beta_2 + 2\hat\beta_3 \times 18.67 = 0.0145$, or a return of about 1.5% per year of experience on average.

The turning point, $exper^* = -\hat\beta_2/(2\hat\beta_3)$, appears to make sense.

But we can compare the plot of the quadratic in experience with a local polynomial estimate, a more flexible functional form.

. regress lwage exper exp2 [weight=weight]
. gen twage=_b[_cons]+_b[exper]*exper+_b[exp2]*exp2
. lpoly lwage exper [aweight=weight], gen(lwagep experp) nograph
. twoway (scatter twage exper) (connected experp lwagep )
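The partial effect of experience and the turning point of the quadratic can be sketched as follows. The coefficient on exper (0.04) and the mean experience (18.67) come from the text; the coefficient on exp2 is a hypothetical stand-in chosen only for illustration, so the printed numbers are not the estimates from the CPS regression.

```python
def partial_effect(b2, b3, exper):
    """d log(wages) / d exper in the quadratic model: b2 + 2 * b3 * exper."""
    return b2 + 2 * b3 * exper


def turning_point(b2, b3):
    """Experience level at which the partial effect is zero: -b2 / (2 * b3)."""
    if b3 == 0:
        raise ValueError("no turning point when the quadratic term is zero")
    return -b2 / (2 * b3)


b2 = 0.04       # coefficient on exper quoted in the notes
b3 = -0.0007    # hypothetical coefficient on exp2 (assumption, not from the output)

effect_at_mean = partial_effect(b2, b3, 18.67)
peak = turning_point(b2, b3)
print(effect_at_mean, peak)
```

With $\hat\beta_2 > 0$ and $\hat\beta_3 < 0$, the return to experience is positive before the turning point and negative after it.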

Figure 3: Impact of experience on log(wages) (quadratic fit twage and lpoly smooth of lwage, plotted against exper)

Comparing the univariate regression, $\log(wages_i) = \tilde\beta_0 + \tilde\beta_1 educ_i$, with the multivariate regression, $\log(wages_i) = \hat\beta_0 + \hat\beta_1 educ_i + \hat\beta_2 exper_i + \hat\beta_3 exper_i^2$, we would generally expect $\tilde\beta_1 \neq \hat\beta_1$.

Here, the estimates are pretty close! This suggests that educ and exper are nearly uncorrelated in this sample.

This could also happen if $\hat\beta_2 = 0$, that is, if exper were uncorrelated with wages given educ.

Choosing the Functional Form

We have already tried a few functional forms:

$$wages_i = \beta_0 + \beta_1 educ_i + u_i$$
$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i$$
$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i$$

Perhaps we could soften the curvature of the relationship between exper and log(wages) with a quartic:

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \beta_4 exper_i^3 + \beta_5 exper_i^4 + u_i$$

. regress lwage educ exper exp2 exp3 exp4 [weight=weight]
(analytic weights assumed)
(sum of wgt is e+07)

      Source |       SS       df       MS         Number of obs =
-------------+------------------------------      F(  5, 11887) =
       Model |                                    Prob > F      =
    Residual |                                    R-squared     =
-------------+------------------------------      Adj R-squared =
       Total |                                    Root MSE      =

       lwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
        educ |
       exper |
        exp2 |
        exp3 |
        exp4 |
       _cons |

Now the model with the quadratic in experience is the restricted model. We get $F = [(R^2_{ur} - R^2_r)/(1 - R^2_{ur})](11887/2) = 36.55$, which is greater than the critical value $F_{2,11887} = 3.00$, so we reject $H_0$.

If the models are not nested (i.e. one cannot be derived from the other), we can use the adjusted $R^2_a$ as a guide to choose our preferred model.

Using log variables is also often convenient, especially for positive dollar amounts and for very large variables such as population.

Variables measured in years and variables that are a proportion or percent are better used in level form.

Potential Problems

Multicollinearity

We could be tempted to use

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 \log(exper_i) + \beta_3 \log(exper_i^2) + u_i$$

But this would not work because $\log(exper_i^2) = 2\log(exper_i)$, so that $\log(exper_i^2)$ and $\log(exper_i)$ would be perfectly correlated: we would have a problem of multicollinearity.

In this case, STATA would drop $\log(exper_i)$, so you would know that something is wrong.

We can ask STATA to compute the Variance Inflation Factor, $VIF = (1 - R^2_k)^{-1}$, which measures the degree to which the variance has been inflated because regressor $k$ is not orthogonal to the other regressors.

. estat vif

    Variable |     VIF     1/VIF
-------------+------------------
        exp2 |
       exper |
        educ |
-------------+------------------
    Mean VIF |    8.08

A rule of thumb states that there is evidence of collinearity if the largest VIF is greater than 10.
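To see why a quadratic term produces a large VIF, the sketch below computes the VIF in the special case of only two regressors, where $R^2_k$ is simply the squared correlation between them. The experience grid of 0 to 40 years is hypothetical, and the function names are illustrative; with more regressors, $R^2_k$ would come from regressing regressor $k$ on all the others.

```python
import math


def corr(x, z):
    """Sample correlation coefficient between two lists."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    sxx = sum((a - x_bar) ** 2 for a in x)
    szz = sum((b - z_bar) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)


def vif_two_regressors(x, z):
    """VIF = 1 / (1 - R2_k); with two regressors R2_k is corr(x, z) squared."""
    return 1 / (1 - corr(x, z) ** 2)


exper = list(range(41))               # hypothetical grid: 0 to 40 years
exp2 = [e * e for e in exper]
vif = vif_two_regressors(exper, exp2)
print(vif)
```

Even on this evenly spaced grid, exper and its square are correlated strongly enough to push the VIF above the rule-of-thumb threshold of 10.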

Here, it is not too surprising that exper and exper² are correlated, but the quadratic in experience provides a better fit.

More generally, when we are adding explanatory variables to a regression model to reduce the error variance, we should always try to include independent variables that affect $Y$ and are uncorrelated with all of the independent variables of interest.

Because near-collinearity inflates standard errors, significant coefficients will become more significant if you include less collinear regressors.

If we include variables that do not belong, OLS remains unbiased, i.e. $E(\hat\beta_1) = \beta_1$, although the variance of the estimates may increase.

Here, we could add demographic characteristics such as marital status, geographic location, etc.

Omitted Variables Bias

If we omit variables that do belong, then the OLS estimate will likely be biased, $E(\tilde\beta_1) \neq \beta_1$.

For example, suppose that the true wage equation model was

$$wages_i = \beta_0 + \beta_1 educ_i + \beta_2 abil_i + u_i$$

but that, since we do not observe ability, we estimate

$$wages_i = \beta_0 + \beta_1 educ_i + v_i \tag{13}$$

where $v_i = \beta_2 abil_i + u_i$.

Then, calling $\tilde\beta_1$ the estimate from equation (13), which omits ability, we can show that

$$E[\tilde\beta_1] = \beta_1 + \beta_2 \delta_1 \quad \text{where} \quad \delta_1 = \frac{Cov(educ_i, abil_i)}{Var(educ_i)}$$

More generally, when $X_1$ and $X_2$ are correlated and $\beta_2 \neq 0$, the estimate $\tilde\beta_1$ will be biased.

The sign of the bias depends on both the sign of $\beta_2$ and that of $\delta_1$.
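The omitted variable formula can be checked on a small constructed example. In the sketch below all numbers are hypothetical, and the error term is set to zero so that the in-sample version of the identity, short-regression slope = $\beta_1 + \beta_2 \hat\delta_1$, holds exactly rather than only in expectation.

```python
def slope(x, y):
    """OLS slope from regressing y on x and a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: ability rises with education on average.
educ = [8, 10, 12, 12, 14, 16, 18]
abil = [1.0, 0.5, 2.0, 1.5, 2.5, 3.0, 3.5]

beta0, beta1, beta2 = 2.0, 1.0, 3.0       # assumed "true" parameters
wages = [beta0 + beta1 * e + beta2 * a for e, a in zip(educ, abil)]  # u_i = 0

delta1 = slope(educ, abil)                # Cov(educ, abil) / Var(educ)
beta1_short = slope(educ, wages)          # regression that omits ability
bias = beta1_short - beta1
print(beta1_short, beta1 + beta2 * delta1)
```

Since both $\beta_2$ and $\delta_1$ are positive in this construction, the short regression overstates the effect of education, which is the upward bias discussed for the wage equation.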

                  Corr(X1, X2) > 0     Corr(X1, X2) < 0
β2 > 0            positive bias        negative bias
β2 < 0            negative bias        positive bias

In the case of the wage equation, because more ability leads to higher productivity and higher wages, $\beta_2 > 0$.

There are also reasons to believe that educ and ability are positively correlated, so we would think that the OLS estimates from equation (13) are too large.

What to do about it? This is not an easy problem to correct if we do not have some measure of ability in our sample. (One has to take a quasi-experimental approach, using IV for example.)

However, in terms of reporting results, one should be aware of the possibility of an omitted variables bias and qualify the results as likely upward biased or downward biased.

Heteroscedasticity

When the variance of the error terms is not constant across observations, we have a problem of heteroscedasticity:

$$Var(u_i \mid educ_i) = \sigma_i^2 = \sigma^2(X_i)$$

The OLS estimates are still unbiased and consistent, but the standard errors of the estimates are biased if we have heteroskedasticity.

If the standard errors are biased, we cannot use the usual t statistics, F statistics or LM statistics for drawing inferences.

But we can test for heteroscedasticity using the Breusch-Pagan test, which amounts to testing $H_0 : t = 0$ in $Var(u_i) = \sigma^2 \exp(zt)$, that is, running a regression of the squared OLS residuals on the fitted values or on some explanatory variables.

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrwage

         chi2(1)      =     3.42
         Prob > chi2  =
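The auxiliary regression behind the test can be sketched as follows. This illustration uses the $n R^2$ (LM) form of the statistic, regressing the squared residuals on the single regressor; that is an assumption made for simplicity, and the statistic reported by estat hettest may be computed differently. The data are constructed by hand so that one series has errors of constant size and the other has errors that grow with $x$.

```python
def simple_r2(x, y):
    """R-squared from regressing y on x and a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def ols_residuals(x, y):
    """Residuals from the OLS regression of y on x and a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return [b - b0 - b1 * a for a, b in zip(x, y)]


def bp_lm(x, y):
    """LM statistic n * R2 from regressing squared residuals on x."""
    u2 = [e * e for e in ols_residuals(x, y)]
    return len(x) * simple_r2(x, u2)


x = list(range(1, 11))
signs = [1 if i % 2 else -1 for i in x]                    # +1, -1, +1, ...
y_hom = [1 + 2 * a + s for a, s in zip(x, signs)]          # constant error size
y_het = [1 + 2 * a + s * a for a, s in zip(x, signs)]      # error size grows with x
lm_hom, lm_het = bp_lm(x, y_hom), bp_lm(x, y_het)
print(lm_hom, lm_het)
```

The statistic is compared with a chi-squared critical value with one degree of freedom (3.84 at the 5% level); only the second series exceeds it.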

Here we are happy when we fail to reject $H_0$.

What to do about this? In STATA, heteroskedasticity-robust standard errors are easily obtained using the robust option of reg.

The resulting White or Huber standard errors will be asymptotically valid in the presence of any form of heteroscedasticity, including homoscedasticity.

When the form of the heteroskedasticity is known, for example $\sigma_i^2 = \sigma^2 educ_i$, then we can use weighted least squares (vwls in STATA).

. regress lwage educ exper exp2 exp3 exp4 [weight=weight],robust
(analytic weights assumed)
(sum of wgt is e+07)

Linear regression                                 Number of obs =
                                                  F(  5, 11887) =
                                                  Prob > F      =
                                                  R-squared     =
                                                  Root MSE      =

             |           Robust
       lwage |   Coef.   Std. Err.    t    P>|t|   [95% Conf. Interval]
-------------+---------------------------------------------------------
        educ |
       exper |
        exp2 |
        exp3 |
        exp4 |
       _cons |
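For a simple regression, the robust standard error of the slope has a closed form that can be compared with the conventional one. The sketch below is an illustration on constructed data; it applies the $n/(n-k)$ small-sample factor, which is assumed here to mirror what the robust option does, and the function name is hypothetical.

```python
import math


def ols_se(x, y):
    """Return (slope, conventional se, heteroskedasticity-robust se)."""
    n, k = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [a - x_bar for a in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (b - y_bar) for d, b in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    u = [b - b0 - b1 * a for a, b in zip(x, y)]
    # Conventional: s^2 / sum (x - x_bar)^2, valid under homoskedasticity (A5).
    s2 = sum(e * e for e in u) / (n - k)
    se_conv = math.sqrt(s2 / sxx)
    # White/Huber: sum (x - x_bar)^2 u_hat^2 / (sum (x - x_bar)^2)^2,
    # scaled by n / (n - k) (assumed small-sample correction).
    meat = sum(d * d * e * e for d, e in zip(dx, u))
    se_rob = math.sqrt(n / (n - k) * meat / sxx ** 2)
    return b1, se_conv, se_rob


x = list(range(1, 11))
shock = [(1 if i % 2 else -1) * i for i in x]    # errors whose size grows with x
y = [1 + 2 * a + e for a, e in zip(x, shock)]
b1, se_conv, se_rob = ols_se(x, y)
print(b1, se_conv, se_rob)
```

The point estimate is the same under both approaches; only the standard error, and hence the t statistic and confidence interval, changes.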
