Problem set - Selection and Diff-in-Diff


Mervyn Stone

1. You want to model the wage equation for women. You consider estimating the model:

$$\ln(wage) = \alpha + \beta_1\, educ + \beta_2\, exper + \beta_3\, exper^2 + \varepsilon \qquad (1)$$

Read the data into Stata:

. use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta
. label var nwifeinc "Income not from wife"
. label var kidslt6 "Kids 1-6 years old"
. label var kidsge6 "Kids > 6 years old"

(a) Estimate the model in equation (1) using OLS. Assuming that the RHS variables are all uncorrelated with the error term in the population, why may the estimates still be biased?

. reg lwage educ exper expersq
(Output: OLS regression, F(3, 424); coefficients, standard errors and 95% confidence intervals for educ, exper, expersq and _cons.)
. est sto ols

Estimates may still be biased because we only observe wages for individuals who choose to work. Since the decision to work depends on the wage, we would expect low-earning women (conditional on covariates) to opt out and appear censored in our data. In the working sample the error term then has a nonzero conditional mean that varies with the regressors, so OLS is biased even though $E[\varepsilon \mid x] = 0$ holds in the population.

(b) Estimate the Heckman selection model in two individual steps without exclusion restrictions, by predicting the inverse Mills ratio and including it as a control variable in the wage equation.

. gen d = lwage != .
. probit d educ exper expersq
(Output: probit regression, Number of obs = 753, LR chi2(3); coefficients for educ, exper, expersq and _cons.)
. est sto select
. predict xb, xb
. gen invmills = normalden(-xb)/(1 - normal(-xb))
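The inverse Mills ratio built in the last step, $\lambda(x'\hat\gamma) = \phi(x'\hat\gamma)/\Phi(x'\hat\gamma)$, can be checked numerically. A minimal sketch using only Python's standard library (`statistics.NormalDist` for $\phi$ and $\Phi$); the function names are illustrative and not part of the problem set:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def invmills_stata_form(xb: float) -> float:
    """Mirrors the Stata line: normalden(-xb) / (1 - normal(-xb))."""
    return STD_NORMAL.pdf(-xb) / (1.0 - STD_NORMAL.cdf(-xb))


def invmills_simple_form(xb: float) -> float:
    """phi(xb) / Phi(xb). Equal to the form above because phi is symmetric
    and 1 - Phi(-xb) = Phi(xb)."""
    return STD_NORMAL.pdf(xb) / STD_NORMAL.cdf(xb)


if __name__ == "__main__":
    for xb in (-2.0, -0.5, 0.0, 0.5, 2.0):
        a, b = invmills_stata_form(xb), invmills_simple_form(xb)
        print(f"xb={xb:+.1f}  lambda={a:.4f}  same={abs(a - b) < 1e-12}")
```

The ratio is decreasing in the index, so observations that were unlikely to be selected get a larger correction term in the wage equation.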

. * Note that this is equivalent to:
. * gen invmills = normalden(xb)/normal(xb)
. reg lwage educ exper expersq invmills
(Output: OLS regression, F(4, 423); coefficients for educ, exper, expersq, invmills and _cons.)
. est sto noexcl

(c) Estimate the Heckman selection model using the command -heckman-, and include the variables -nwifeinc-, -age-, -kidslt6- and -kidsge6- in the selection model. Do you think the exclusion restriction is plausible?

. heckman lwage educ exper expersq, ///
>     select(educ exper expersq nwifeinc age kidslt6 kidsge6)

Heckman selection model (regression model with sample selection)
Number of obs = 753, Censored obs = 325, Uncensored obs = 428, Wald chi2(3)
(Output: outcome equation for lwage with educ, exper, expersq and _cons; selection equation with educ, exper, expersq, nwifeinc, age, kidslt6, kidsge6 and _cons; ancillary parameters /athrho, /lnsigma, rho, sigma and lambda.)
LR test of indep. eqns. (rho = 0): chi2(1) = 0.03
. est sto mle

(d) Reestimate the model in 1c using the two-step estimator (-heckman, twostep-) and in individual steps (as in 1b above), and compare the estimates.

. heckman lwage educ exper expersq, twostep ///
>     select(educ exper expersq nwifeinc age kidslt6 kidsge6)
Heckman selection model -- two-step estimates (regression model with sample selection)
Number of obs = 753, Censored obs = 325, Uncensored obs = 428, Wald chi2(3)
(Output: outcome equation for lwage with educ, exper, expersq and _cons; selection equation with educ, exper, expersq, nwifeinc, age, kidslt6, kidsge6 and _cons; mills lambda, rho and sigma.)
. est sto twostep
. probit d educ exper expersq nwifeinc age kidslt6 kidsge6
(Output: probit regression, Number of obs = 753, LR chi2(7); coefficients for educ, exper, expersq, nwifeinc, age, kidslt6, kidsge6 and _cons.)
. est sto select2
. drop xb invmills
. predict xb, xb
. gen invmills = normalden(-xb)/(1 - normal(-xb))
. reg lwage educ exper expersq invmills
(Output: OLS regression, F(4, 423); coefficients for educ, exper, expersq, invmills and _cons.)
. est sto excl
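The two-step logic in (b) and (d) can be illustrated on simulated data. The sketch below is Python with NumPy and assumes a hypothetical data-generating process (one regressor `x`, one excluded variable `z`, jointly normal errors with correlation 0.8), not the Mroz data. The probit is fitted by Fisher scoring rather than Stata's -probit-, so treat it as an illustration of the estimator, not a replication of the output above:

```python
import math

import numpy as np

# Vectorised standard-normal pdf and cdf (cdf via math.erf, avoiding a SciPy dependency).
_erf = np.vectorize(math.erf)


def norm_pdf(u):
    return np.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def norm_cdf(u):
    return 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))


def probit(X, d, max_iter=50, tol=1e-10):
    """Probit MLE by Fisher scoring (Newton steps using the expected information)."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ b
        p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        f = norm_pdf(xb)
        score = X.T @ (f * (d - p) / (p * (1 - p)))
        info = X.T @ (X * (f * f / (p * (1 - p)))[:, None])
        step = np.linalg.solve(info, score)
        b += step
        if np.max(np.abs(step)) < tol:
            break
    return b


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical DGP: y is observed only when d = 1.
rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)                   # regressor in both equations
z = rng.normal(size=n)                   # excluded from the outcome equation
v = rng.normal(size=n)                   # selection error
e = 0.8 * v + 0.6 * rng.normal(size=n)   # outcome error: sd 1, corr(e, v) = 0.8
d = (0.5 + x + z + v > 0).astype(float)
y = 1.0 + 1.0 * x + e                    # true slope on x is 1

# Step 1: probit for participation, then the inverse Mills ratio.
Z = np.column_stack([np.ones(n), x, z])
g_hat = probit(Z, d)
mills = norm_pdf(Z @ g_hat) / norm_cdf(Z @ g_hat)

# Step 2: OLS on the selected sample, without and with the Mills ratio.
sel = d == 1
b_naive = ols(np.column_stack([np.ones(sel.sum()), x[sel]]), y[sel])
b_heck = ols(np.column_stack([np.ones(sel.sum()), x[sel], mills[sel]]), y[sel])

print("probit:", g_hat.round(3))
print("naive OLS slope:", round(b_naive[1], 3))
print("two-step slope:", round(b_heck[1], 3), "lambda coef:", round(b_heck[2], 3))
```

On this design the naive slope on the selected sample is pulled below 1, while adding the Mills ratio recovers it, and the coefficient on $\lambda$ estimates $\rho\sigma$ (0.8 here). The point estimates match what -heckman, twostep- would compute, but the second-step OLS standard errors ignore that $\lambda$ is estimated; -heckman, twostep- corrects for this, which is one reason the two sets of standard errors in (d) differ.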

2. This exercise uses data from February-March and November-December 1992 on employment at fast-food restaurants in the US states of New Jersey and Pennsylvania, taken from Card and Krueger in The American Economic Review, Vol. 84(4). The data are described in the output below. We are interested in how the minimum wage affects employment decisions in these restaurants. In April 1992, New Jersey increased the minimum wage from $4.25 to $5.05. In Pennsylvania, the minimum wage was unchanged, and we assume it to be $3.80 for this exercise. Read the data into Stata:

. use [COURSE URL]/seminars/data/econ4136seminar09_did.dta

(a) Describe the data using -codebook-.

. codebook, c

Variable   Label
sheet      store id
post       1 if after the law; 0 if before the law
chain      1 = Burger King; 2 = KFC; 3 = Roy Rogers; 4 = Wendys
state      1 if New Jersey; 0 if Pennsylvania
empft      # full-time employees
hrsopen    number hrs open per day
nregs      number of cash registers in store
minwage    minimum wage rate (USD)
temp       More than 75% part-time employees
d1         1 if Burger King; 0 otherwise
d2         1 if KFC; 0 otherwise
d3         1 if Roy Rogers; 0 otherwise
d4         1 if Wendy's; 0 otherwise

(b) Estimate the following regression on the sample of fast-food restaurants in February-March:

$$empft_{ikt} = \alpha + \gamma\, minwage_{kt} + \beta_1\, nregs_{ikt} + \beta_2\, hrsopen_{ikt} + \sum_{j=2}^{4} \eta_j\, dj_{ikt} + \varepsilon_{ikt} \qquad (2)$$

where $i$ denotes restaurant, $k$ denotes state, and $t = 0$ if the observation is from February-March and $t = 1$ if it is from November-December.

. reg empft minwage hrsopen nregs d2-d4 if post == 0
(Output: OLS regression, F(6, 390); coefficients for minwage, hrsopen, nregs, d2, d3, d4 and _cons.)

i. Interpret the coefficient $\gamma$, and calculate a 90% confidence interval.

$\gamma$ is supposed to reflect the impact on employment per dollar change in the minimum wage rate. A very good student might note that, because the minimum wage takes only one value per state in February-March ($4.25 in New Jersey, $3.80 in Pennsylvania), $\hat\gamma$ is simply the difference in employment across states (conditional on covariates) divided by the difference in the minimum wage rate.

$$CI_{1-\alpha}(\hat\gamma) = \hat\gamma \pm t_{1-\alpha/2}\,\widehat{se}(\hat\gamma) = -5.18 \pm t_{0.95,\,390}\,\widehat{se}(\hat\gamma) = [-8.94,\ -1.43], \qquad t_{0.95,\,390} \approx 1.65$$

ii. Use the sum-of-squares table from the regression output to calculate the $R^2$ and the standard error of the regression (Root MSE).

$$R^2 = \frac{TSS - RSS}{TSS} = \frac{MSS}{TSS}, \qquad SER = \sqrt{\frac{RSS}{n-k-1}} = \sqrt{\frac{RSS}{390}}$$

iii. Give an economic interpretation of the coefficients $\eta_2$-$\eta_4$. What might explain the relatively large coefficient on -d4-?

$\eta_2$-$\eta_4$ estimate how many more full-time employees there are at KFC, Roy Rogers and Wendy's, respectively, compared with Burger King (the omitted category). Our estimates suggest that, controlling for opening hours and the number of registers, Roy Rogers employs about one person less than Burger King, KFC one person more and Wendy's four persons more. An explanation for the latter could be that Wendy's sells higher-quality food that is more labor-intensive.

iv. Test $H_0: \eta_2 = \eta_3 = 0$.

. test d2 = d3 = 0
 ( 1)  d2 - d3 = 0
 ( 2)  d2 = 0
       F(  2,   390) =    1.00
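The arithmetic in parts i and ii can be sketched in Python (standard library only). The sketch uses the normal critical value as an approximation to $t$ with 390 residual degrees of freedom (1.645 against about 1.649), backs out the standard error implied by the reported interval rather than reading it from the output, and any MSS/RSS values passed to `r2_and_ser` are hypothetical:

```python
import math
from statistics import NormalDist


def conf_int(est: float, se: float, level: float = 0.90) -> tuple[float, float]:
    """est +/- z_{1-alpha/2} * se (normal approximation to the t critical value)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est - z * se, est + z * se


def r2_and_ser(mss: float, rss: float, n: int, k: int) -> tuple[float, float]:
    """R^2 = MSS/TSS = 1 - RSS/TSS and SER (Root MSE) = sqrt(RSS / (n - k - 1))."""
    tss = mss + rss
    return mss / tss, math.sqrt(rss / (n - k - 1))


# Standard error implied by the reported 90% interval [-8.94, -1.43],
# then the interval rebuilt around gamma_hat = -5.18.
z90 = NormalDist().inv_cdf(0.95)
se_implied = (-1.43 - (-8.94)) / (2 * z90)
lo, hi = conf_int(-5.18, se_implied)
print(f"implied se = {se_implied:.2f}, CI = [{lo:.3f}, {hi:.3f}]")
```

For part ii, `r2_and_ser` takes MSS and RSS straight from the Source/SS table, with `n - k - 1` equal to the residual degrees of freedom (390 here).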

. estimates store unrest
. reg empft minwage hrsopen nregs d4 if post == 0
(Output: OLS regression, F(4, 392); coefficients for minwage, hrsopen, nregs, d4 and _cons.)
. lrtest unrest
Likelihood-ratio test                      LR chi2(2) = 2.04
(Assumption: . nested in unrest)

Alternatively, we can do this by hand:

$$F = \frac{(R^2_u - R^2_r)/(k_u - k_r)}{(1 - R^2_u)/(n - k_u - 1)} = \frac{(R^2_u - R^2_r)/2}{(1 - R^2_u)/390}$$

So we cannot reject $H_0$ at reasonable significance levels.

v. Test the hypothesis $H_0: \eta_2 = \eta_3$ using the estimated covariance matrix of the coefficients (-mat list e(V)-). Verify your answer by running the test in Stata using -test-, and/or by performing an F-test.

Use the covariance matrix of the coefficients to calculate the standard error of $\hat\eta_2 - \hat\eta_3$, and test whether this difference is significantly different from zero, using

$$\widehat{var}(\hat\eta_2 - \hat\eta_3) = \widehat{var}(\hat\eta_2) + \widehat{var}(\hat\eta_3) - 2\,\widehat{cov}(\hat\eta_2, \hat\eta_3)$$

. estimates restore unrest
(results unrest are active now)
. mat list e(V)
(Output: symmetric e(V)[7,7] with rows and columns minwage, hrsopen, nregs, d2, d3, d4 and _cons.)
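The two hand calculations in iv and v (the F statistic from the $R^2$ of the restricted and unrestricted models, and the standard error of a coefficient difference from e(V)) can be sketched in Python as below. Any $R^2$ values or covariance entries passed to these functions are hypothetical placeholders, not values from this data; the last lines only use the reported F(1, 390) = 1.75, and the p-value uses a normal approximation to $t_{390}$:

```python
import math
from statistics import NormalDist


def f_from_r2(r2_u: float, r2_r: float, q: int, df_resid: int) -> float:
    """F = [(R2_u - R2_r)/q] / [(1 - R2_u)/df_resid], q = number of restrictions."""
    return ((r2_u - r2_r) / q) / ((1.0 - r2_u) / df_resid)


def se_of_difference(V: list[list[float]], i: int, j: int) -> float:
    """se(b_i - b_j) = sqrt(var(b_i) + var(b_j) - 2 cov(b_i, b_j))."""
    return math.sqrt(V[i][i] + V[j][j] - 2.0 * V[i][j])


def two_sided_p(t: float) -> float:
    """Two-sided p-value, normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# With a single restriction F = t^2, so the reported F(1, 390) = 1.75 for
# H0: eta2 = eta3 corresponds to |t| = sqrt(1.75).
t_reported = math.sqrt(1.75)
p_reported = two_sided_p(t_reported)
print(f"|t| = {t_reported:.2f}, p = {p_reported:.2f}")
```

The implied two-sided p-value is about 0.19, in line with the p of roughly 20% obtained from the covariance matrix.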

Plugging $\widehat{var}(\hat\eta_2)$, $\widehat{var}(\hat\eta_3)$ and $\widehat{cov}(\hat\eta_2, \hat\eta_3)$ from e(V) into the variance formula for a difference, $\widehat{var}(\hat\eta_2 - \hat\eta_3) = \widehat{var}(\hat\eta_2) + \widehat{var}(\hat\eta_3) - 2\,\widehat{cov}(\hat\eta_2, \hat\eta_3)$, gives $\widehat{se}(\hat\eta_2 - \hat\eta_3)$ and the test statistic $t = (\hat\eta_2 - \hat\eta_3)/\widehat{se}(\hat\eta_2 - \hat\eta_3)$. Using a two-sided test and a t-table, $p \approx 20\%$.

Alternatively, one could run the test directly in Stata, or estimate the model

$$empft_{ikt} = \alpha + \gamma\, minwage_{kt} + \beta_1\, nregs_{ikt} + \beta_2\, hrsopen_{ikt} + \eta_2\,(d2 + d3)_{ikt} + \eta_4\, d4_{ikt} + \varepsilon_{ikt}$$

and do an F-test against the unrestricted model above.

. test d2 = d3
 ( 1)  d2 - d3 = 0
       F(  1,   390) =    1.75
. gen d23 = d2 + d3
. reg empft minwage hrsopen nregs d23 d4 if post == 0
(Output: OLS regression, F(5, 391); coefficients for minwage, hrsopen, nregs, d23, d4 and _cons.)
. lrtest unrest
Likelihood-ratio test                      LR chi2(1) = 1.78
(Assumption: . nested in unrest)

(c) We now want to control for potential selection issues by using the panel structure of our data.

i. Explain why the previous estimate of $\gamma$ is likely to suffer from omitted variable bias.

State-specific factors will probably affect both employment and the minimum wage. For instance, a high minimum wage could reflect a higher general wage level in the state, which might cause lower employment at a given employer. Alternatively, a high minimum wage could mean that many employees are paid at the lower bound, which could imply the opposite.

ii. Assume that $\varepsilon_{ikt} = \mu_k + \zeta_t + u_{ikt}$ and that $E[u_{ikt} \mid x_{ikt}] = 0$ (where $x_{ikt}$ is the vector of RHS variables in (2) except -minwage-). Explain how you can then use the increase in the minimum wage in New Jersey and a difference-in-differences (DD) model to identify the effect of the minimum wage on employment. Give an example where the necessary assumption(s) are violated.

You can use DD to control for (i) unobservables that change similarly across states and (ii) unobservables that are constant within a state. Specifically, if $x_{kt}$ is the vector of all control variables, -post- and -state-, then we assume

$$E(\varepsilon_{kt} \mid x_{kt}) = \mu_k + \zeta_t + E(u_{kt} \mid x_{kt}), \qquad E(u_{kt} \mid x_{kt}) = 0,$$

which implies that

$$E[Y \mid state=1, post=1] - E[Y \mid state=1, post=0] = \gamma\,\Delta minwage + \Delta\zeta$$
$$E[Y \mid state=0, post=1] - E[Y \mid state=0, post=0] = \Delta\zeta$$

where $\Delta\zeta = \zeta_1 - \zeta_0$ and $\Delta minwage$ is New Jersey's minimum wage increase. Hence the DD estimator is a consistent estimator of $\gamma\,\Delta minwage$:

$$DD = \{E[Y \mid state=1, post=1] - E[Y \mid state=1, post=0]\} - \{E[Y \mid state=0, post=1] - E[Y \mid state=0, post=0]\}$$

This assumption is not satisfied if there are time-varying factors affecting employment that differ across states. For instance, if the macroeconomic environment improves in New Jersey relative to Pennsylvania, then the DD estimator is biased upwards.

iii. Generate a table of means, a table of standard errors and a table of frequencies for -empft- in each state and each time period (post = 1 and post = 0).

. table state post, c(mean empft sd empft n empft)
(Output: rows state (1 if New Jersey; 0 if Pennsylvania) by columns post (1 if after the law; 0 if before the law); cells give the mean, standard deviation and number of observations of empft.)

iv. Using these statistics, calculate a DD estimate of the impact of the minimum wage law on employment.

$$\widehat{DD} = [\bar Y(state=1, post=1) - \bar Y(state=1, post=0)] - [\bar Y(state=0, post=1) - \bar Y(state=0, post=0)] = 3.38$$

v. Specify and estimate the corresponding regression.

. gen pt = post * state
. reg empft state post pt
(Output: OLS regression, F(3, 794) = 2.20; coefficients for state, post, pt and _cons.)

vi. How much does this suggest that the minimum wage affects full-time employment in fast-food restaurants?

To interpret the estimate as the impact of the minimum wage hike, we should divide the reduced form by the first stage (the DD in minwage):

$$\hat\gamma = \frac{\widehat{DD}}{(5.05 - 4.25) - (3.80 - 3.80)} = \frac{3.38}{0.80} \approx 4.22$$

vii. Explain why the t-test from the regression above may understate the uncertainty in the effect of the minimum wage on full-time employment. How could you correct the standard error? Compare the t-values with and without this correction.

To get the standard error right, we should take into account the estimation error in the first stage. An easy way to do this is to estimate the IV model directly:

. ivreg empft state post (minwage = pt)
Instrumental variables (2SLS) regression
(Output: F(3, 794) = 2.20; coefficients for minwage, state, post and _cons.)
Instrumented:  minwage
Instruments:   state post pt

viii. What regression would you run to estimate the DD model including control variables? Run the regression using robust standard errors.

A regression equivalent with our data would be (suppressing the $i$ subscript)

$$empft_{kt} = \alpha + \xi\, post_t + \eta\, state_k + DD\cdot post_t \cdot state_k + x_{kt}'\beta + \varepsilon_{kt}$$

or equivalently, with state effects $\alpha_k$ and period effects $\eta_t$,

$$empft_{kt} = \alpha_k + \eta_t + \gamma_{kt} + x_{kt}'\beta + \varepsilon_{kt}$$

. reg empft state post pt hrsopen nregs, robust
Linear regression, Number of obs = 775, F(5, 769)
(Output with robust standard errors: coefficients for state, post, pt, hrsopen, nregs and _cons.)

ix. How might you test the key identifying assumptions underlying your DiD estimation in this application, and in general?

Essentially, you need a common trend, and you need to argue that the treatment is likely to be random with respect to trends. It is useful to decompose the common-trend assumption into:

Stable composition: test by
  A. comparing characteristics across groups in different time periods;
  B. estimating effects on outcomes that should NOT be affected but should be determined by similar background variables.
Common time shocks: test by
  A. comparing trends in outcomes across groups in different time periods;
  B. estimating placebo treatments in periods before (sometimes also after) the actual treatment.
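The algebra behind iv-vii can be checked on simulated data: in a 2x2 design the DD of cell means equals the coefficient on `pt` in the saturated regression, and 2SLS with `pt` as the instrument equals that DD divided by the first-stage DD in minwage (0.80). A minimal NumPy sketch; the outcome process, sample size and seed are hypothetical, and only the minimum-wage values come from the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: state = 1 for New Jersey, post = 1 for Nov-Dec.
n = 400
state = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
pt = state * post
# Minimum wage as in the problem: NJ 4.25 -> 5.05, PA fixed at 3.80.
minwage = np.where(state == 1, np.where(post == 1, 5.05, 4.25), 3.80)
# Hypothetical outcome: state effect, period effect, treatment effect, noise.
empft = 8.0 - 1.0 * state + 0.5 * post + 2.0 * pt + rng.normal(0, 3, size=n)


def cell_mean(s, p):
    return empft[(state == s) & (post == p)].mean()


# (iv) DD from the 2x2 table of means.
dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# (v) Saturated regression: the coefficient on pt reproduces the DD of means.
Z = np.column_stack([np.ones(n), state, post, pt])
dd_ols = np.linalg.lstsq(Z, empft, rcond=None)[0][3]

# (vi) Wald ratio: reduced form divided by the first stage.
first_stage = (5.05 - 4.25) - (3.80 - 3.80)
gamma_from_reported = 3.38 / first_stage  # the problem set's DD of 3.38

# (vii) 2SLS: project the regressors on the instruments, then regress on the fit.
X = np.column_stack([np.ones(n), state, post, minwage])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
gamma_iv = np.linalg.lstsq(X_hat, empft, rcond=None)[0][3]

print(f"DD (means) = {dd_means:.4f}, DD (OLS) = {dd_ols:.4f}")
print(f"IV gamma = {gamma_iv:.4f}, DD / 0.80 = {dd_ols / first_stage:.4f}")
```

Because minwage is an exact linear function of the constant, `state` and `pt` here, the first-stage fit is perfect and the IV point estimate is exactly the Wald ratio.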
