Empirical Application of Panel Data Regression

Philip Kelly

1. We use the Fatality data, and we are interested in whether raising the beer tax rate can help lower traffic deaths. So the dependent variable is the traffic death rate, while the key regressor is the beer tax rate.

2. The data is a csv file. The command to read a csv file is insheet (or use the menu).

. insheet using "I:\420\420_data_fatality.csv"
(9 vars, 336 obs)
. rename statename state
. label variable y "number of annual traffic death per 10,000 people"
. encode state, gen(id)
. list state id year beertax y in 1/10, nol
  (listing omitted: seven rows for AL followed by three rows for AR)
. xtset id
       panel variable:  id (balanced)

We observe each state repeatedly, so this is panel data. Each state is a panel (group, cluster). Here each panel has 7 observations; each observation is a state-year.

3. The variable state is a string, for which we can generate a categorical variable using encode state, gen(id). Then we use id to declare that this is panel data, and Stata finds it is balanced, i.e., no state has a missing value in any year. This can also be seen by using the command

. tab year
  (frequency table omitted: columns year, Freq., Percent, Cum.)

4. We can construct panel data by stacking the cross section in one year above another (using commands append or merge). To see this,

. sort year id
. list state year y in 1/
  (listing omitted)

The first 48 observations are the 1982 cross section, followed by the 1983 cross section, and so on. It is important to have an id variable (Stata calls it the group variable), such as state, in each cross section.

5. In panel data there are three types of variables: (1) variables that are time-invariant, such as state; (2) variables that are panel-invariant, such as year; and (3) variables that vary over time and across panels, such as beer tax and y. Later we will show that the fixed effect (FE) and first difference (FD) estimators cannot estimate the effect of type (1) variables, those that are time-invariant. We can use xtsum to tell which type a variable is:

. xtsum y year id
  (Mean, Std. Dev., Min and Max columns omitted; for each of y, year and id the Observations column reads overall N = 336, between n = 48, within T = 7)

A variable is panel-invariant if its between standard deviation is zero; it is time-invariant if its within standard deviation is zero.

6. First, we consider using only one year of data (a cross section). The scatter plot and simple regression both indicate a positive correlation between beer tax and traffic death, which is implausible.

. twoway (sca y beertax if year==1982, ml(state)) (lfit y beertax if year==1982)

[Figure: scatter plot of y (number of annual traffic death per 10,000 people) against beertax for 1982, points labeled by state, with the fitted values line]
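The between/within rule stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three toy variables are hypothetical (they are not the Fatality data), the function names are made up for this sketch, and population standard deviations are used for simplicity, so the numbers need not match what xtsum prints.

```python
from statistics import mean, pstdev


def between_within_sd(panel):
    """Return (between, within) standard deviations of a panel variable.

    `panel` maps a panel id to the list of values observed over time.
    Population standard deviations are used (a simplifying assumption).
    """
    panel_means = {pid: mean(series) for pid, series in panel.items()}
    # between: spread of the panel-specific means across panels
    between = pstdev(list(panel_means.values()))
    # within: spread of each value around its own panel mean
    within = pstdev([v - panel_means[pid]
                     for pid, series in panel.items() for v in series])
    return between, within


def classify(panel, tol=1e-12):
    """Label a variable using the zero-standard-deviation rule."""
    between, within = between_within_sd(panel)
    if within < tol:
        return "time-invariant"
    if between < tol:
        return "panel-invariant"
    return "varies over time and across panels"


# Hypothetical toy panel: two states observed for three years.
id_var = {"AL": [1, 1, 1], "AR": [2, 2, 2]}
year_var = {"AL": [1982, 1983, 1984], "AR": [1982, 1983, 1984]}
y_var = {"AL": [2.0, 2.5, 3.0], "AR": [1.0, 1.5, 1.2]}

for name, var in [("id", id_var), ("year", year_var), ("y", y_var)]:
    print(name, "->", classify(var))
```

The check on the within deviation comes first, so a variable that is constant everywhere is reported as time-invariant.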

. reg y beertax if year==1982, nohe
  (coefficient table for beertax and _cons omitted)

7. Running a pooled regression that uses all years only helps a little.

. reg y beertax
  (output omitted; the header reports Number of obs and F(1, 334))

Notice that the sample size in the pooled regression is 48 × 7 = 336, resulting in a bigger t value.

8. This across-state comparison, whether using one year or all years, is prone to omitted variable bias. Factors like drinking culture are unobserved, but they can affect both beer tax and traffic death. If we believe drinking culture is (largely) time-invariant, then an over-time comparison (time series regression) for a given state is more meaningful:

. two (sca y beertax if state=="al", ml(year)) (lfit y beertax if state=="al")

[Figure: scatter plot of y (number of annual traffic death per 10,000 people) against beertax for one state, points labeled by year, with the fitted values line]

. reg y beertax if state=="al", nohe
  (coefficient table for beertax and _cons omitted)

Now we see a negative $\hat\beta_1$, though it is insignificant.

9. Intuitively, the over-time or within comparison is more convincing because we can safely rule out the effect of time-constant unobserved factors such as drinking culture: something that is fixed over time cannot be used to explain the over-time variation in y.

10. If we assume the marginal effect of beertax on traffic death does not change across states, then we can improve our estimate by using all states. Toward that end, we need to generate the panel-specific first (or one-year) difference of traffic death and beer tax

\[ dy1 = y_{i,t} - y_{i,t-1} \tag{1} \]
\[ dx1 = beertax_{i,t} - beertax_{i,t-1} \tag{2} \]

where $y_{i,t-1}$ denotes the first lag of $y_{i,t}$. Notice that we need two subscripts. The first subscript i indexes the panel, and the second subscript t indexes time. Here we compute the first difference for each given i. In Stata, that means by id.

. sort id year
. by id: gen ylag1 = y[_n-1]
(48 missing values generated)
. by id: gen dy1 = y-ylag1
(48 missing values generated)
. list id year y ylag1 dy1 in 1/
  (listing omitted)

Notice that we get the first lag of y (called ylag1) by pushing the y series one period downward, so one missing value is generated for each state. It is worth emphasizing that we do this separately for each state using by id. The first non-missing observation in the first difference series (called dy1) is $y_{1,2} - y_{1,1}$. Next we get the panel-specific first difference of beertax, and apply OLS to the differenced data:

. by id: gen dx1 = beertax[_n]-beertax[_n-1]
(48 missing values generated)
. reg dy1 dx1, nohe
  (coefficient table for dx1 and _cons omitted)

This first or one-year difference estimate is negative and significant. Good news.

11. To understand why the FD estimator works, consider the structural models:

\[ y_{i,t} = \beta_1 x_{i,t} + \beta_2 w_i + u_{i,t} \tag{3} \]
\[ y_{i,t-1} = \beta_1 x_{i,t-1} + \beta_2 w_i + u_{i,t-1} \tag{4} \]

where $w_i$ does not have the time subscript t because it denotes the time-invariant variable (like drinking culture). Subtracting (4) from (3) leads to

\[ \Delta y_{i,t} = \beta_1 \Delta x_{i,t} + \Delta u_{i,t} \tag{5} \]

The FD estimator is just OLS applied to (5). Notice that $w_i$ has been removed by taking the difference. So OLS applied to the differenced data, model (5), is not subject to omitted variable bias. By contrast, OLS applied to the pooled regression (3) with $w_i$ left out suffers from the omitted variable bias $\beta_2 \frac{cov(x_{i,t}, w_i)}{var(x_{i,t})}$.

12. It also becomes clear that the FD estimator cannot estimate the effect of any factor that is time-invariant (such as a dummy variable called south that equals one if a state is in the southern part of the country).

13. (Optional) The FD estimator is consistent as long as $cov(\Delta x_{i,t}, \Delta u_{i,t}) = 0$. A sufficient condition is strict exogeneity: $cov(x_{i,t_j}, u_{i,t_k}) = 0, \ \forall t_j, t_k$.

14. (Optional) In general, the error term in (5) is serially correlated for a given i: $cov(\Delta u_{i,t}, \Delta u_{i,t-k}) \neq 0, \ k = 1, \ldots, T-1$. So cluster-robust standard errors should be used in theory.

15. Even though $\hat\beta_1$ is obtained from (5), its interpretation is still the change in y when x changes by one unit, i.e., in terms of (3).

16. To avoid the ambiguity of using the one-year difference or the two-year difference, we may consider the one-way fixed effect (FE) estimator

. xtreg y beertax, fe

Fixed-effects (within) regression               Number of obs      =       336
Group variable: id                              Number of groups   =        48
                                                Obs per group: min =         7
                                                               avg =       7.0
                                                               max =         7
  (values omitted: R-sq within, between and overall; F(1,287) and Prob > F; corr(u_i, Xb); the coefficient table for beertax and _cons; sigma_u, sigma_e and rho; and the F(47, 287) test that all u_i=0)

The FE estimator is based on the so-called within regression

\[ y_{i,t} - \bar{y}_i = \beta_1 (x_{i,t} - \bar{x}_i) + (u_{i,t} - \bar{u}_i) \tag{6} \]

where $\bar{y}_i = \frac{\sum_{t=1}^{T} y_{i,t}}{T}$ and $\bar{x}_i = \frac{\sum_{t=1}^{T} x_{i,t}}{T}$ are the panel-specific means. The process of subtracting the panel-specific mean is called the within transformation. The point is, the within transformation, akin to taking differences, can remove the time-invariant unobserved heterogeneity: $w_i - \bar{w}_i = 0$. In other words, the FE estimator is not subject to the bias of omitting time-invariant factors. The downside is that the FE estimator cannot estimate the effect of any time-invariant factor.

17. We can obtain the demeaned value by regressing a variable on a constant term. Likewise, we need a panel-specific constant term or panel-specific dummy variable in order to generate the panel-specific demeaned value. That is why the FE estimate can be produced in the dummy variable regression (DVR), one that includes all except one state dummy variable:

. qui tab state, gen(sd)
. reg y beertax sd1-sd47
  (coefficient table for beertax, the state dummies and _cons omitted)

We drop one state dummy to avoid the dummy variable trap. Notice that $\hat\beta_1^{DVR} = \hat\beta_1^{FE}$, a fact that the Frisch-Waugh (FW) theorem can prove.

18. Intuitively, the DVR resolves the omitted-variable issue by using panel-specific dummy variables as a proxy for $w_i$. This is a good idea since both $w_i$ and the dummy variable are time-constant.

19. Now we can test the null hypothesis that the state fixed effect (state dummies) is insignificant ($H_0: \beta_2 = 0$ or equivalently $H_0: w_i = 0$).

. testparm sd1-sd47
 ( 1)  sd1 = 0
  ...
 (47)  sd47 = 0
  (F(47, 287) statistic and Prob > F omitted)

So the null hypothesis is rejected. This F test is reported by the xtreg command.

20. We can also test the significance of beertax

. test beertax
 ( 1)  beertax = 0
  (F(1, 287) statistic and Prob > F omitted)

This F test is also reported by the xtreg command.

21. Now we can understand why the pooled OLS is biased. Basically, it is because the pooled OLS omits state dummies, which are correlated with beertax:

. qui reg beertax sd1-sd47
. dis "F test for exogeneity of beertax is " e(F)
  (displayed value omitted)

The first and third F tests jointly explain why the pooled OLS is biased: the state dummies matter for y, and they are correlated with beertax.

23. In a similar fashion, we can obtain the two-way fixed effect estimator by including time dummy variables in the DVR, which serve as a proxy for panel-invariant unobserved effects (like the safety features of cars).

. qui tab year, gen(yd)
. reg y beertax sd1-sd47 yd1-yd6, nohe
  (coefficient table omitted)

Or, we can use the xtreg command along with time dummy variables

. xtreg y beertax yd1-yd6, fe

Fixed-effects (within) regression               Number of obs      =       336
Group variable: id                              Number of groups   =        48
  (coefficient table for beertax, the year dummies and _cons omitted)

Finally, it is important to use cluster-robust standard errors in order to account for the serial correlation among the error terms for a given panel, i.e., between $u_{i,t} - \bar{u}_i$ and $u_{i,t-k} - \bar{u}_i$:

. xtreg y beertax yd1-yd6, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: id                              Number of groups   =        48
                                     (Std. Err. adjusted for 48 clusters in id)
  (coefficient table with robust standard errors omitted)

. sca hatbeta1 = _b[beertax]

The cluster-robust standard error can account for both across-panel heteroskedasticity and within-panel correlation.
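To make the clustering idea concrete, here is a minimal Python sketch for the one-regressor within regression. It rests on several assumptions: the toy panel and the function names are hypothetical, there are no year dummies, and no finite-sample correction is applied, so it will not reproduce the standard errors that xtreg prints. The only point is the difference between summing the products of regressor and residual within each panel before squaring (cluster-robust) and squaring each product separately (heteroskedasticity-robust only).

```python
from math import sqrt


def within_transform(panel):
    """Subtract each panel's own mean from its series."""
    return {pid: [v - sum(s) / len(s) for v in s] for pid, s in panel.items()}


def fe_with_cluster_se(y, x):
    """Within estimate of the slope plus two robust standard errors.

    Returns (beta, se_cluster, se_hc0). Both standard errors omit any
    finite-sample adjustment (an assumption made to keep the sketch short).
    """
    y_dm, x_dm = within_transform(y), within_transform(x)
    sxx = sum(v * v for s in x_dm.values() for v in s)
    sxy = sum(xv * yv for pid in x_dm
              for xv, yv in zip(x_dm[pid], y_dm[pid]))
    beta = sxy / sxx

    cluster_meat = 0.0  # sum over panels of (sum over time of x*e)^2
    hc0_meat = 0.0      # sum over all observations of (x*e)^2
    for pid in x_dm:
        products = [xv * (yv - beta * xv)
                    for xv, yv in zip(x_dm[pid], y_dm[pid])]
        cluster_meat += sum(products) ** 2
        hc0_meat += sum(p * p for p in products)
    return beta, sqrt(cluster_meat) / sxx, sqrt(hc0_meat) / sxx


# Hypothetical toy panel: three panels, four periods, true slope 2.
x = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 2.0, 3.0, 5.0], "C": [0.0, 1.0, 1.0, 2.0]}
w = {"A": 5.0, "B": -1.0, "C": 3.0}  # time-invariant heterogeneity
noise = {"A": [0.3, 0.1, -0.2, -0.2],
         "B": [-0.1, 0.2, 0.1, -0.2],
         "C": [0.2, -0.3, 0.1, 0.0]}
y_clean = {p: [2.0 * v + w[p] for v in x[p]] for p in x}
y_noisy = {p: [2.0 * v + w[p] + e for v, e in zip(x[p], noise[p])] for p in x}

print(fe_with_cluster_se(y_clean, x))
print(fe_with_cluster_se(y_noisy, x))
```

With the noise-free series the within regression fits exactly, so both standard errors collapse to zero; with noise the two generally differ because the cluster version keeps the cross-period products within each panel.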

26. We can test the year fixed effect:

. testparm yd*
 ( 1)  yd1 = 0
 ( 2)  yd2 = 0
 ( 3)  yd3 = 0
 ( 4)  yd4 = 0
 ( 5)  yd5 = 0
 ( 6)  yd6 = 0
       F(  6,    47) =    4.22
  (Prob > F omitted)

In this case, the year fixed effect (yearly dummies) is significant at the 1% level. So omitted variable bias would arise if we forgot to include yearly dummies.

27. The two-way FE estimate $\hat\beta_1^{two-way\ FE}$ is economically significant since

. qui sum y
. dis "percentage change in y is " hatbeta1/r(mean)
  (displayed value omitted)

traffic death drops by about 31%, relative to its sample mean, when the tax rate rises by one unit.

28. Remember that the command

xtreg y x timedummy, fe vce(cluster id)

reports the two-way fixed effect estimator with cluster-robust standard errors. It is the most commonly used command for applied economists. Equivalently, you can get the same $\hat\beta_1$ by using

reg y x paneldummy timedummy

The FE estimator cannot be used if the key regressor is time-invariant; it is biased if the unobserved heterogeneity is time-varying.

29. (Optional) The commands to do the within transformation, and obtain the within, between and random effect estimators are

xtreg y beertax, fe
* within transformation
bysort id: egen ybar = mean(y)
bysort id: gen dmy = y - ybar
bysort id: egen xbar = mean(beertax)
bysort id: gen dmx = beertax - xbar
* within regression
reg dmy dmx
* panel-specific intercept term (u_i in stata and a_i in textbook)
gen u = ybar - _b[dmx]*xbar
* standard deviation of u_i
sum u if year==1982
* Between regression 14.2 in the textbook
sort id year
reg ybar xbar if year==1982
* Random effect model
xtreg y beertax, re
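The same logic can be checked outside Stata. The Python sketch below is an illustration under stated assumptions, not a replication: the panel is a small noise-free toy (hypothetical numbers, not the Fatality data) in which a time-invariant factor w_i is positively related to x, and the helper name ols_slope is made up here. It compares the pooled slope with the first-difference and within (demeaned) slopes.

```python
def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    return sxy / sxx


BETA = -0.5  # true marginal effect in the toy model

# Hypothetical panel: three panels, four periods each.
x = {"A": [1.0, 1.2, 1.1, 1.4],
     "B": [2.0, 2.1, 2.4, 2.3],
     "C": [3.0, 3.3, 3.1, 3.5]}
w = {"A": 1.0, "B": 2.0, "C": 3.0}  # time-invariant, higher where x is higher
y = {p: [BETA * v + w[p] for v in x[p]] for p in x}

# Pooled OLS: stack all observations and ignore w.
pooled = ols_slope([v for p in x for v in x[p]],
                   [v for p in x for v in y[p]])

# First difference: difference within each panel, then pool the differences.
dx = [b - a for p in x for a, b in zip(x[p], x[p][1:])]
dy = [b - a for p in x for a, b in zip(y[p], y[p][1:])]
fd = ols_slope(dx, dy)

# Within transformation: subtract panel-specific means, then pool.
dmx = [v - sum(x[p]) / len(x[p]) for p in x for v in x[p]]
dmy = [v - sum(y[p]) / len(y[p]) for p in x for v in y[p]]
fe = ols_slope(dmx, dmy)

print("pooled:", pooled)
print("first difference:", fd)
print("within (FE):", fe)
```

In this toy setup the pooled slope comes out positive even though the true effect is negative, while differencing and demeaning both remove w_i and recover the true slope, mirroring the sign reversal seen between the pooled and within results in the notes.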
