Lecture 4: Linear panel models

Luc Behaghel (PSE), February 2009

Introduction
Panel = repeated observations of the same individuals (e.g., firms, workers, countries) over several time periods (e.g., years, weeks).
Two dimensions (cross-section and time-series), two indices: $i$ for individuals and $t$ for time periods.
Examples:
- Household panels (German Socio-Economic Panel, Panel Study of Income Dynamics, Enquête Emploi, ...);
- Panels of firms or workers based on administrative data;
- Panels of countries in the growth regressions literature.

1. Balanced vs. unbalanced panel. If the panel is unbalanced, the issue is sample selection. Here, the panel is balanced.
2. Short T and large N ⇒ asymptotic properties similar to the cross-section case.
3. Panel ≠ repeated cross-sections.

Attraction of panel data
1. More observations, more precision.
2. New (more credible?) ways to identify causal effects, e.g. difference-in-differences.
3. Clear comparative advantage for understanding the dynamics of individual behavior and economic effects.

Approach followed in this lecture
Main themes only:
1. Understand the two dimensions of panel data (longitudinal and cross-section).

Outline
1. Identification
2. Inference
3. Difference-in-differences methods
4. Extensions

Identification
Example: impact of wages on the number of hours of labor supplied.
Panel data on 532 males for each of the 10 years from 1979 to 1988.
Two variables: the log of hours worked ($h$) and the log of wages ($w$).
The data set has 5320 pairs $(w_{it}, h_{it})$, $i = 1, \dots, 532$; $t = 1, \dots, 10$.

Total dimension
Higher wages are associated with longer hours.
Figure: All dimensions

Between dimension
People who have higher wages on average over 10 years of their career tend to work longer hours.
Figure: Between dimension

Within dimension
When a given worker has a higher wage than her usual (average) wage, she works longer hours.
Figure: Within dimension

Causal interpretation?
⇒ Is there a risk of omitted variable bias?
1. Between dimension → permanent unobserved heterogeneity: heterogeneity bias.
2. Within dimension → shock simultaneously driving wages and hours: simultaneity bias.

Decomposition of total variation
Notation: $s$ is the total standard deviation, $s_w$ the within standard deviation, and $s_b$ the between standard deviation, such that
$$s^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (z_{it} - \bar z)^2, \qquad s_w^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (z_{it} - \bar z_i)^2, \qquad s_b^2 = \frac{1}{N} \sum_{i=1}^{N} (\bar z_i - \bar z)^2,$$
and then
$$s^2 = s_w^2 + s_b^2.$$
Proof: show that
$$\sum_{i=1}^{N} \sum_{t=1}^{T} (z_{it} - \bar z)^2 = \sum_{i=1}^{N} \sum_{t=1}^{T} (z_{it} - \bar z_i)^2 + \sum_{i=1}^{N} \sum_{t=1}^{T} (\bar z_i - \bar z)^2.$$
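
The decomposition can be checked numerically. The sketch below uses a tiny balanced panel with made-up values (the numbers are purely illustrative, not the hours and wages data) and computes the total, within and between variances as defined on this slide.

```python
# Hypothetical balanced panel: N = 3 individuals, T = 3 periods.
panel = [
    [1.0, 2.0, 3.0],  # individual 1, t = 1..3
    [4.0, 6.0, 8.0],  # individual 2
    [0.0, 1.0, 5.0],  # individual 3
]
N, T = len(panel), len(panel[0])

grand_mean = sum(sum(row) for row in panel) / (N * T)
ind_means = [sum(row) / T for row in panel]

# Total variance: deviations from the grand mean.
s2_total = sum((z - grand_mean) ** 2 for row in panel for z in row) / (N * T)
# Within variance: deviations from each individual's own mean.
s2_within = sum(
    (z - m) ** 2 for row, m in zip(panel, ind_means) for z in row
) / (N * T)
# Between variance: deviations of individual means from the grand mean.
s2_between = sum((m - grand_mean) ** 2 for m in ind_means) / N

print(s2_total, s2_within + s2_between)
```

The two printed numbers should agree up to floating-point error, whatever values are placed in `panel`, as long as the panel is balanced.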

Identification (continued)
Formalisation: correlated and uncorrelated individual effects
Linear model:
$$y_{it} = x_{it} \beta + u_{it}.$$
Specific panel ingredients:
1. Time-varying and time-invariant variables: $y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + u_{it}$.
2. Error term with two components: a permanent component $c_i$ (= individual effect) and a transitory component $\varepsilon_{it}$: $u_{it} = c_i + \varepsilon_{it}$.
3. Period effects (trends, macro shocks, etc.) ⇒ time dummies.
Therefore,
$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \lambda_t + c_i + \varepsilon_{it} \tag{1}$$
$$\phantom{y_{it}} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \sum_{\tau=2}^{T} \lambda_\tau \mathbf{1}[\tau = t] + c_i + \varepsilon_{it}.$$
Identification question: can we identify $\beta_0$, $\beta_1$, $\beta_2$ and the $\lambda_\tau$'s?

Remark
Why do we treat the $\lambda_\tau$'s differently from the $c_i$'s?
T is small whereas N is large ⇒ we can hope to estimate the (small number of) $\lambda_\tau$'s consistently; this is not true for the $c_i$'s.

Assumptions on the individual effect
$c_i$ is not directly observable ⇒ include it in the error term, just like the error term in a cross-sectional regression?
Condition for consistency:
$$\mathrm{Cov}(u_{it}, x_{it}) = \mathrm{Cov}(u_{it}, z_i) = 0,$$
or, component by component,
$$\mathrm{Cov}(\varepsilon_{it}, x_{it}) = \mathrm{Cov}(\varepsilon_{it}, z_i) = 0, \qquad \mathrm{Cov}(c_i, x_{it}) = \mathrm{Cov}(c_i, z_i) = 0.$$
$\mathrm{Cov}(c_i, x_{it}) = \mathrm{Cov}(c_i, z_i) = 0$ is the uncorrelated individual effects assumption.
Interpretation: what are the unobserved permanent determinants of $y$?

Identification in the presence of correlated individual effects
Good news of panel data: even without instruments, we can still hope to get consistent estimates of the coefficients of the time-varying variables (i.e. estimate $\beta_1$, but not $\beta_2$).
Approach: transform the variables to get rid of $c_i$.
First approach: first-difference model
$$y_{it} - y_{it-1} = \beta_1 (x_{it} - x_{it-1}) + \varepsilon_{it} - \varepsilon_{it-1},$$
often denoted
$$\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta \varepsilon_{it}. \tag{2}$$
Second approach: within model
$$y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + \varepsilon_{it} - \bar \varepsilon_i. \tag{3}$$
$c_i$ disappears. Therefore, if we consider $\Delta \varepsilon_{it}$ as our error term, and if we assume $\mathrm{Cov}(\Delta x_{it}, \Delta \varepsilon_{it}) = 0$, OLS provides a consistent estimate of $\beta_1$.
$\beta_2$ and $z_i$ disappear as well. This means that our differencing strategy does not yield an estimate of $\beta_2$.
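
A small numerical illustration of why the two transformations help. This is a sketch on made-up, noise-free data (so $\varepsilon_{it} = 0$), with an individual effect $c_i$ deliberately chosen to rise with the level of $x$; the helper `ols_slope` and all numbers are assumptions introduced for the example.

```python
def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs, with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def flatten(rows):
    return [v for row in rows for v in row]

def demean(rows):
    """Within transformation: subtract each individual's own mean."""
    return [[v - sum(row) / len(row) for v in row] for row in rows]

def first_diff(rows):
    """First-difference transformation: v_t - v_{t-1} within each individual."""
    return [[row[t] - row[t - 1] for t in range(1, len(row))] for row in rows]

beta1 = 0.5                                   # true coefficient (hypothetical)
x = [[1.0, 2.0, 4.0], [3.0, 5.0, 4.0], [6.0, 7.0, 9.0]]
c = [1.0, 3.0, 6.0]                           # individual effects, correlated with x
y = [[beta1 * xit + ci for xit in xi] for xi, ci in zip(x, c)]

b_pooled = ols_slope(flatten(x), flatten(y))                 # contaminated by c_i
b_within = ols_slope(flatten(demean(x)), flatten(demean(y)))
b_fd = ols_slope(flatten(first_diff(x)), flatten(first_diff(y)))

print(b_pooled, b_within, b_fd)
```

On these data the pooled slope is pushed well above 0.5 by the correlation between $c_i$ and $x_{it}$, while the within and first-difference slopes both recover 0.5, because $c_i$ drops out of the transformed equations. Including an intercept in the first-difference regression is harmless here; it would absorb a common time trend if there were one.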

Not surprisingly, we have returned to the within and between dimensions previously described:
1. When estimating the between model, we need to ask whether $c_i$ is correlated with $\bar x_i$ and/or $z_i$. If yes, OLS leads to an omitted variable bias called the heterogeneity bias.
2. When estimating the within model, we need to ask whether $\varepsilon_{it} - \bar \varepsilon_i$ is correlated with $x_{it} - \bar x_i$. If yes, there is an omitted variable bias called the simultaneity bias.

Remark 1
Asymmetry.
Condition for consistency of within: $\mathrm{Cov}(x_{it} - \bar x_i, \varepsilon_{it} - \bar \varepsilon_i) = 0$.
Condition for consistency of between: $\mathrm{Cov}(\bar x_i, c_i + \bar \varepsilon_i) = 0$ ⇒ absence of correlation for the permanent and for the transitory unobservable components of the error term:
$$\mathrm{Cov}(c_i, \bar x_i) = 0,$$
$$\mathrm{Cov}(\bar \varepsilon_i, \bar x_i) = \mathrm{Cov}\left( \frac{1}{T} \sum_{t=1}^{T} \varepsilon_{it}, \frac{1}{T} \sum_{t=1}^{T} x_{it} \right) = 0.$$
If $(\varepsilon_{it}, x_{it})_{t=1,\dots,T}$ are i.i.d., with $\mathrm{Cov}(\varepsilon_{it}, x_{it'}) = 0$ for $t \neq t'$, the second condition means $\mathrm{Cov}(\varepsilon_{it}, x_{it}) = 0$ (for all $t$). This means that $\varepsilon_{it}$ and $x_{it}$ must not be simultaneously driven by an unobserved shock; that is, the between model can also suffer from a simultaneity bias.
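
The step from the condition on individual means to the period-by-period condition can be spelled out as follows. This is a sketch under the assumptions stated in the remark (no cross-period covariance, identical distribution over $t$), with the individual means taken over the $T$ periods.

```latex
\begin{aligned}
\mathrm{Cov}(\bar\varepsilon_i, \bar x_i)
  &= \frac{1}{T^2} \sum_{t=1}^{T} \sum_{t'=1}^{T} \mathrm{Cov}(\varepsilon_{it}, x_{it'}) \\
  &= \frac{1}{T^2} \sum_{t=1}^{T} \mathrm{Cov}(\varepsilon_{it}, x_{it})
     \qquad \text{(terms with } t \neq t' \text{ are zero)} \\
  &= \frac{1}{T} \, \mathrm{Cov}(\varepsilon_{it}, x_{it})
     \qquad \text{(same covariance in every period)} .
\end{aligned}
```

So the covariance of the means is zero exactly when the contemporaneous covariance is zero.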

Remark 2
The first-difference and the within models have similar conditions for consistency. They are in general very close. In the case where T = 2, they are even identical. Check it by noting that, for any variable $w$, $w_{i2} - \bar w_i = \frac{1}{2} \Delta w_{i2}$ (and $w_{i1} - \bar w_i = -\frac{1}{2} \Delta w_{i2}$).
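
The T = 2 equivalence can be checked directly. The sketch below uses made-up numbers and a hypothetical helper `slope_no_intercept`; both regressions are run without an intercept, which is the case in which the two slopes coincide exactly.

```python
def slope_no_intercept(xs, ys):
    """OLS slope of ys on xs through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical panel with N = 3 individuals and T = 2 periods.
x = [[1.0, 3.0], [2.0, 2.5], [4.0, 1.0]]
y = [[2.0, 5.0], [1.0, 4.5], [3.0, 2.0]]

# First differences: one observation per individual.
dx = [xi[1] - xi[0] for xi in x]
dy = [yi[1] - yi[0] for yi in y]
b_fd = slope_no_intercept(dx, dy)

# Within transformation: two observations per individual, equal to -/+ half the difference.
xw = [v - sum(xi) / 2 for xi in x for v in xi]
yw = [v - sum(yi) / 2 for yi in y for v in yi]
b_within = slope_no_intercept(xw, yw)

# The identity w_i2 - mean_i = (1/2) * (w_i2 - w_i1) for every individual.
identity_holds = all(
    abs((xi[1] - sum(xi) / 2) - 0.5 * (xi[1] - xi[0])) < 1e-12 for xi in x
)

print(b_fd, b_within, identity_holds)
```

Each individual contributes $2 \cdot (\Delta x / 2)(\Delta y / 2)$ to the within numerator and $2 \cdot (\Delta x / 2)^2$ to its denominator, so the factor of one half cancels and the two slopes are the same number.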

Remark 3
There are cases where the structure of the model itself allows you to reject the assumption that $\mathrm{Cov}(\Delta x_{it}, \Delta \varepsilon_{it}) = 0$ (or that $\mathrm{Cov}(x_{it} - \bar x_i, \varepsilon_{it} - \bar \varepsilon_i) = 0$). For instance, if $x$ is the lagged dependent variable: $x_{it} = y_{it-1}$. Then we have
$$\Delta y_{it} = \beta_1 \Delta y_{it-1} + \Delta \varepsilon_{it},$$
and the question is whether $\mathrm{Cov}(\Delta y_{it-1}, \Delta \varepsilon_{it}) = 0$. Now,
$$\mathrm{Cov}(\Delta y_{it-1}, \Delta \varepsilon_{it}) = \mathrm{Cov}(\beta_1 \Delta y_{it-2} + \Delta \varepsilon_{it-1}, \Delta \varepsilon_{it}) = \beta_1 \mathrm{Cov}(\Delta y_{it-2}, \Delta \varepsilon_{it}) + \mathrm{Cov}(\Delta \varepsilon_{it-1}, \Delta \varepsilon_{it}).$$
$\mathrm{Cov}(\Delta \varepsilon_{it-1}, \Delta \varepsilon_{it})$ is unlikely to be 0. Even in the case where there is no serial correlation (i.e. $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{it'}) = 0$ for $t \neq t'$), $\mathrm{Cov}(\Delta \varepsilon_{it-1}, \Delta \varepsilon_{it}) = -\mathrm{Var}(\varepsilon_{it-1}) \neq 0$.
Therefore, when the individual effects are correlated and there is a lagged dependent variable as a regressor, we are in trouble: first-differencing the data will not be sufficient to get a consistent estimator. Other methods will be required; they will actually use instrumental variables: we postpone this to section 5.
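
The covariance between two consecutive differenced errors can be expanded term by term. The derivation below is a sketch that assumes no serial correlation in $\varepsilon_{it}$, as in the remark.

```latex
\begin{aligned}
\mathrm{Cov}(\Delta\varepsilon_{it-1}, \Delta\varepsilon_{it})
  &= \mathrm{Cov}(\varepsilon_{it-1} - \varepsilon_{it-2},\; \varepsilon_{it} - \varepsilon_{it-1}) \\
  &= \mathrm{Cov}(\varepsilon_{it-1}, \varepsilon_{it})
     - \mathrm{Var}(\varepsilon_{it-1})
     - \mathrm{Cov}(\varepsilon_{it-2}, \varepsilon_{it})
     + \mathrm{Cov}(\varepsilon_{it-2}, \varepsilon_{it-1}) \\
  &= -\mathrm{Var}(\varepsilon_{it-1}) .
\end{aligned}
```

The shock $\varepsilon_{it-1}$ enters both differences with opposite signs, which is what creates the negative covariance even when the levels are serially uncorrelated.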

Estimation
Questions to consider:
1. Can we assume that the individual effects are uncorrelated? This splits the available estimators into two groups.
2. In a given group of estimators, the next question is efficiency: are there gains to be made from taking into account the specific error structure of panel data?
3. For a given estimator, how should we derive consistent standard errors (again, taking into account the fact that the error structure is more complex than in the cross-section case)?
Notation:
$$y_{it} = x_{it} \beta + c_i + \varepsilon_{it}.$$

Estimation: estimators when individual effects are uncorrelated
Pooled OLS
The key assumption here is that for each period $t$ (and for each individual $i$), regressors and errors are uncorrelated:
$$\mathrm{Cov}(c_i, x_{it}) = \mathrm{Cov}(\varepsilon_{it}, x_{it}) = \mathrm{Cov}(u_{it}, x_{it}) = 0. \tag{4}$$
If the model has an intercept, we also have $E(u_{it}) = 0$.
Pooling all the $NT$ observations ($j = 1, \dots, NT$):
$$y_j = x_j \beta + u_j, \qquad E(x_j' u_j) = E(u_j) = 0$$
⇒ Same as the standard cross-section model
⇒ Estimate by OLS: the pooled OLS estimator $\hat\beta_{POLS}$ is the OLS estimator obtained by pooling all the observations and regressing $y$ on $x$.

Rem. 1: Comparison to cross-section estimators
On period 1 only: $i = 1, \dots, N$ and
$$y_{i1} = x_{i1} \beta + u_{i1}, \qquad E(x_{i1}' u_{i1}) = E(u_{i1}) = 0$$
⇒ consistent estimator by OLS, $\hat\beta_{1,OLS}$.
Similarly for the other $T - 1$ periods.
Gain of pooling: precision.

Rem. 2: Need for panel-robust standard errors
Cross-sections: the (asymptotic) standard error of an estimate is inversely proportional to the square root of the sample size.
Panel data: things are made a bit more complex by the fact that the error term has two parts.
⇒ Intuitively, adding N new individuals to a cross-section with N observations adds more information than adding a second period with the same individuals. Part of the information that was in the first period is repeated by the information in the second period. We need to account for that. This is what panel-robust standard errors do.
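
The intuition can be made concrete for the simplest possible estimate, a sample mean of the errors. The comparison below is a sketch that assumes the two-component error $u_{it} = c_i + \varepsilon_{it}$ with $c_i$ and $\varepsilon_{it}$ mutually independent, independent across individuals, and $\varepsilon_{it}$ serially uncorrelated, with variances $\sigma_c^2$ and $\sigma_\varepsilon^2$.

```latex
\begin{aligned}
\text{$2N$ individuals, one period:} \quad
  & \mathrm{Var}(\bar u) = \frac{\sigma_c^2 + \sigma_\varepsilon^2}{2N}, \\
\text{$N$ individuals, two periods:} \quad
  & \mathrm{Var}(\bar u) = \frac{\sigma_c^2}{N} + \frac{\sigma_\varepsilon^2}{2N}
    = \frac{\sigma_c^2 + \sigma_\varepsilon^2}{2N} + \frac{\sigma_c^2}{2N} .
\end{aligned}
```

Both samples contain $2N$ observations, but the panel is less precise by $\sigma_c^2 / (2N)$: the permanent component is observed twice and is not averaged out by the second period.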

Parenthesis: default and robust standard errors
Cross-section case: default standard errors are computed under the assumption that all observations are independent, i.e. error terms are i.i.d.:
$$\mathrm{Var}(u_i \mid x_i) = \sigma^2 \text{ for all } i, \qquad \mathrm{Cov}(u_i, u_{i'}) = 0 \text{ for all } i \neq i'.$$
Sometimes, the assumption does not hold. Example: linear probability model (cf. lecture 1).
⇒ Robust standard errors: s.e. that are valid even when the error terms are not i.i.d. (often larger than default s.e.).

Error structure in panel data
The i.i.d. assumption would be
$$\begin{cases} \mathrm{Var}(u_{it} \mid x_{it}) = \sigma^2 & \text{for all } i \text{ and } t \\ \mathrm{Cov}(u_{it}, u_{it'}) = 0 & \text{for all } i \text{ and } t \neq t' \\ \mathrm{Cov}(u_{i't}, u_{it}) = 0 & \text{for all } t \text{ and } i \neq i' \end{cases}$$
$\mathrm{Cov}(u_{it}, u_{it'}) = 0$ for all $i$ and $t \neq t'$ is unlikely. At the minimum, we have
$$\mathrm{Cov}(u_{it}, u_{it'}) = \mathrm{Cov}(c_i + \varepsilon_{it}, c_i + \varepsilon_{it'}) = \mathrm{Var}(c_i) \neq 0.$$
⇒ There is always a need for robust s.e. with POLS.
⇒ The robustness is with respect to correlations across the different observations of the same individual, i.e. correlations within the clusters constituted by the different individuals.
In Stata, the command reads
    regress y x, robust cluster(id)
where id is the individual identifier.
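
To see what clustering changes, here is a deliberately stripped-down sketch in Python: a single regressor, no intercept, and no finite-sample correction (Stata applies one, so its numbers would differ slightly). The function name and the data are hypothetical. The only difference between the two variance formulas is that the cluster version sums the scores $x_{it} \hat u_{it}$ within each individual before squaring.

```python
import math

def slope_and_variances(x, y, ids):
    """OLS through the origin with two sandwich variance estimates.

    Returns (slope, variance ignoring clustering, cluster-robust variance).
    """
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]

    # Treats every observation as independent.
    meat_indep = sum((xi * ui) ** 2 for xi, ui in zip(x, resid))

    # Sums the scores within each individual first, then squares.
    score_by_id = {}
    for xi, ui, g in zip(x, resid, ids):
        score_by_id[g] = score_by_id.get(g, 0.0) + xi * ui
    meat_cluster = sum(s * s for s in score_by_id.values())

    return b, meat_indep / sxx ** 2, meat_cluster / sxx ** 2

# A cross-section of 4 individuals (made-up numbers).
x1 = [1.0, 2.0, 3.0, 4.0]
y1 = [1.2, 1.9, 3.4, 3.8]
ids1 = [1, 2, 3, 4]
b1, v_indep_1, v_cluster_1 = slope_and_variances(x1, y1, ids1)

# The same individuals observed twice with identical values:
# an extreme "panel" in which the second period adds no information.
b2, v_indep_2, v_cluster_2 = slope_and_variances(x1 + x1, y1 + y1, ids1 + ids1)

print(v_indep_1, v_indep_2, v_cluster_2)
```

In this extreme case the variance that ignores clustering is halved by the duplicated period, as if the sample had truly doubled, while the cluster-robust variance stays equal to the cross-section one. Real panels sit between the two extremes.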

Hours and wages example

Between estimator
Condition for consistency:
$$\mathrm{Cov}(c_i + \bar \varepsilon_i, \bar x_i) = 0.$$
Sufficient condition:
$$\mathrm{Cov}(c_i, x_{it}) = \mathrm{Cov}(\varepsilon_{it'}, x_{it}) = 0 \text{ for all } i, t \text{ and } t'. \tag{5}$$
⇒ Define the between estimator, $\hat\beta_B$, as the OLS estimator from regressing $\bar y_i$ on $\bar x_i$.
One observation per individual: correlation of error terms is less of an issue. Robust standard errors may correct other problems (heteroskedasticity).

Random effects (RE) estimator
Parenthesis: Generalized least squares
If the error terms are not i.i.d., GLS is an efficient alternative to OLS.
Idea: weight the observations according to the information they provide.
The additional information (and the noise) provided by an observation depends on the variance-covariance matrix of the error term, which is estimated in a first step.

RE model = one application of GLS to panel data
Specific model for the error term:
$$\begin{cases} \mathrm{Var}(u_{it} \mid x_{it}) = \mathrm{Var}(c_i) + \mathrm{Var}(\varepsilon_{it}) = \sigma_c^2 + \sigma_\varepsilon^2 & \text{for all } i \text{ and } t \\ \mathrm{Cov}(u_{it}, u_{it'}) = \sigma_c^2 & \text{for all } i \text{ and } t \neq t' \\ \mathrm{Cov}(u_{i't}, u_{it'}) = 0 & \text{for all } t, t' \text{ and } i \neq i' \end{cases}$$
Given this structure of disturbances, it can be shown that the efficient GLS estimator can be calculated from the OLS regression of $y_{it} - \lambda \bar y_i$ on $x_{it} - \lambda \bar x_i$, with
$$\lambda = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T \sigma_c^2}}.$$

Comparison to POLS and WITHIN
If $\sigma_c^2$ goes to 0, $\lambda$ goes to 0. We come back to the POLS estimator. Intuition: when $\sigma_c^2$ is close to 0, all the observations have the same amount of noise, and they are not correlated with each other. It is therefore optimal to weight them equally.
If T goes to infinity, then $\lambda$ goes to 1, and we have the within estimator. No loss of efficiency in discarding the between information. Intuition: in a very long panel, the between information becomes negligible.
In practice of course, $\lambda$ is strictly between 0 and 1. The RE estimator can be seen as an intermediate estimator between the POLS and the within estimators. It can also be shown that it is a weighted average of the between and the within estimators.
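
The two limiting cases can be checked with a few lines of Python. The sketch below takes the variance components as known (in practice they are estimated in a first step); the function names `re_lambda` and `quasi_demean` are introduced here for illustration only.

```python
import math

def re_lambda(sigma2_eps, sigma2_c, T):
    """Quasi-demeaning weight: 1 - sigma_eps / sqrt(sigma_eps^2 + T * sigma_c^2)."""
    return 1.0 - math.sqrt(sigma2_eps) / math.sqrt(sigma2_eps + T * sigma2_c)

def quasi_demean(rows, lam):
    """Subtract lam times each individual's mean from that individual's observations."""
    return [[v - lam * (sum(row) / len(row)) for v in row] for row in rows]

lam_no_effect = re_lambda(1.0, 0.0, T=5)         # sigma_c^2 = 0: pooled OLS case
lam_mid = re_lambda(1.0, 1.0, T=3)               # 1 - 1/sqrt(4)
lam_long = re_lambda(1.0, 1.0, T=1_000_000)      # very long panel: close to within

panel = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]       # made-up values
as_pooled = quasi_demean(panel, 0.0)             # data left unchanged
as_within = quasi_demean(panel, 1.0)             # full within transformation

print(lam_no_effect, lam_mid, lam_long)
```

With `lam = 0` the data are untouched, so OLS on them is pooled OLS; with `lam = 1` each row is fully demeaned, so OLS on them is the within estimator. Any value in between removes only part of the individual mean.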

Remark 1
Implementing the RE estimator requires estimating $\lambda$ in a first step. This is done by first estimating the residuals from a POLS regression, and then looking at the empirical correlations between residuals. Stata's xtreg command with the re option does the job.
Remark 2
The RE specification does better than POLS if the model of the variance-covariance matrix of errors is correct. If not, it might do better or worse, although it will still be consistent. Moreover, the standard errors then need to be corrected: as in POLS, this is done by using panel-robust standard errors.

Estimation: estimators when individual effects are correlated
We cannot include $c_i$ in the disturbance anymore: omitted variable bias.
Within or Fixed Effects (FE) estimator
The within estimator ($\hat\beta_W$) is defined as the OLS estimator obtained by regressing $y_{it} - \bar y_i$ on $x_{it} - \bar x_i$. The key assumption for consistency is
$$\mathrm{Cov}(\varepsilon_{it} - \bar \varepsilon_i, x_{it} - \bar x_i) = 0,$$
for which a sufficient condition is
$$\mathrm{Cov}(\varepsilon_{it}, x_{it'}) = 0 \text{ for each } i, t \text{ and } t'.$$
Within = Fixed effects
$\hat\beta_W$ is equal to the fixed effects estimator obtained from the OLS regression of $y_{it}$ on $x_{it}$ and N individual dummies (one for each individual). In other words, controlling for individual unobserved characteristics by estimating $c_i$ as the coefficient on a dummy for individual $i$ amounts to doing a within estimation.

Remark 1
The $c_i$'s are not consistently estimated. They are still interesting for descriptive purposes.
Remark 2
Panel-robust standard errors are also needed: the within transformation implies that there is serial correlation in the transformed errors.

First-difference (FD) estimator
The first-difference estimator ($\hat\beta_{FD}$) is defined as the OLS estimator obtained by regressing $\Delta y_{it}$ on $\Delta x_{it}$. The key assumption for consistency is
$$\mathrm{Cov}(\Delta \varepsilon_{it}, \Delta x_{it}) = 0,$$
for which a sufficient condition is
$$\mathrm{Cov}(\varepsilon_{it}, x_{it-1}) = \mathrm{Cov}(\varepsilon_{it}, x_{it+1}) = \mathrm{Cov}(\varepsilon_{it}, x_{it}) = 0 \text{ for each } i \text{ and } t.$$
In the same way as for the FE estimator, panel-robust standard errors are needed.

Estimation: choosing an estimator
Hours and wages example

Testing the uncorrelated effects hypothesis
If $H_0$: "the individual effects are uncorrelated" holds, then $\hat\beta_{FE}$ and $\hat\beta_{RE}$ converge to the same limit, $\beta$.
⇒ Hausman test: compare two estimators that should be consistent for the same parameter (under $H_0$).
Hausman statistic: depends on the distance between the two estimates as well as on the variance of this difference:
$$H = \left( \hat\beta_{FE} - \hat\beta_{RE} \right)' \left[ \hat V \left( \hat\beta_{FE} - \hat\beta_{RE} \right) \right]^{-1} \left( \hat\beta_{FE} - \hat\beta_{RE} \right).$$
Reject if $H$ is above a critical value.
Example: $H = 1.65 < 3.84$: we cannot statistically reject $H_0$.
However, this might be due to a lack of power: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are not precisely estimated. If we had a larger sample, maybe we would have rejected $H_0$...
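
With a single coefficient the statistic reduces to a ratio of scalars. The sketch below uses made-up estimates (not the ones behind the lecture's H = 1.65) and relies on one extra assumption, flagged in the code: under $H_0$ with RE efficient, the variance of the difference equals the FE variance minus the RE variance. The function name is hypothetical; 3.84 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient.

    Assumption: under H0 and with RE efficient, Var(b_fe - b_re) = var_fe - var_re.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then not usable as is.
        raise ValueError("estimated variance of the difference is not positive")
    return (b_fe - b_re) ** 2 / var_diff

CHI2_1DF_CRIT_5PCT = 3.84

# Hypothetical FE and RE estimates and their estimated variances.
H = hausman_scalar(b_fe=0.20, b_re=0.12, var_fe=0.0050, var_re=0.0011)
reject_h0 = H > CHI2_1DF_CRIT_5PCT

print(H, reject_h0)
```

Note how the decision depends on precision as much as on the gap between the two estimates: the same gap of 0.08 with a smaller `var_fe` would give a larger statistic.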

Conclusion:
1. Key decision: uncorrelated individual effects or not. A statistical test is not necessarily convincing enough to settle the choice.
⇒ More conservative models (FE and FD) are often preferred.
However, FE and FD come at a cost:
1. as they discard the between information, they are less precise;
2. they do not enable us to estimate the effects of time-invariant variables.

Differences in differences
Diff-in-diffs are very often used with panel data, in particular in the evaluation of public policies.
Combination of before / after and treatment / control comparisons: does the evolution of the outcome in the treated group differ from the evolution in the control group?
Identifying assumption: differences in these evolutions are due to the policy (and to some random noise).
Numerous examples using "natural experiments":
- Impact of immigration on wages and employment of local workers: Mariel Boatlift (Card, 1990)

Differences in differences: example
1994: APE extended to parents with 2 kids.
Situation unchanged for mothers with 1 or 3 kids ⇒ ideal natural experiment.

Hope: parallel trends before the policy change; a change in the treatment group after the policy change.

Figure: Employment rates

                                   Before (1994)   After (1997)   Evolution (1st difference)
Controls (mothers with 1 kid)      62%             64.5%          +2.5%
Treatment (mothers with 2 kids)    58.6%           47.4%          -11.2%

Relative evolution (2nd difference): Treatment - Controls = -13.7%

Formalization as a panel data estimator

                                   Before (1994)        After (1997)                                    Evolution (1st difference)
Controls (mothers with 1 kid)      $a$                  $a + c_{post}$                                  $c_{post}$
Treatment (mothers with 2 kids)    $a + b_{2kids}$      $a + b_{2kids} + c_{post} + d_{APE}$            $c_{post} + d_{APE}$

Relative evolution (2nd difference): Treatment - Controls = $d_{APE}$

⇒ Can be viewed as a linear probability model over two periods:
$$emp_{it} = a + b_{2kids} \mathbf{1}[kids_{it} = 2] + c_{post} \mathbf{1}[year_{it} > 1994] + d_{APE} \mathbf{1}[kids_{it} = 2, year_{it} > 1994] + u_{it}$$
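
Because this two-group, two-period model is saturated (four parameters for four cells), its coefficients can be read directly off the four cell means. The sketch below does this with the employment rates reported for mothers with 1 kid and mothers with 2 kids in 1994 and 1997; the variable names are introduced here for illustration.

```python
# Employment rates in percent, by group and period.
rates = {
    ("control", "before"): 62.0,   # mothers with 1 kid, 1994
    ("control", "after"): 64.5,    # mothers with 1 kid, 1997
    ("treat", "before"): 58.6,     # mothers with 2 kids, 1994
    ("treat", "after"): 47.4,      # mothers with 2 kids, 1997
}

a = rates[("control", "before")]                  # baseline level
b_2kids = rates[("treat", "before")] - a          # permanent group difference
c_post = rates[("control", "after")] - a          # common time change

first_diff_treat = rates[("treat", "after")] - rates[("treat", "before")]
first_diff_control = rates[("control", "after")] - rates[("control", "before")]
d_ape = first_diff_treat - first_diff_control     # difference in differences

# The four parameters reproduce the treated group's post-period rate.
fitted_treat_after = a + b_2kids + c_post + d_ape

print(round(b_2kids, 1), round(c_post, 1), round(d_ape, 1))
```

Running OLS of the employment indicator on the two dummies and their interaction would return the same four numbers, since each coefficient is a difference of cell means.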

Extension to a panel model
- over more than two periods
- with several control groups
- including covariates
- accounting for the binary outcome
$$\Pr(emp_{it} \mid \dots) = \Phi\left( a + b_{2kids} \mathbf{1}[kids_{it} = 2] + b_{3+kids} \mathbf{1}[kids_{it} > 2] + \lambda_t + d_{APE} \mathbf{1}[kids_{it} = 2, year_{it} > 1994] + x_{it} \beta \right)$$
⇒ Slightly more complex than the difference in differences. But same idea.

Differences in differences: prototypical diff-in-diff model
A subset of States passed a law: $s \in S_{Treat}$.
$t = 1, \dots, T$ periods of observation; $t^0_s$ = date of passing the law (if $s \in S_{Treat}$)
⇒ diff-in-diff models:
1. with outcomes and covariates measured at the State level:
$$y_{st} = \alpha_s + \lambda_t + \beta \mathbf{1}[t \geq t^0_s, s \in S_{Treat}] + x_{st} \gamma + u_{st}$$
2. with outcomes and covariates measured at the individual level $i$ (repeated cross-sections of individuals, but panel of States):
$$y_{ist} = \alpha_s + \lambda_t + \beta \mathbf{1}[t \geq t^0_s, s \in S_{Treat}] + x_{ist} \gamma + u_{ist}$$

Important remark: standard errors
$$y_{ist} = \alpha_s + \lambda_t + \beta \mathbf{1}[t \geq t^0_s, s \in S_{Treat}] + x_{ist} \gamma + u_{ist}$$
Two reasons for dependence between observations:
1. Common shocks for individuals in the same State in the same year (clustered sample);
2. Serial correlation between shocks in a given State over time.
$$u_{ist} = \eta_{st} + \varepsilon_{ist}, \text{ with } \eta_{st} \text{ serially correlated}$$
⇒ Within a given State, there are correlations within periods and between periods.
⇒ Need to have standard errors robust to these correlations: ", robust cluster(id_state)" in Stata's parlance.
⇒ In practice, many empirical papers have missed this point and underestimated their standard errors (Bertrand, Duflo and Mullainathan, QJE 2004).

Differences in differences: testing the key identifying assumptions
$$y_{st} = \alpha_s + \lambda_t + \beta \mathbf{1}[t \geq t^0_s, s \in S_{Treat}] + x_{st} \gamma + u_{st}$$
Identifying assumption: $\mathrm{Cov}(u_{st}, \mathbf{1}[t \geq t^0_s, s \in S_{Treat}]) = 0$ after controlling for covariates
⇔ State shocks are not correlated with the change in policy.
Intuitive phrasing: "Any systematic difference in the evolution of the two groups can be attributed to the policy."
This cannot be directly tested. But indirect tests exist:
1. No systematic difference in the evolutions of the two groups before the policy occurred ⇒ APE: parallel trends before 1994.