Time-Series Cross-Section Analysis
Models for Long Panels
Jamie Monogan
University of Georgia
February 17, 2016
Objectives
By the end of this meeting, participants should be able to:
Estimate fixed effects and random effects models.
Estimate a linear model with a lagged dependent variable using OLS.
Correct for contemporaneous correlation in the standard errors of estimates.
General Considerations
Customary Problems of Pooled Data and the Problems They Can Produce
1. Functional form: dynamic or static? Linear or nonlinear? $\hat{\beta}$ is biased and inconsistent.
2. Unit effects, i.e., each unit requires an offset, $C_i$, so that, net of $C_i$, $E(y)$ is the same across units. $\hat{\beta}$ is biased and inconsistent. Key question: do your unit effects correlate with predictors?
3. Autocorrelation in the time dimension of the residuals. $\hat{\beta}$ is inefficient (with biased standard errors).
4. Heteroscedasticity across units. $\hat{\beta}$ is inefficient (with biased standard errors).
5. Contemporaneous correlation: errors at time $t$ are correlated across units. $\hat{\beta}$ is inefficient (with biased standard errors).
6. Spatial autocorrelation: geographically proximate units have correlated errors. $\hat{\beta}$ is inefficient (with biased standard errors).
Two Guiding Principles
1. Cookie-cutter approaches can be a recipe for disaster.
2. Error assumption problems are proportional to the size of the errors.
A well-fit model can tolerate error assumption violations and still produce good estimates. (Observational wisdom: the violations tend to move only the second digit of a standard error.)
A specification in which most of the variance remains in the error term will be highly sensitive to error assumptions.
The Implication for Procedure
Spend your time getting the theory and model specification right, not worrying about the error term.
Do what you can (e.g., dummies, where needed) to move variance into the structural part of the model.
Only after you have done your best on structure should you begin to think about modeling error.
A Point of View on Theory
Instead of the dichotomy between theory and empirical knowledge of error, assume three components:
1. Genuine a priori theory (not tainted by encounter with data).
2. Knowledge about context and data that you bring to your research; call it quasi-theory (e.g., unit effects).
3. Error: that which is totally inexplicable.
Advice
My procedural advice amounts to this: maximize the variance accounted for by (1) and (2) before concerning yourself with (3).
It is not smart to fool yourself or your reader that (2) is actually an explanation. But it is a lot less smart to leave that variance in the error term, where it will surely cause mischief.
Informing Software of the Panel Structure
R
For many commands (lm, lme), R does not need to be told about the panel structure. For others (plm), there are options to specify in the command, e.g.: index=c("state", "year")
Stata
Stata has very powerful pooled data capabilities, none of which work unless your file is xtset for panel data. The syntax is: xtset unitvar timevar (and order matters), e.g., xtset state year
Once you have xtset it, Stata respects the stacked structure of your data (e.g., it won't re-sort it) and opens up all the xt commands, e.g., xtreg, xtgls, xtpcse, xtregar. Type help xt for the full list.
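To see what a unit-time declaration buys you, here is a minimal pure-Python sketch that assumes rows are stored as dictionaries. The helper declare_panel is hypothetical. It is not how xtset or plm work internally. It only illustrates the two conditions a panel index imposes: rows are ordered by unit and then by time, and no (unit, time) pair is repeated.

```python
def declare_panel(rows, unit_key, time_key):
    """Hypothetical helper: sort rows by (unit, time) and reject repeated pairs.

    Order matters: the unit key is the outer sort key and the time key the
    inner one, so each unit's observations end up stacked together.
    """
    ordered = sorted(rows, key=lambda r: (r[unit_key], r[time_key]))
    seen = set()
    for r in ordered:
        pair = (r[unit_key], r[time_key])
        if pair in seen:
            raise ValueError(f"repeated time value within unit: {pair}")
        seen.add(pair)
    return ordered


if __name__ == "__main__":
    data = [
        {"state": "GA", "year": 2001, "y": 3.0},
        {"state": "AL", "year": 2001, "y": 1.0},
        {"state": "GA", "year": 2000, "y": 2.0},
    ]
    for row in declare_panel(data, "state", "year"):
        print(row["state"], row["year"], row["y"])
```

Swapping the two keys would interleave units by year. That is the sense in which the order of arguments matters.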
Unit Effects
Logic: everything that happened to the units before the period of our observation history is likely to have altered their mean levels. Thus, the true model is often:
$$y_{it} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{jit} + C_i + u_{it}$$
where $C_i$ is the effect of being in unit $i$ at all times. But with a naïve OLS model we fit:
$$y_{it} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{jit} + u_{it}$$
which clearly is underspecified.
The Consequence of Ignoring Unit Effects
OLS estimates will be biased and inconsistent if $\mathrm{Cov}(C_i, x_{jit}) \neq 0$ for at least one $j$.
Why? If $C_i$ is not estimated, it goes into the error. Predictors that happen to be correlated with it will then proxy its effect. This produces false inferences that those predictors explain between-unit differences that are actually due to history.
OLS estimates will be inefficient, with biased standard errors, if $\mathrm{Cov}(C_i, x_{jit}) = 0$ for all $j$.
Why? If $C_i$ is not estimated, it goes into the error. Since $C_i$ is present in the error term for all of unit $i$'s observations, all of those errors will be correlated.
Some Solutions
1. Fixed effects (i.e., unit dummy variables).
2. Random effects (GLS).
3. Lagged dependent variable.
4. Recentering approaches.
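The first consequence can be checked by simulation. The sketch below uses a hypothetical setup, not data from this course. It draws $C_i$ with variance 4, builds a single predictor so that it correlates with $C_i$, and fits naive pooled OLS. The true slope is 1.0. In this design, however, the pooled slope converges to $1 + 0.8 \cdot 4 / (0.64 \cdot 4 + 1) \approx 1.9$.

```python
import random


def simulate_panel(n_units=200, n_periods=10, beta=1.0, seed=1):
    """Simulate y_it = beta * x_it + C_i + u_it with Cov(C_i, x_it) > 0."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        c_i = rng.gauss(0.0, 2.0)  # unit effect: history shifting unit i's mean level
        for t in range(n_periods):
            x = 0.8 * c_i + rng.gauss(0.0, 1.0)  # predictor correlated with C_i
            y = beta * x + c_i + rng.gauss(0.0, 1.0)
            rows.append((i, t, x, y))
    return rows


def pooled_ols_slope(rows):
    """Naive pooled OLS slope of y on x (with intercept), ignoring C_i."""
    xs = [r[2] for r in rows]
    ys = [r[3] for r in rows]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # The true beta is 1.0; the pooled estimate is pulled well above it
    # because x proxies for the omitted C_i.
    print(round(pooled_ols_slope(simulate_panel()), 2))
```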
Fixed Effects Model
$$y_{it} = \sum_{j=1}^{k} \beta_j x_{jit} + \gamma_i + u_{it}$$
We directly estimate each $\gamma_i$. Two principal approaches:
Least squares dummy variables (LSDV).
The within estimator.
R command: fe.mod <- plm(y ~ x, data=dataset, index=c("unitvar", "timevar"), model="within")
Stata command: xtreg depvar varlist, fe
Properties
The fixed effects estimator solves the problem and produces unbiased $\hat{\beta}$ (given strictly exogenous predictors). But there are two issues:
1. If the number of units is large, estimating all those parameters is inefficient.
2. Unit dummies (or unit means) are perfectly collinear with time-invariant variables (and highly collinear with quasi-unit constants). This makes it impossible to estimate a model with both fixed effects and time-invariant variables (AKA unit constants).
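Here is a minimal pure-Python sketch of the within estimator for a single regressor, run on made-up toy data. The function names are hypothetical. By the Frisch-Waugh-Lovell theorem, the LSDV slope is identical to the within slope, so only the demeaning version is shown. The last lines illustrate issue 2: a time-invariant variable is all zeros after demeaning, which leaves nothing to estimate its coefficient from.

```python
from collections import defaultdict


def within_transform(units, values):
    """Subtract each unit's mean from its observations (the within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, v in zip(units, values):
        sums[u] += v
        counts[u] += 1
    return [v - sums[u] / counts[u] for u, v in zip(units, values)]


def within_slope(units, x, y):
    """Fixed effects (within) slope for a single regressor."""
    xd = within_transform(units, x)
    yd = within_transform(units, y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def pooled_slope(x, y):
    """Naive pooled OLS slope (with intercept) that ignores the units."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


if __name__ == "__main__":
    # Toy data: y = 2x + gamma_i with gamma_A = 0 and gamma_B = 10.
    units = ["A", "A", "A", "B", "B", "B"]
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.0, 4.0, 6.0, 18.0, 20.0, 22.0]
    print(pooled_slope(x, y))         # about 4.57: contaminated by the unit offsets
    print(within_slope(units, x, y))  # 2.0: the unit offsets are swept out
    # A time-invariant covariate becomes all zeros after demeaning.
    region = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    print(within_transform(units, region))
```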
Error Components, aka GLSE, aka Random Effects
You know this as a random intercept model, or a compound symmetry model.
Reasoning: for pooled data we can apportion the sources of error variance as:
$$\sigma^2 = \sigma^2_{\text{units}} \,(+\, \sigma^2_{\text{time}}) + \sigma^2_{\text{within}}$$
That is, some proportion of the total error is due to differences between units, some proportion is due to differences between times, and some is pure error, due to neither units nor time. Or, dropping time:
$$\sigma^2 = \sigma^2_{\text{units}} + \sigma^2_{\text{within}}$$
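Dropping time, the error-components model implies a specific covariance pattern for the $T$ errors of one unit. The diagonal holds $\sigma^2_{\text{units}} + \sigma^2_{\text{within}}$ and every other entry holds $\sigma^2_{\text{units}}$, which is where the name compound symmetry comes from. Errors from different units are assumed uncorrelated, so the full $NT \times NT$ matrix is block-diagonal with this block repeated. The sketch below builds one block; the function name is hypothetical. The implied within-unit correlation, $\sigma^2_{\text{units}} / (\sigma^2_{\text{units}} + \sigma^2_{\text{within}})$, is $\rho$, the proportion of error variance due to unit effects.

```python
def compound_symmetry(n_periods, var_units, var_within):
    """Covariance matrix of one unit's T errors under the random-intercept model.

    Every pair of distinct periods shares the unit component (var_units);
    the diagonal adds the purely within-unit noise (var_within).
    """
    return [
        [var_units + (var_within if s == t else 0.0) for t in range(n_periods)]
        for s in range(n_periods)
    ]


if __name__ == "__main__":
    block = compound_symmetry(3, var_units=4.0, var_within=1.0)
    for row in block:
        print(row)
    # Correlation between any two periods within a unit: 4 / (4 + 1) = 0.8
    print(block[0][1] / block[0][0])
```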
The Logic of Random Effects
In the fixed effects approach we estimate parameters for the between-unit effects and thereby remove that variance from the error term.
In random effects (GLSE, error components, etc.) we leave the between-unit variance in the error term and change our assumptions to reflect the knowledge that it is there. That is, we specify a (particular) GLS model that assumes between-unit variance in the errors.
The empirical key to this specification is estimating $\rho$, conceptually the proportion of all error variance that is due to unit effects.
Estimating $\rho$
$$\rho = \frac{\sigma^2_{\text{Between}}}{\sigma^2_{\text{Total}}}, \qquad \hat{\rho} = \frac{(\text{OLS residual variance}) - (\text{LSDV residual variance})}{\text{OLS residual variance}}$$
There are many ways to estimate $\rho$ in the r.e. model:
Maddala & Mount (1973): Monte Carlo analyses with 11 different estimators.
Baltagi (1981, 2005): considers additional options.
You have already used FIML and REML.
Other options just use post hoc residual analysis.
With $\hat{\rho}$, you can use formulas you know to get:
$$\hat{\beta}_{\text{FGLS}} = (X'\hat{\Omega}^{-1}X)^{-1} X'\hat{\Omega}^{-1} y$$
Analysis by Clark & Linzer suggests that the choice of estimator does matter. (E.g., plm [R] yields different results from lmer [R] and BUGS.)
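The residual-based formula for $\hat{\rho}$ can be sketched in pure Python on simulated data. The sketch makes these assumptions: one predictor, a balanced panel, true $\sigma_{\text{units}} = 2$ and $\sigma_{\text{within}} = 1$ (so $\rho = 0.8$), and residual variances computed with the usual degrees-of-freedom corrections ($NT - 2$ for OLS, $NT - N - 1$ for LSDV). The LSDV residuals come from the unit-demeaned regression, which gives the same residuals as fitting the dummies explicitly. This is just one of the many possible estimators, not necessarily the one plm or Stata uses. The function names are hypothetical.

```python
import random
from collections import defaultdict


def simulate_re_panel(n_units=200, n_periods=10, sd_unit=2.0, sd_within=1.0, seed=7):
    """y_it = 1 + 0.5 x_it + C_i + u_it, with x independent of C_i."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        c_i = rng.gauss(0.0, sd_unit)
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0)
            y = 1.0 + 0.5 * x + c_i + rng.gauss(0.0, sd_within)
            rows.append((i, x, y))
    return rows


def estimate_rho(rows):
    """rho_hat = (OLS residual variance - LSDV residual variance) / OLS residual variance."""
    n = len(rows)
    n_units = len({i for i, _, _ in rows})
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]

    # Pooled OLS with an intercept: unit effects stay in the residuals.
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    var_ols = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)

    # LSDV residuals equal the residuals from the unit-demeaned regression.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, x, y in rows:
        sums[i][0] += x
        sums[i][1] += y
        sums[i][2] += 1
    demeaned = [(x - sums[i][0] / sums[i][2], y - sums[i][1] / sums[i][2]) for i, x, y in rows]
    bw = sum(x * y for x, y in demeaned) / sum(x * x for x, _ in demeaned)
    var_lsdv = sum((y - bw * x) ** 2 for x, y in demeaned) / (n - n_units - 1)

    return (var_ols - var_lsdv) / var_ols


if __name__ == "__main__":
    # The true rho here is 2**2 / (2**2 + 1**2) = 0.8
    print(round(estimate_rho(simulate_re_panel()), 3))
```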
Estimation of GLSE (Besides What You Already Know)
R
re.mod <- plm(y ~ x, data=dataset, index=c("unitvar", "timevar"), model="random")
Stata
xtreg depvar varlist, re
re (random effects) is the default and does not need to be specified, but it is useful to remind yourself what model you are estimating.
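For a balanced panel with $T$ periods per unit, the FGLS step amounts to OLS on quasi-demeaned data. The regression is $y_{it} - \theta \bar{y}_i$ on $x_{it} - \theta \bar{x}_i$ and a constant $1 - \theta$, with $\theta = 1 - \sqrt{(1-\rho)/(1-\rho+T\rho)}$. Setting $\rho = 0$ gives $\theta = 0$, which is pooled OLS. As $\theta$ approaches 1, the estimator approaches the within estimator. The sketch below writes this textbook transformation in pure Python with hypothetical function names. plm and xtreg estimate the variance components in their own ways, so their numbers need not match it.

```python
import math
from collections import defaultdict


def re_theta(rho, n_periods):
    """Quasi-demeaning weight for a balanced random-intercept panel.

    With sigma2_units = rho * s2 and sigma2_within = (1 - rho) * s2,
    theta = 1 - sqrt(sigma2_within / (sigma2_within + T * sigma2_units)).
    """
    return 1.0 - math.sqrt((1.0 - rho) / ((1.0 - rho) + n_periods * rho))


def re_fgls(units, x, y, rho):
    """Intercept and slope from OLS on quasi-demeaned data (balanced panel, rho < 1)."""
    n_periods = len(units) // len(set(units))  # assumes a balanced panel
    theta = re_theta(rho, n_periods)
    sx, sy, cnt = defaultdict(float), defaultdict(float), defaultdict(int)
    for u, xi, yi in zip(units, x, y):
        sx[u] += xi
        sy[u] += yi
        cnt[u] += 1
    z1 = 1.0 - theta  # the transformed intercept column is a constant
    xs = [xi - theta * sx[u] / cnt[u] for u, xi in zip(units, x)]
    ys = [yi - theta * sy[u] / cnt[u] for u, yi in zip(units, y)]
    n = len(xs)
    # Solve the 2x2 normal equations for y* = a * z1 + b * x* (Cramer's rule).
    s11 = n * z1 * z1
    s12 = z1 * sum(xs)
    s22 = sum(v * v for v in xs)
    r1 = z1 * sum(ys)
    r2 = sum(v * w for v, w in zip(xs, ys))
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


if __name__ == "__main__":
    print(round(re_theta(0.8, 10), 4))
    print(re_fgls(["A", "A", "B", "B"], [1.0, 2.0, 3.0, 5.0], [3.0, 5.0, 7.0, 11.0], rho=0.5))
```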
Properties of the Random Effects Estimator
It does not deal with an error correlation that diminishes over time (i.e., serial correlation) or with heteroscedasticity. But these error assumption violations produce unbiased but inefficient $\hat{\beta}$, and their consequences are generally much less serious than those of unit effects. (Alternative: GLS-ARMA (Kmenta).)
Random effects produces BLUE estimates if unit effects are the only assumption violation and the unit effects do not correlate with the covariates.
Running a random effects model?
A Hausman test is a common tool for evaluating model specification. It compares the fixed and random effects estimates. A significant difference suggests that the unit effects correlate with the covariates, which makes random effects inconsistent.
In R: phtest from the plm library.
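For a single coefficient, the Hausman statistic is $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^2 / [\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})]$, compared against a $\chi^2_1$ distribution. The sketch below plugs in hypothetical estimates purely for illustration. phtest handles the general multi-coefficient case with matrices.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), compared with chi-square(1).
    Under the null (unit effects uncorrelated with x) both estimators are
    consistent and RE is efficient, so Var(b_FE) - Var(b_RE) should be positive.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # For 1 degree of freedom, P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    h, p = hausman_scalar(b_fe=1.0, b_re=0.8, var_fe=0.02, var_re=0.01)
    print(round(h, 2), round(p, 4))
```

Here $H \approx 4$ with $p \approx 0.046$. At the 5% level this would reject the null and point toward fixed effects.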
Dynamic Models
Perspective on Dynamics
Beck & Katz Perspective
We have a choice between:
Static formulations, which can then be expected to have unit effects and autocorrelation in the time dimension.
Dynamic specification, where $y_{t-1}$ on the right-hand side will usually control for history (hence unit effects) and eliminate autocorrelated error.
Static requires some feasible GLS approach to deal with error assumptions and, because it is static, may be an inferior model of (inherently dynamic) causality.
Dynamic is biased and inconsistent, but it is probably the right structure and can then be estimated with OLS. Besides, a wrong functional form also induces bias.
Conclusion: Dynamic specification with OLS is better.
Note: we do have options on how we dynamically specify a model.
One problem remains: contemporaneous correlation. This was a possibility with all the other models too, but it was so far down the list of problems that most analysts chose not to deal with it.
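One practical detail of the dynamic specification is that $y_{i,t-1}$ must be lagged within each unit. Shifting the stacked y column by one row would hand each unit's first observation the last value of the previous unit. Declaring the panel structure is meant to prevent exactly this (e.g., Stata's L. operator after xtset). Below is a minimal pure-Python sketch with a hypothetical helper, assuming integer time periods:

```python
def lag_within_units(units, times, y):
    """Hypothetical helper: return y_{i,t-1} for each row, or None where no previous period exists.

    Assumes integer time periods. The lag never crosses from one unit into
    the next, and a gap in the time index yields None rather than a stale value.
    """
    lookup = {(u, t): v for u, t, v in zip(units, times, y)}
    return [lookup.get((u, t - 1)) for u, t in zip(units, times)]


if __name__ == "__main__":
    units = ["A", "A", "A", "B", "B"]
    times = [2000, 2001, 2002, 2000, 2001]
    y = [1.0, 2.0, 3.0, 10.0, 11.0]
    print(lag_within_units(units, times, y))
    # [None, 1.0, 2.0, None, 10.0]: unit B's first row does not borrow A's last value
```

The rows with None are then dropped before fitting the lagged dependent variable model by OLS.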
Panel Corrected Standard Errors
Contemporaneous Correlation Example
Contemporaneous (error) correlation arises from some time-related process that affects most units similarly at the same time and is not part of the model structure.
In a state analysis where some economic indicator (e.g., budget shares) is the dependent variable, the national economy perturbs most states to have similar errors at particular times.
You could model this effect (if you have theory).
You could dummy time points (which might be sensible if you had many states and few time points).
But if you did neither, you would have contemporaneous correlation of errors across units.
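A small simulation makes the mechanism concrete. The setup is hypothetical and purely illustrative: 20 units, 500 periods, and errors equal to a common national shock plus idiosyncratic noise of equal variance. Under these assumptions, the expected correlation between any two units' errors is 0.5. Subtracting each period's cross-unit mean, which mimics what time dummies absorb, removes the shock. What remains is only the small negative correlation of about $-1/(N-1)$ that the demeaning itself induces. The function names are hypothetical.

```python
import random


def simulate_errors(n_units=20, n_periods=500, sd_shock=1.0, sd_idio=1.0, seed=3):
    """e[i][t] = common national shock at t + idiosyncratic noise for unit i."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sd_shock) for _ in range(n_periods)]
    return [[shocks[t] + rng.gauss(0.0, sd_idio) for t in range(n_periods)]
            for _ in range(n_units)]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def mean_cross_unit_corr(e):
    """Average correlation of errors over all pairs of distinct units."""
    n = len(e)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(corr(e[i], e[j]) for i, j in pairs) / len(pairs)


def demean_by_period(e):
    """Remove each period's cross-unit mean, as time dummies would."""
    n, n_periods = len(e), len(e[0])
    means = [sum(e[i][t] for i in range(n)) / n for t in range(n_periods)]
    return [[e[i][t] - means[t] for t in range(n_periods)] for i in range(n)]


if __name__ == "__main__":
    e = simulate_errors()
    print(round(mean_cross_unit_corr(e), 2))                    # near 1 / (1 + 1) = 0.5
    print(round(mean_cross_unit_corr(demean_by_period(e)), 2))  # near -1 / (20 - 1)
```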
The Impact
$\hat{\beta}$ estimates will be inefficient.
Standard error estimates will be biased downward. Therefore, t statistics are overstated and p-values are smaller than they should be.
Monte Carlo analysis suggests that these effects are quite small: even high levels of contemporaneous correlation cause only modest distortions.
Panel Corrected Standard Errors
Assume $\Sigma$, an $N \times N$ matrix of cross-unit contemporaneous covariances, estimated from the OLS residuals as:
$$\hat{\Sigma}_{ij} = \frac{\sum_{t=1}^{T} e_{it} e_{jt}}{T}$$
Let $I_T$ be a $T \times T$ identity matrix for the time dimension. With the data stacked by unit and then by time, $\hat{\Omega} = \hat{\Sigma} \otimes I_T$, where $\hat{\Omega}$ is $NT \times NT$. The standard errors are the square roots of the diagonal elements of:
$$(X'X)^{-1} X'\hat{\Omega}X(X'X)^{-1}$$
This is the panel correction.
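For one regressor, the sandwich formula above collapses to a scalar, which makes it easy to sketch in pure Python. The sketch assumes a balanced panel stored as x[i][t] and e[i][t] (residuals from the OLS fit), and the function names are hypothetical. If the model also includes an intercept, pass x centered at its overall mean; by Frisch-Waugh-Lovell, this gives the slope's panel corrected standard error. The pcse library and xtpcse handle a general X and unbalanced panels. This sketch covers only the arithmetic.

```python
def contemporaneous_cov(e):
    """Sigma_hat[i][j] = sum_t e[i][t] * e[j][t] / T for a balanced panel e[i][t]."""
    n, n_periods = len(e), len(e[0])
    return [[sum(e[i][t] * e[j][t] for t in range(n_periods)) / n_periods for j in range(n)]
            for i in range(n)]


def pcse_single_regressor(x, e):
    """Panel corrected SE for one regressor x[i][t], given residuals e[i][t].

    (X'X)^{-1} X' Omega X (X'X)^{-1} with Omega = Sigma kron I_T reduces to
    sum_t sum_i sum_j x_it Sigma_ij x_jt / (sum x^2)^2 when X is one column.
    """
    sigma = contemporaneous_cov(e)
    n, n_periods = len(x), len(x[0])
    xx = sum(x[i][t] ** 2 for i in range(n) for t in range(n_periods))
    meat = sum(x[i][t] * sigma[i][j] * x[j][t]
               for t in range(n_periods) for i in range(n) for j in range(n))
    return (meat / xx ** 2) ** 0.5


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    e = [[1.0, -1.0], [1.0, -1.0]]
    print(pcse_single_regressor(x, e))  # sqrt(52) / 30, about 0.2404
```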
Software
R
Estimate a linear model (e.g., with lm). Then pass the fitted model object to the pcse function from the pcse library, together with vectors identifying the units and time periods.
Stata
xtpcse y varlist
This produces OLS $\hat{\beta}$s and panel corrected standard errors.
For Next Time
Study for the midterm exam.
Read FLW chapters 11-13.
Download Owsiak's (2013) dataset on the impact of settled borders on democracy. The file owsiakjop2013.dta is in Stata format at http://dx.doi.org/10.7910/dvn/arkoti.
Your cross-sectional units are countries (ccode) and your temporal units are years (year).
Remove every country-year with any missing observation. This is easy, but I will not tell you how. Do not ask.
Estimate a linear regression with OLS using the following specification (variable names in parentheses). The dependent variable is Polity score (polity2). The predictors are an indicator for having all borders settled (allsettle), lagged GDP (laggdpam), lagged change in GDP (laggdpchg), lagged trade openness (lagtradeopen), lagged military personnel (lagmilper), lagged urban population (lagupop), and lagged previous non-democratic movement (lagsumdown).
Test for unit effects in the residuals with ANOVA. What is your conclusion?
Estimate a fixed effects model. Test for serial correlation with a panel Breusch-Godfrey test. What is your conclusion?
Estimate a lagged dependent variable model with OLS. Taken at face value, what do you make of the coefficient on lagged Polity score (lagpolity)?
How do your three results contrast? Which model do you most prefer?